## Largest Subset & Growth Tests

The ﬁrst analysis of any data set need not use sophisticated mathematics or statistics. The goal of these test is to ﬁnd subsets that are highly inflated due to the error or fraud.

##### Largest Subset Test

The largest subsets test uses two ﬁelds, one with transaction or balance numbers (such as amount, inventory counts, vote counts, population counts) and another ﬁeld to indicate the subset (e.g., vendor number, credit card number, or branch number). Subset is a group of records that have something in common.

The data can often be divided into several subset groupings. For example, accounts payable data could be grouped by vendor or by the type of purchase (purchase order, no. purchase order) or by time. There are often a few different ways that data can be divided into subsets. For inventory data the grouping could be by location. For airline ticket refunds or retail customer refunds the groupings could be the credit card that received the refund.

The largest subsets test has produced interesting and useful results. Experience has shown that fraudsters do not know when to stop. They keep up with the scheme to the point that they are likely to show up as a large subset.

Calculating largest subset in Excel

Calculating this test in excel is fairly easy and involves use of pivot tables. For example I’m using the vendor payments data to calculate largest subsets:

1. Select the data and create a pivot table in a new sheet
2. Take vendor no. in rows and invoice no. (to give the total count of invoices) and invoice amount
3. Sort the invoice amount column in descending order, this will give us the top vendors in the vendor payment data.
##### Largest Growth Test

This test compares the growth in subsets from one period to another. This test identiﬁes cases where a subset had  abnormal growth over some period of time. This growth spurt could be due to fraud, error, or simply a change in circumstances. In forensics work, we are not only interested in growth. A large decrease in amount might also signal fraud. For example, a hotel might report substantially lower sales for tax purposes or a franchisee might report substantially lower sales to the franchise holder.

Running largest growth test in Excel

To run this test you need data for two different period. For example I’m comparing vendor payments for Jul-Dec to Jan-Jun. comparison period can be quarters, years, etc. based on the requirement.

1. To bifurcate the payments in the first six months and next six months, extract month using text formula from invoice date using “Month” formula
2. Now bifurcate the payments in first half and second half of the year using “IF” formula. In cell G2 use the following formula; =IF(E2>6,”H2″,”H1″), this would give us value H1 for first six months and H2 for next six months
3. Now create a pivot table in the new worksheet, take vendor no. in rows, period in columns and invoice amount in values
4. Value paste the pivot table created and calculate the growth percentage. In cell D4 use the following formula; =IF(B4>0,(C4-B4)/B4*100,0), this will give the percent change in vendor payments in second half of the year as compared to the first half.
5. Sort the percentage change values in largest to smallest, this will give us the vendors with highest growth rate
6. Repeat the step above to get growth rate in absolute terms

Findings from the largest subsets test have included fraudulent payments, fraudulent customer refunds, etc. The ﬁndings have also included processing inefﬁciencies.

The largest growth test identiﬁes subsets that have had very high growth over two or more time periods. The growth can be measured as a percentage or in absolute terms.

Identifying relevant subsets requires some creativity. The goal in forensic analytics is to think of subsets such that abnormal duplications for a subset could signal errors, fraud, or processing inefﬁciencies.

You can download the Excel file I used for the example here: largest subset & growth test.