Largest Subset & Growth Tests

The first analysis of any data set need not use sophisticated mathematics or statistics. The goal of these test is to find subsets that are highly inflated due to the error or fraud.

Largest Subset Test

The largest subsets test uses two fields, one with transaction or balance numbers (such as amount, inventory counts, vote counts, population counts) and another field to indicate the subset (e.g., vendor number, credit card number, or branch number). Subset is a group of records that have something in common.

The data can often be divided into several subset groupings. For example, accounts payable data could be grouped by vendor or by the type of purchase (purchase order, no. purchase order) or by time. There are often a few different ways that data can be divided into subsets. For inventory data the grouping could be by location. For airline ticket refunds or retail customer refunds the groupings could be the credit card that received the refund.

The largest subsets test has produced interesting and useful results. Experience has shown that fraudsters do not know when to stop. They keep up with the scheme to the point that they are likely to show up as a large subset.

Calculating largest subset in Excel

Calculating this test in excel is fairly easy and involves use of pivot tables. For example I’m using the vendor payments data to calculate largest subsets:

  1. Select the data and create a pivot table in a new sheet
  2. Take vendor no. in rows and invoice no. (to give the total count of invoices) and invoice amount
  3. Sort the invoice amount column in descending order, this will give us the top vendors in the vendor payment data.calculating largest subset in excel

Largest Growth Test

This test compares the growth in subsets from one period to another. This test identifies cases where a subset had  abnormal growth over some period of time. This growth spurt could be due to fraud, error, or simply a change in circumstances. In forensics work, we are not only interested in growth. A large decrease in amount might also signal fraud. For example, a hotel might report substantially lower sales for tax purposes or a franchisee might report substantially lower sales to the franchise holder.

Running largest growth test in Excel

To run this test you need data for two different period. For example I’m comparing vendor payments for Jul-Dec to Jan-Jun. comparison period can be quarters, years, etc. based on the requirement.

  1. To bifurcate the payments in the first six months and next six months, extract month using text formula from invoice date using “Month” formula
  2. Now bifurcate the payments in first half and second half of the year using “IF” formula. In cell G2 use the following formula; =IF(E2>6,”H2″,”H1″), this would give us value H1 for first six months and H2 for next six months
  3. Now create a pivot table in the new worksheet, take vendor no. in rows, period in columns and invoice amount in valuescalculating largest growth test in excel
  4. Value paste the pivot table created and calculate the growth percentage. In cell D4 use the following formula; =IF(B4>0,(C4-B4)/B4*100,0), this will give the percent change in vendor payments in second half of the year as compared to the first half.
  5. Sort the percentage change values in largest to smallest, this will give us the vendors with highest growth ratecalculating largest growth test in excel
  6. Repeat the step above to get growth rate in absolute terms

Findings from the largest subsets test have included fraudulent payments, fraudulent customer refunds, etc. The findings have also included processing inefficiencies.

The largest growth test identifies subsets that have had very high growth over two or more time periods. The growth can be measured as a percentage or in absolute terms.

Identifying relevant subsets requires some creativity. The goal in forensic analytics is to think of subsets such that abnormal duplications for a subset could signal errors, fraud, or processing inefficiencies.

You can download the Excel file I used for the example here: largest subset & growth test.


One thought on “Largest Subset & Growth Tests

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s