## Identifying Frauds Using Correlation Coefficient

Correlation is a relationship between two things (variables) that tend to vary or move together. There is usually some logical connection between the two things/variables.

How much the two variables are connected can be determined by calculating the correlation coefficient, which is denoted as ‘r’.

The correlation coefficient has a value between 1 to -1, where:

• 1 is perfect positive correlation
• 0 is no correlation
• -1 is perfect negative correlation, defined as a relationship between two variables in which one variable increases as the other decreases, and vice versa.

In general, the correlation coefficient, whether negative or positive, can be interpreted as:

• 0.0 to 0.2 – No correlation
• 0.2 to 0.4 – Weak correlation
• 0.4 to 0.6 – Moderate correlation
• 0.6 to 0.8 – Strong correlation
• 0.8 to 1.0 – Very strong correlation
##### How correlation can be used to detect frauds?

Since correlation coefficient measures the relationship between two variables, that means the two variables follow a similar pattern over a period of time.

Let’s consider the example of a company with many franchisee stores. A franchiser makes money by receiving the royalty payments from the franchisee, which is a fixed percentage of sales made by franchisee.

On a monthly basis the franchisee store shares the sales data of the month and the royalty payable to franchiser. The sales reports are processed by accounts receivable department and follow up is done for missing values and obvious errors.  Reporting errors by franchisees could be either intentional or unintentional. These errors are usually in the nature of understated sales and result in a revenue loss for the franchiser. Carrying out a full revenue audit on location for every franchisee is possible but a costly and a time-consuming affair.

So, how do we identify which franchisee units we should audit or which units are reporting a lower sales number.

Normally the sales of the franchisee store would move in tandem with the sales of the stores owned by the franchiser. Meaning over a period of time both the units would have a similar pattern. So, we can use the correlation coefficient to determine the how strongly sales of both stores are related and can determine the stores for full revenue audit.

There could be cases where the franchisee stores sales might not follow the similar pattern that of franchiser stores. The reason for such an anomaly could be other than fraud. For example, a restaurant located near universities would report a lower sale during the term breaks and vacations. Similarly, a restaurant located near a place where events like sports or musical concerts takes place would report a higher sale.

The use of coefficient correlation is not just limited to franchisee sales, it can used in all scenarios where a valid benchmark is available against which we can measure the numbers of our variable.

##### Calculating Correlation Coefficient in Excel

Calculating correlation coefficient is Excel is extremely simple. To calculate you need to use the following formula;

=CORREL(array 1,array 2)

In the above image the stores F1, F2 and F3 are having a very strong correlation when compared with average sales of company owned store. Whereas stores F5 and F6 have a moderate correlation and store F4 no correlation. Do remember the anomaly in pattern could be due to reasons other than fraud.

Correlation provides an informative signal that the data for a speciﬁc variable differs from the average pattern or some other benchmark. The use of correlation combined with other techniques, could be useful indicators of errors or fraud.

It is important to note that correlation doesn’t imply causation. Causation means that one event is the result of the occurrence of the other event, i.e. there is a casual relationship between the two events. This is also called as cause and effect.

Theoretically, the difference between the two types of relationships are easy to identify – an action or occurrence can cause another (e.g. alcoholism can increase risk of liver problems), or it can correlate with another (e.g. alcoholism is correlated with smoking, but it does not cause alcoholism). In practice, however, it remains difficult to clearly establish cause and effect, compared with establishing correlation.