## Tukey test

The Tukey test is a way of comparing group means after an ANOVA has shown that there is a significant difference between at least two of the means.

You need to check the following assumptions before proceeding with this kind of Tukey test:

1. Sample sizes (n) are equal

The goal of the Tukey test is to compare the group means by testing a set of null hypotheses, each stating that the difference between two means is zero. This is accomplished by calculating the T-statistic, which is used here as a critical value against which all mean differences are compared.

The T-statistic is calculated as:

$$T = q_{\alpha(a, v)} \sqrt{\frac{MS_W}{n}}$$

where $q$ is a value found in a table of the Tukey (studentized range) distribution for $a$ groups with $v$ degrees of freedom (those of $MS_W$, i.e. $a(n-1)$), $MS_W$ is the within-group variance or Mean Square Within, and $n$ is the sample size within each group.
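As a rough sketch, the critical difference can also be computed in R instead of being read from a printed table. The snippet assumes the usual form $T = q \sqrt{MS_W / n}$ and takes the degrees of freedom as those of $MS_W$, $a(n-1)$, which matches the 44 in the $F_{3,44}$ of the example. The function name `tukey_critical_difference` and the value `ms_within = 2.5` are hypothetical and only serve as illustration; `qtukey()` is the studentized range quantile function from base R.

```r
# Sketch: critical difference for a balanced design (equal n per group).
# Assumes T = q * sqrt(MS_W / n), with q taken from the studentized range
# distribution via stats::qtukey() rather than from a printed table.
tukey_critical_difference <- function(ms_within, n, a, alpha = 0.05) {
  v <- a * (n - 1)                             # degrees of freedom of MS_W
  q <- qtukey(1 - alpha, nmeans = a, df = v)   # tabulated q for a groups, v df
  q * sqrt(ms_within / n)
}

# Hypothetical input: MS_W = 2.5 is made up for illustration only.
# The result is stored as T_crit because T is an alias for TRUE in R.
T_crit <- tukey_critical_difference(ms_within = 2.5, n = 12, a = 4)
print(T_crit)
```

With `a = 4` and `n = 12`, the lookup uses the same group count and sample size as the sales example; only the mean square is invented.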

Next, compare the T-statistic to all possible mean differences.

With four groups there are six pairwise mean differences, $d_1$ to $d_6$, one for each pair of groups.

If a difference is larger than the T-statistic, the two means involved differ significantly, and the associated null hypothesis should be rejected.
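The comparison step can be sketched in R as follows. The group means and the critical value `T_crit` below are hypothetical numbers chosen only to illustrate the mechanics; they are not the values from the sales example. The sketch compares absolute differences, so the order of the two groups in a pair does not matter.

```r
# Hypothetical group means and critical difference, made up for illustration.
group_means <- c(Area1 = 10.2, Area2 = 10.9, Area3 = 15.4, Area4 = 10.5)
T_crit <- 1.7

# All pairs of groups: choose(4, 2) = 6 columns, one per comparison.
pairs <- combn(names(group_means), 2)

# Absolute difference between the two means of each pair.
comparisons <- data.frame(
  group_a  = pairs[1, ],
  group_b  = pairs[2, ],
  abs_diff = unname(abs(group_means[pairs[1, ]] - group_means[pairs[2, ]]))
)

# A pair differs significantly when its difference exceeds the critical value.
comparisons$significant <- comparisons$abs_diff > T_crit
print(comparisons)
```

With these made-up means, only the three pairs that involve `Area3` are flagged.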

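The Tukey test takes into account that several comparisons are made, which would otherwise increase the risk of a type I error. The risk of wrongly rejecting at least one true null hypothesis across the whole set of comparisons is thereby kept at the chosen significance level, for example 5 times out of 100 when $\alpha = 0.05$.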
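To see why an uncorrected series of comparisons is a problem, the short calculation below estimates the chance of at least one false rejection when six comparisons are each tested at $\alpha = 0.05$. It rests on the simplifying assumption that the comparisons are independent, which is not strictly true for pairwise comparisons of the same groups, so the number is only a rough indication.

```r
alpha <- 0.05
n_comparisons <- choose(4, 2)   # 6 pairwise comparisons among four groups

# Simplifying assumption: the comparisons are treated as independent.
familywise_error <- 1 - (1 - alpha)^n_comparisons
print(round(familywise_error, 3))   # 0.265
```

Under that assumption, the risk of at least one type I error is roughly 26 % rather than 5 %.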

### Example

We use the same example as for the ANOVA. A company wants to find out if there is a difference in total sales between four geographical areas. There are 12 shops in each area, giving 12 values of total sales per year (in million dollars) for each area (Area 1 to Area 4).

The ANOVA found a significant difference between the means ($F_{3,44} = 87.42$, $P < 0.05$). Now we want to find out which means differ. We therefore carry out a Tukey test.

1. Calculate the difference between the means:

2. Compute the T-statistic:

3. Compare the differences with the T-statistic:

The only differences that exceed the T-statistic are $d_2$, $d_4$ and $d_5$. These are the comparisons that involve Area 3. The mean of Area 3 is larger than all the others.

4. Interpret the result:

At the 5 % significance level, we conclude that the mean of Area 3 is larger than the means of the other areas.

### How to do it in R

```r
# Import the data