## Tukey test

The Tukey test is a way of comparing group means pairwise after an ANOVA has shown that there is a significant difference among the means.

You need to check the following assumptions before proceeding with this kind of Tukey test:

- Sample sizes (n) are equal

The goal of the Tukey test is to compare the group means by testing a set of null hypotheses, each stating that the difference between two means is zero. This is accomplished by calculating the T-statistic, which is used here as a critical value against which all mean differences are compared.

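The T-statistic is calculated as:

$$
T = q_{\alpha}(k, N-k)\,\sqrt{\frac{MS_W}{n}}
$$

where $q_{\alpha}(k, N-k)$ is a value found in a table of the Tukey distribution (the studentized range distribution) for $k$ groups with $N-k$ degrees of freedom ($N$ is the total number of observations), $MS_W$ is the within-group variance, or Mean Square Within, and $n$ is the sample size within each group.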
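The calculation above can be sketched in R, where `qtukey()` returns the studentized range quantile that would otherwise be looked up in a table. This is only an illustration: the design (four groups of 12) is assumed, and the mean square within (`ms_within`) is a made-up value, not taken from any data set.

```r
# Sketch: T-statistic (critical difference) for a balanced design.
k <- 4            # number of groups (assumed)
n <- 12           # observations per group (assumed)
N <- k * n        # total number of observations
ms_within <- 2.5  # hypothetical mean square within, NOT from real data

# Upper 5 % point of the studentized range distribution
q_crit <- qtukey(0.95, nmeans = k, df = N - k)

# Critical difference against which the mean differences are compared
t_stat <- q_crit * sqrt(ms_within / n)
t_stat
```

In practice, the mean square within would be read from the ANOVA table rather than typed in by hand.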

Next, compare the T-statistic to all possible mean differences.

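Mean differences with four groups:

$$
|\bar{x}_1 - \bar{x}_2|,\; |\bar{x}_1 - \bar{x}_3|,\; |\bar{x}_1 - \bar{x}_4|,\; |\bar{x}_2 - \bar{x}_3|,\; |\bar{x}_2 - \bar{x}_4|,\; |\bar{x}_3 - \bar{x}_4|
$$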

If a difference is larger in absolute value than the T-statistic, the two means differ significantly, and the associated null hypothesis should be rejected.
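The comparison step might look like the following in R. Both the group means and the critical value `t_stat` are hypothetical numbers chosen only to show the mechanics; `combn()` lists every pair of groups.

```r
# Sketch: compare all pairwise mean differences with a critical value.
# Hypothetical group means and critical difference, NOT real results.
group_means <- c(Area1 = 10.2, Area2 = 10.9, Area3 = 15.4, Area4 = 10.5)
t_stat <- 1.7

# 2 x 6 matrix: each column is one pair of group names
pairs <- combn(names(group_means), 2)

# Absolute difference between the two means of each pair
abs_diff <- abs(group_means[pairs[1, ]] - group_means[pairs[2, ]])
names(abs_diff) <- paste(pairs[1, ], pairs[2, ], sep = "-")

# A pair is flagged when its difference exceeds the critical value
data.frame(abs_diff, significant = abs_diff > t_stat)
```

With four groups this gives six comparisons, one per column of `pairs`.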

The Tukey test takes into account that several comparisons are made, which would otherwise increase the risk of a type I error. With the significance level held at 0.05 for the whole set of comparisons, we would wrongly reject at least one true null hypothesis in at most 5 out of 100 such analyses.
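To get a feeling for how quickly the risk grows without a correction, consider the six comparisons among four groups and suppose, purely for illustration, that they were independent tests each carried out at $\alpha = 0.05$ (the real comparisons share data, so this is only an approximation):

$$
P(\text{at least one type I error}) = 1 - (1 - 0.05)^{6} \approx 0.26
$$

Under that simplifying assumption, roughly one analysis in four would contain at least one false rejection.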

**Example**

We use the same example as for the ANOVA. A company wants to find out whether total sales differ between four geographical areas. There are 12 shops in each area, giving 12 values of total sales per year (in million dollars) for each area (Area 1 to Area 4).

The ANOVA found a significant difference between the means ($F_{3,44} = 87.42$, $P < 0.05$). Now we want to find out which means differ. We therefore carry out a Tukey test.

1. Calculate the difference between the means:

2. Compute the T-statistic:

3. Compare the differences with the T-statistic:

The only differences that exceed the T-statistic are the three that involve Area 3, that is, the comparisons of Area 3 with Area 1, Area 2, and Area 4. The mean of Area 3 is larger than all the others.

4. Interpret the result:

At the 5 % significance level, the mean of Area 3 is significantly larger than the means of the other areas.

*How to do it in R*

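    # Import the data (semicolon-separated, decimal comma)
    data2 <- read.csv("http://www.ilovestats.org/wp-content/uploads/2015/07/Example_data.csv", dec = ",", sep = ";")
    # Make sure Area is treated as a grouping factor
    data2$Area <- factor(data2$Area)
    # Tukey test: print the pairwise comparisons and plot the confidence intervals
    tukey <- TukeyHSD(aov(Sales ~ Area, data = data2))
    tukey
    plot(tukey)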
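Reading the result programmatically can be sketched as follows. The data here are simulated with `rnorm()` and are not the example data set; the column names `Area` and `Sales` simply mirror the ones used above.

```r
# Sketch with simulated data (NOT the example data set).
set.seed(1)
sim <- data.frame(
  Area  = factor(rep(paste0("Area", 1:4), each = 12)),
  Sales = rnorm(48, mean = rep(c(10, 10, 15, 10), each = 12), sd = 1)
)

tukey <- TukeyHSD(aov(Sales ~ Area, data = sim))

# The result is a list with one matrix per factor; its columns are
# diff, lwr, upr and "p adj" (the adjusted p-value).
res <- tukey$Area

# Keep only the pairs whose adjusted p-value is below 0.05
res[res[, "p adj"] < 0.05, , drop = FALSE]
```

A pair is also significant at the 5 % level when its interval (`lwr` to `upr`) does not include zero, which is what the plot shows graphically.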