Tukey test

The Tukey test is a way of comparing group means after an ANOVA has shown that at least two of the means differ significantly.

You need to check the following assumptions before proceeding with this kind of Tukey test:

  1. Sample sizes (n) are equal

The goal of the Tukey test is to:

Compare the group means by testing a set of null hypotheses, each stating that the difference between two means is zero. This is done by calculating the T-statistic, which is used here as a critical value against which all mean differences are compared.

The T-statistic is calculated as:

$$T = q_{a,v} \sqrt{\frac{MS_W}{n}}$$

where q is a value found in a table of the Tukey (studentized range) distribution for a groups and v = a(n-1) degrees of freedom, MS_W is the within-group variance (Mean Square Within), and n is the sample size within each group.
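Instead of looking q up in a table, the critical value can be computed directly. The following is a minimal sketch using R's `qtukey()`; the value of `ms_within` is made up for illustration and is not taken from the example data, and the 5 % significance level is an assumption.

```r
# Illustrative inputs; ms_within is a hypothetical value, not from the example data
a <- 4             # number of groups
n <- 12            # observations per group
v <- a * (n - 1)   # within-group degrees of freedom
ms_within <- 2.5   # hypothetical Mean Square Within
alpha <- 0.05      # assumed significance level

# qtukey() returns quantiles of the studentized range distribution
q <- qtukey(1 - alpha, nmeans = a, df = v)

# Critical difference (called the T-statistic in the text)
T_crit <- q * sqrt(ms_within / n)
T_crit
```

The result is stored as `T_crit` rather than `T`, because `T` is a shorthand for `TRUE` in R.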

Next, compare every pairwise mean difference to the T-statistic.

Mean differences with four groups:

[Figure: the six pairwise mean differences between four groups, d_1 to d_6]

If any difference is larger than the T-statistic, the two means involved differ significantly, and the associated null hypothesis should be rejected.
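The comparison step can be sketched in R as follows. The group means and the critical value `T_crit` are hypothetical numbers chosen for illustration, and the order of the pairs produced by `combn()` is not necessarily the order used for d_1 to d_6 in the text.

```r
# Hypothetical group means and critical value, for illustration only
means <- c(Area1 = 10.2, Area2 = 10.9, Area3 = 15.4, Area4 = 10.5)
T_crit <- 1.7

# All pairs of groups (6 pairs for four groups)
pairs <- combn(names(means), 2)

# Absolute difference between the two means in each pair
d <- abs(means[pairs[1, ]] - means[pairs[2, ]])
names(d) <- paste(pairs[1, ], pairs[2, ], sep = "-")

# Pairs whose difference exceeds the critical value
d[d > T_crit]
```

With these made-up numbers, only the three pairs involving Area3 exceed the critical value.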

The Tukey test takes into account that several comparisons are made, which would otherwise increase the risk of a type 1 error. With the Tukey test, the risk of wrongly rejecting at least one null hypothesis across the whole set of comparisons is kept at 5 in 100.
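To see why uncorrected comparisons are a problem, the following rough sketch computes the chance of at least one false rejection when six comparisons are each tested at the 5 % level. It assumes the comparisons are independent, which is a simplification, so the number is only an approximation.

```r
alpha <- 0.05
k <- choose(4, 2)   # 6 pairwise comparisons among four groups

# Approximate risk of at least one type 1 error without any correction
fwer <- 1 - (1 - alpha)^k
fwer                # about 0.26
```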

Example

We use the same example as for the ANOVA. A company wants to find out if there is a difference in total sales between four geographical areas. There are 12 shops in each area, giving 12 values of total sales per year (million dollars) for each area (Area 1-Area 4).

The ANOVA found a significant difference between the means (F(3,44) = 87.42, P < 0.05). Now we want to find out which means differ. We therefore carry out a Tukey test.

1. Calculate the difference between the means:

[Figure: Tukey example]

2. Compute the T-statistic:

[Figure: Tukey example2]

3. Compare the differences with the T-statistic:

The only differences that exceed the T-statistic are d_2, d_4 and d_5. These are the comparisons that involve Area 3, whose mean is larger than all the others.

4. Interpret the result:

At the 5 % significance level, we conclude that the mean of Area 3 is larger than the means of the other areas.

How to do it in R

#Import the data

 data2<-read.csv("http://www.ilovestats.org/wp-content/uploads/2015/07/Example_data.csv",dec=",",sep=";")

#Tukey test (fit the ANOVA once and reuse the result)

 fit<-aov(Sales~Area,data=data2)
 tukey<-TukeyHSD(fit)
 tukey
 plot(tukey)
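The output of `TukeyHSD()` can also be filtered to show only the significant pairs, and checked against the hand calculation described earlier. The sketch below uses simulated stand-in data (not the CSV file above), so the numbers it produces are not the results of the sales example; the column names `Area` and `Sales` are assumed to match the file.

```r
# Simulated stand-in data (not the article's CSV): 4 areas, 12 shops each
set.seed(1)
sim <- data.frame(
  Area  = factor(rep(paste("Area", 1:4), each = 12)),
  Sales = rnorm(48, mean = rep(c(10, 10, 15, 10), each = 12), sd = 1)
)

fit <- aov(Sales ~ Area, data = sim)
res <- TukeyHSD(fit)$Area   # matrix with columns diff, lwr, upr, p adj

# Keep only the pairs with an adjusted p-value below 0.05
res[res[, "p adj"] < 0.05, , drop = FALSE]

# Hand calculation: Mean Square Within and the critical difference
ms_within <- deviance(fit) / df.residual(fit)
T_crit <- qtukey(0.95, nmeans = 4, df = df.residual(fit)) * sqrt(ms_within / 12)

# With equal group sizes, T_crit equals the half-width of each interval
all.equal(unname(res[, "upr"] - res[, "diff"]), rep(T_crit, 6))
```

A pair is significant exactly when its confidence interval excludes zero, which is the same as its absolute difference exceeding `T_crit`.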