## Tukey test

The Tukey test is a way of comparing group means after an ANOVA has shown that there is a significant difference among the means.

You need to check the following assumptions before proceeding with this kind of Tukey test:

1. Sample sizes (n) are equal

The goal of the Tukey test is to:

Compare the group means by testing a set of null hypotheses that the difference between two means is zero. This is accomplished by calculating the T-statistic, which is used here as a critical value against which all mean differences are compared:

The T-statistic is calculated as:

$$T = q_{\alpha(a,v)} \sqrt{\frac{MS_W}{n}}$$

where $q_{\alpha(a,v)}$ is a value found in a table of the Tukey distribution (the studentized range distribution) for $a$ groups with $v$ degrees of freedom ($v = a(n-1)$, the degrees of freedom of $MS_W$), $MS_W$ is the within-group variance, or Mean Square Within, and $n$ is the sample size within each group.
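As a minimal sketch of this calculation in R, the critical value can be obtained from the base function `qtukey()` instead of a printed table. The sketch assumes that $v$ is the within-group degrees of freedom of the ANOVA, $a(n-1)$. The function name `tukey_critical` and the value used for $MS_W$ are hypothetical and chosen only for illustration.

```r
# Critical value T for Tukey's test with equal group sizes.
# a     : number of groups
# n     : sample size within each group
# ms_w  : Mean Square Within taken from the ANOVA table
# alpha : significance level
tukey_critical <- function(a, n, ms_w, alpha = 0.05) {
  v <- a * (n - 1)                            # within-group degrees of freedom
  q <- qtukey(1 - alpha, nmeans = a, df = v)  # studentized range quantile
  q * sqrt(ms_w / n)
}

# Hypothetical input: 4 groups, 12 observations each, MS_W = 2.5
tukey_critical(a = 4, n = 12, ms_w = 2.5)
```

A smaller `alpha` gives a larger critical value, so fewer mean differences are declared significant.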

Next, compare the T-statistic to all possible pairwise mean differences.

With four groups there are six pairwise mean differences, $d_1$ to $d_6$. If the absolute value of any difference is larger than the T-statistic, there is a significant difference between these two means, and the associated null hypothesis should be rejected.
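The comparison can be sketched in R as follows. The group labels, the group means, and the critical value `t_crit` are hypothetical numbers used only to show the mechanics. They are not taken from the example in this article.

```r
# Hypothetical group means and critical value, for illustration only
means  <- c(A = 10.2, B = 11.0, C = 15.4, D = 10.7)
t_crit <- 1.8

# All pairs of groups (6 pairs for 4 groups)
pairs <- combn(names(means), 2)

# Absolute difference between the two means in each pair
d <- abs(means[pairs[1, ]] - means[pairs[2, ]])
names(d) <- paste(pairs[1, ], pairs[2, ], sep = "-")

# TRUE where the null hypothesis of equal means is rejected
significant <- d > t_crit
data.frame(difference = d, significant = significant)
```

With these made-up numbers, only the three comparisons that involve group C exceed the critical value.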

The Tukey test takes into account that several comparisons are made, which would otherwise increase the risk of a type I error. It keeps the error rate for the whole set of comparisons at the chosen significance level, so at a level of 0.05 we would wrongly reject at least one true null hypothesis in about 5 cases out of 100.
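A short calculation in R illustrates why uncorrected comparisons inflate the risk. It assumes, as a simplification, that the comparisons are independent. Pairwise comparisons share group means, so the figure is only a rough indication.

```r
alpha <- 0.05        # significance level of a single comparison
a     <- 4           # number of groups
m     <- choose(a, 2)  # number of pairwise comparisons: 6

# Probability of at least one false rejection across m independent tests
fwer <- 1 - (1 - alpha)^m
round(fwer, 3)       # 0.265
```

Under this simplification, six uncorrected tests at the 0.05 level would give roughly a 26 % chance of at least one false rejection.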

### Example

We use the same example as for the ANOVA. A company wants to find out if there is a difference in total sales between four geographical areas. There are 12 shops in each area, giving 12 values of total sales per year (million dollars) for each area (Area 1 to Area 4).

The ANOVA found a significant difference between the means ($F_{3,44} = 87.42$, $P < 0.05$). Now we want to find out which means differ. We therefore carry out a Tukey test.

1. Calculate the differences between the means.
2. Compute the T-statistic.
3. Compare the differences with the T-statistic:

The only differences that exceed the T-statistic are $d_2$, $d_4$ and $d_5$. These are the comparisons that involve Area 3. The mean of Area 3 is larger than all others.

4. Interpret the result:

We are 95 % certain that the mean of Area 3 is larger than the means of the other areas.

### How to do it in R

```r
# Import the data