Mann-Whitney U-test

The Mann-Whitney U-test is a non-parametric test used to compare the medians of two populations. You test the null-hypothesis that the two medians are equal (M1 = M2). Unlike parametric tests, the Mann-Whitney U-test does not use the original values to calculate the test statistic but rather the ranks of the values.

You need to check the following assumptions before proceeding with the U-test:

  1. The observations are independent

Before the test statistic can be calculated, the original values from both samples must be pooled and converted to ranks.
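
The ranking step can be sketched in R as follows. The counts are hypothetical and chosen only to illustrate the mechanics (they are not the data from the example further down), and the variable names are assumptions. Tied values are given the mean of the ranks they would otherwise occupy, which is why rank sums can end in .5.

```r
# Hypothetical counts, used only to illustrate ranking (not the example data)
forest <- c(2, 4, 4, 7)
meadow <- c(4, 8, 9, 11)

# Pool both samples and rank them together; tied values share the mean rank
pooled <- c(forest, meadow)
ranks  <- rank(pooled, ties.method = "average")

# Split the ranks back by sample and sum them
r_forest <- sum(ranks[seq_along(forest)])
r_meadow <- sum(ranks[-seq_along(forest)])

r_forest
r_meadow
```

The three observations equal to 4 would occupy ranks 2, 3 and 4, so each of them receives rank 3.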

The Mann-Whitney U-test relies on the test statistics U1 and U2, which are calculated as:

U_1 = n_1n_2 + \frac{n_2(n_2+1)}{2} - R_2

U_2 = n_1n_2 + \frac{n_1(n_1+1)}{2} - R_1

where n1 and n2 are the sample sizes of sample 1 and 2, respectively, and R1 and R2 are the rank sums of sample 1 and 2. The smaller of U1 and U2 is selected as the test statistic and compared with the critical value found in a table.

If your calculations are right, U1 + U2 = n1n2.

If the calculated test statistic is less than the critical value, the null-hypothesis is rejected.
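
The calculation can be sketched in R roughly as below. The helper `u_statistics` is hypothetical (it is not part of base R) and the input values are made up for illustration. Each U is computed from the rank sum of the other sample, so that U1 + U2 = n1n2 holds as a built-in check.

```r
# Hypothetical helper (not part of base R): U statistics for two samples
u_statistics <- function(x, y) {
  n1 <- length(x)
  n2 <- length(y)
  ranks <- rank(c(x, y))                 # pooled ranks, ties get the mean rank
  r1 <- sum(ranks[seq_len(n1)])          # rank sum of sample 1
  r2 <- sum(ranks[n1 + seq_len(n2)])     # rank sum of sample 2
  u1 <- n1 * n2 + n2 * (n2 + 1) / 2 - r2
  u2 <- n1 * n2 + n1 * (n1 + 1) / 2 - r1
  # Sanity check: the two statistics must add up to n1 * n2
  stopifnot(isTRUE(all.equal(u1 + u2, n1 * n2)))
  c(U1 = u1, U2 = u2, U = min(u1, u2))
}

# Made-up data, only to show the call
u <- u_statistics(c(2, 4, 4, 7), c(4, 8, 9, 11))
u
```

The element `U` is the smaller of the two statistics, which is the one to compare with the tabulated critical value.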

Example

You want to see if the density of a specific plant species differs between two habitats, forest and meadow.

1. Construct the null-hypothesis

H0: The median number of plants per m2 is the same within a forest and a meadow (M_{FOREST} = M_{MEADOW} )

2. Do the experiment

You go out and count the number of plants per m2 in a number of frames in each type of habitat.

3. Sort the samples from each habitat and calculate the median of each sample, in this case:

[Figure: sorted observations and sample medians for forest and meadow]

4. Rank the observations:

[Figure: ranks assigned to the observations from forest and meadow]

5. Sum the ranks for each sample

R_{FOREST} = 57.5

R_{MEADOW} = 113.5

6. Calculate the test statistics U_{FOREST} and U_{MEADOW}

U_{FOREST} = 12.5

U_{MEADOW} = 68.5
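
The arithmetic behind these two values can be written out as follows. The sample sizes are an inference rather than something stated at this step: n_{FOREST} = n_{MEADOW} = 9 is taken from the table lookup in step 8 and is consistent with the rank sums, since 57.5 + 113.5 = 171 = 18 x 19 / 2.

```latex
U_{FOREST} = n_{FOREST} n_{MEADOW} + \frac{n_{MEADOW}(n_{MEADOW}+1)}{2} - R_{MEADOW} = 81 + 45 - 113.5 = 12.5

U_{MEADOW} = n_{FOREST} n_{MEADOW} + \frac{n_{FOREST}(n_{FOREST}+1)}{2} - R_{FOREST} = 81 + 45 - 57.5 = 68.5
```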


7. Check that U_{FOREST} + U_{MEADOW} = n_{FOREST} x n_{MEADOW}: 12.5 + 68.5 = 9 x 9 = 81

8. Look up the critical value for U at α = 0.05

We check the U-table at n_{FOREST} = n_{MEADOW} = 9 where α = 0.05. There we find that U_{α=0.05} = 17
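
If no printed table is at hand, the two-sided critical value can probably be recovered in R from the distribution of the statistic, as in the sketch below. It assumes sample sizes of 9 and 9, and `qwilcox()` assumes no ties, so treat the result as a check against the table rather than a replacement for it.

```r
n_forest <- 9
n_meadow <- 9
alpha    <- 0.05

# qwilcox() returns the smallest u with P(U <= u) >= alpha / 2 under H0,
# so the tabulated two-sided critical value is one below that quantile.
u_crit <- qwilcox(alpha / 2, n_forest, n_meadow) - 1
u_crit

# The lower-tail probability at the critical value stays within alpha / 2
pwilcox(u_crit, n_forest, n_meadow)
```

For these sample sizes `u_crit` should agree with the table value of 17.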

9. Compare the calculated U statistic with U_{α=0.05}

We always choose the smaller U statistic from our calculations, which is U_{FOREST} = 12.5

U < U_{α=0.05}: 12.5 < 17

10. Reject or retain H0

H0 can be rejected; the median densities in forest and meadow differ (M_{FOREST} ≠ M_{MEADOW})

11. Interpret the result

At the 5 % significance level (α = 0.05), we conclude that the density of plants is higher in the meadow than in the forest.

How to do it in R

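#1. Import the data (semicolon-separated file with decimal commas)

	data <- read.csv("http://www.ilovestats.org/wp-content/uploads/2015/08/U-test1.csv", dec = ",", sep = ";")

#2. Run the test (R reports the U statistic of the first sample as W)

	wilcox.test(data$Forest, data$Meadow)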
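
To work with the result rather than just print it, the returned object can be stored and inspected. The sketch below uses small made-up vectors in place of the downloaded file, so its numbers are not those of the example above. Setting `exact = FALSE` asks for the normal approximation, which R falls back to anyway when the data contain ties.

```r
# Made-up counts standing in for data$Forest and data$Meadow
forest <- c(2, 4, 4, 7)
meadow <- c(4, 8, 9, 11)

# exact = FALSE: use the normal approximation (needed when there are ties)
result <- wilcox.test(forest, meadow, exact = FALSE)

# W is the U statistic of the first sample (here: forest)
w <- unname(result$statistic)

# The smaller of the two U statistics, as used with a printed table
u <- min(w, length(forest) * length(meadow) - w)

u
result$p.value
```

Comparing `result$p.value` with α replaces the table lookup from the manual procedure.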