Mean, median and mode

The mean, median and mode are measures of location in statistics. They are used to describe a population or sample and are therefore also important terms in statistical tests. In normal distributions the mean, median and mode are equal.

Usually the mean or the median is used to describe a population or sample. Which one to report depends on the statistical test you decide to use. Parametric methods test for differences in the means of groups, whereas non-parametric methods are typically used to test for differences in medians. The choice of test depends on the distribution and variability of the units within the population or sample.

Mean

The mean is the average of all values in the population or sample. The symbol for the mean depends on whether you have values for the entire population (\mu) or for a sample (\overline{x}). The reason is that the mean of a sample is only an estimate of the true population value, so the two need to be distinguished for clarity.

The mean for the entire population is calculated as:

\mu = \frac{\sum x}{N}

where \mu is the true mean of the population, x is the value of a unit in the population and N is the number of units in the population.

The mean of a sample is estimated as:

\overline{x} = \frac{\sum x}{n}

where \overline{x} is the sample mean, x is the observed value of a unit and n is the number of units in the sample.

Example

Calculate the mean of the following data set: 7, 8, 4, 0, 4, 5, 12, 3, 3

(1) Calculate the sum of all values = \sum x = 46
(2) Divide the sum by the total number of values (n = 9)

\frac{46}{9} \approx 5.11

How to do it in R

# Store the values in a vector and compute the arithmetic mean
data <- c(7, 8, 4, 0, 4, 5, 12, 3, 3)
mean(data)

Median

The median is the middle value of the sample or population when the values are ordered from lowest to highest. Half of the data fall below the median and half fall above it. To calculate the median, (1) order the values from lowest to highest and (2) take the value in the middle. If the number of values is even, there is no single middle value and the median is the mean of the two middle values.

Example

Calculate the median of the following data set: 7, 8, 4, 0, 4, 5, 12, 3, 3

(1) Arrange the values in ascending order: 0, 3, 3, 4, 4, 5, 7, 8, 12
(2) Take the middle value, which is the fifth of the nine ordered values: 4

The median is 4

How to do it in R

# Store the values in a vector and compute the median
data <- c(7, 8, 4, 0, 4, 5, 12, 3, 3)
median(data)

Note that the median is lower than the mean for the same data set. That is because the median, unlike the mean, does not take the magnitude of the values into account, only their position within the ordered array of numbers. Here the large value 12 pulls the mean upwards but has no effect on the median. The median stays the same irrespective of how far the other values lie above or below the middle value, whereas the mean can change drastically if just one value is replaced by a much higher or lower one.
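This difference can be checked with a small sketch in R. It replaces the largest value of the example data set (12) with 120 and recomputes both measures. The replacement value 120 and the variable name data_outlier are assumptions chosen only for illustration.

```r
# Example data set used above
data <- c(7, 8, 4, 0, 4, 5, 12, 3, 3)

# Copy the data and replace the largest value (12) with an arbitrarily
# chosen, much larger value (120) to mimic one extreme observation
data_outlier <- data
data_outlier[which.max(data_outlier)] <- 120

mean(data)            # 46 / 9, about 5.11
mean(data_outlier)    # 154 / 9, about 17.11
median(data)          # 4
median(data_outlier)  # still 4
```

The mean roughly triples because of a single changed value, while the median is unchanged because the middle position of the ordered values is still occupied by a 4.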

Mode

The mode is the most frequent value in a sample or population. If two or more values are tied for the highest frequency, there is more than one mode: the data are bimodal (two modes) or multimodal (more than two modes).

Example

Calculate the mode of the following data set: 7, 8, 4, 0, 4, 5, 12, 3, 3

(1) Find the most common value(s): 3 and 4, which each occur twice

The data set has two modes and is therefore bimodal.
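How to do it in R

Base R has no built-in function for the statistical mode (the function mode() returns the storage type of an object, not its most frequent value). One possible sketch counts the values with table() and keeps those with the highest count. The helper name get_modes is hypothetical and not part of R, and the data set is the one used in the mean and median examples.

```r
data <- c(7, 8, 4, 0, 4, 5, 12, 3, 3)

# Hypothetical helper: returns every value that occurs most often
get_modes <- function(x) {
  counts <- table(x)                      # frequency of each distinct value
  top <- names(counts)[counts == max(counts)]  # values tied for the highest count
  as.numeric(top)                         # table() stores the values as names
}

get_modes(data)  # 3 and 4
```

Because the helper returns all values tied for the highest count, it also covers bimodal and multimodal data. If every value occurs exactly once, it returns all of them, so the result should be read together with the counts from table().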