Mean, median and mode

The mean, median and mode are measures of location in statistics. They are used to describe a population or sample and are therefore also important terms in statistical tests. In normal distributions the mean, median and mode are equal.

Usually the mean or the median is used to describe a population or sample. Which one to report depends on the statistical test you decide to use. Parametric methods typically test for differences in the means of groups, whereas non-parametric methods typically compare medians or ranks. The choice of test depends on the distribution and variability of the units within the population or sample.

Mean

The mean is the average of all values in the population or sample. The symbol for the mean depends on whether you have values for the entire population (\mu) or only for a sample (\overline{x}). The mean of a sample is an estimate of the true population mean, so the two are given different symbols to keep them apart.

The mean for the entire population is calculated as:

\mu = \frac{\sum x}{N}

where \mu is the true mean of the population, x is the value of a unit in the population and N is the number of units in the population.

The mean of a sample is estimated as:

\overline{x} = \frac{\sum x}{n}

where \overline{x} is the sample mean, x is the observed value of a unit and n is the number of units in the sample.

Example

Calculate the mean of the following data set: 7, 8, 4, 0, 4, 5, 12, 3, 3

(1) Calculate the sum of all values: \sum x = 46
(2) Divide the sum by the total number of values (n = 9)

\overline{x} = \frac{46}{9} \approx 5.11

How to do it in R

# Store the values in a numeric vector
data <- c(7, 8, 4, 0, 4, 5, 12, 3, 3)
# Arithmetic mean: sum(data) / length(data)
mean(data)

Median

The median is the middle value of the sample or population when the values are ordered from lowest to highest. Half of the data falls below the median and half falls above it. To calculate the median, (1) order the values from lowest to highest and (2) take the value in the middle. If the number of values is even, there is no single middle value, and the median is the mean of the two middle values.

Example

Calculate the median of the following data set: 7, 8, 4, 0, 4, 5, 12, 3, 3

(1) Arrange the values in ascending order: 0, 3, 3, 4, 4, 5, 7, 8, 12
(2) Take the middle value, which is the 5th of the 9 ordered values: 4

The median is 4

How to do it in R

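# Store the values in a numeric vector (no need to sort them first)
data <- c(7, 8, 4, 0, 4, 5, 12, 3, 3)
# median() orders the values internally and returns the middle one
median(data)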

Note that the median (4) is lower than the mean (5.11) for the same data set. Unlike the mean, the median does not take the magnitude of the values into account, only their position in the ordered list. The median stays the same as long as the values below the middle value remain below it and the values above it remain above it, whereas the mean can change drastically if just one value is replaced by a much higher or lower one.
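
The difference in sensitivity can be checked directly in R. The following is a minimal sketch using the same data set; replacing the largest value (12) with 120 is a hypothetical change made up for illustration, and the variable names `values` and `values_outlier` are not part of the original example.

```r
# Same data set as in the mean and median examples
values <- c(7, 8, 4, 0, 4, 5, 12, 3, 3)

# Hypothetical change: replace the largest value (12) with 120
values_outlier <- replace(values, values == 12, 120)

# The mean shifts from 46 / 9 to 154 / 9
mean(values)
mean(values_outlier)

# The median is 4 in both cases, because the middle position is unaffected
median(values)
median(values_outlier)
```

Only one of the nine values was changed, yet the mean roughly triples while the median does not move.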

Mode

The mode is the most frequent value in a sample or population. If no single value is the most frequent, there is more than one mode: the data is bimodal if two values share the highest frequency and multimodal if more than two do.

Example

Calculate the mode of the following data set: 7, 8, 4, 0, 4, 5, 12, 3, 3

(1) Take the most common values: 3 and 4 (each occurs twice, all other values occur once)

The data has two modes and is therefore bimodal.
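
How to do it in R

Base R has no built-in function for the statistical mode; `mode()` returns the storage type of an object instead. One possible approach, sketched here under the assumption that the data is a numeric vector, is to count the values with `table()` and keep those with the highest count. The helper name `find_modes` is hypothetical, and the data set is the one used in the mean and median examples.

```r
# Hypothetical helper: returns every value that occurs with the highest frequency
find_modes <- function(x) {
  counts <- table(x)                           # frequency of each distinct value
  top <- names(counts)[counts == max(counts)]  # values that share the highest count
  as.numeric(top)                              # table() keeps the values as character names
}

values <- c(7, 8, 4, 0, 4, 5, 12, 3, 3)
find_modes(values)
```

Because the helper returns all values with the highest count, it also covers bimodal and multimodal data.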