## Z-score

As we saw in the article about the normal distribution, the number of standard deviations from the mean is an important aspect of the normal distribution. How can we express this? We call the number of standard deviations $z$. Writing the calculation of any interval with symbols gives the following equation if you are dealing with the whole population:

$x = \mu \pm z \cdot \sigma$

where $x$ is a limit of the interval, $\mu$ is the mean, $z$ is the number of standard deviations from the mean and $\sigma$ is the standard deviation.

You get the following equation if you are dealing with a sample:

$x = \overline{x} \pm z \cdot s$

where $x$ is a limit of the interval, $\overline{x}$ is the mean, $z$ is the number of standard deviations from the mean and $s$ is the standard deviation.

The only difference between the equations is the symbols used for the mean and the standard deviation.

An interesting property of this equation is that the mean and standard deviation are constants: there is only one mean and one standard deviation for a population. The z-value, however, varies along the x-axis of the normal distribution, because it corresponds to an actual value within the distribution. A z-value can therefore be calculated for every single value within the population: it is that value's distance from the mean, expressed in number of standard deviations. Rearranging the equation to solve for $z$, and letting the sign of $z$ show the side of the mean, we get:

$z = \frac{x - \overline{x}}{s}$

where $z$ is the number of standard deviations from the mean, $x$ is a value within the distribution (an observation), $\overline{x}$ is the mean, and $s$ is the standard deviation.

This is also called the z-score. It is positive or negative depending on which side of the mean the observed value lies. Why is this useful? Because the distribution is normal, the z-score tells you how probable it is to observe a value that far from the mean, and therefore how plausible it is that the observation belongs to the population.
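As a minimal sketch of the calculation in R, assuming a small made-up vector of observations (the numbers in `values` are hypothetical and serve only as illustration), the z-score of every observation can be computed in one vectorised step. The built-in `scale()` function performs the same centring and scaling, so it is used here as a cross-check.

```r
# Hypothetical observations (illustration only, not real data)
values <- c(47.5, 49.0, 50.0, 51.0, 52.5)

x_bar <- mean(values)  # sample mean
s     <- sd(values)    # sample standard deviation (n - 1 denominator)

# z-score of every observation: distance to the mean in standard deviations
z_scores <- (values - x_bar) / s

# scale() subtracts the mean and divides by sd(); as.vector() drops the matrix shape
z_scaled <- as.vector(scale(values))

print(round(z_scores, 2))
print(isTRUE(all.equal(z_scores, z_scaled)))
```

Observations below the mean get negative z-scores and observations above it get positive ones. By construction, the z-scores themselves have mean 0 and standard deviation 1.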

### Example

You have a friend who has tracked his performance on the running track for several years. He has lots of data on the time it takes him to run 10 km. These times differ between occasions and are normally distributed around a mean of 50 minutes (s = 2.6). You have also trained for a long time, but have not been as serious as your friend about keeping track of your times. Finally, you want to see if you can keep up. As a true statistician, you decide to do this with a fun experiment. You go out for a run and record the time it takes you to cover 10 km. You cover the distance in 56 minutes. This is obviously slower than your friend's mean, but is it a time he could just as well have run himself, judged against the whole population of his 10 km times? The next step is to calculate the z-score of your time. How far from your friend's mean is it really? Using the equation for the z-score you get:

$z = \frac{56 - 50}{2.6} \approx 2.3$

From this you can say that your time lies 2.3 standard deviations away from your friend's mean. The probability that your friend runs a time this far from his mean is less than 5 %, because only 5 % of a normal distribution lies more than 1.96 standard deviations away from the mean. But what can you conclude from this? You can say that it is not very likely that your friend is this slow. Perhaps he would get this time on a bad day; perhaps he had a cold on the occasion(s) he recorded it. Did you? Were you at your best? These questions point to the fact that you want to investigate this further by recording more times. But this would mean that you are dealing with lots of times from two populations. How do you compare these? Have a look at the z-test whenever you feel ready for it. But I suggest you have a look at the standard error and confidence interval sections first.
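The "less than 5 %" statement can be checked numerically. The sketch below assumes, as the example does, that your friend's times are normally distributed with mean 50 and standard deviation 2.6. It uses R's `pnorm()` (the normal distribution function) and `qnorm()` (its quantile function); the variable names are chosen here for illustration only.

```r
friend_mean <- 50   # friend's mean time in minutes (from the example)
friend_sd   <- 2.6  # friend's standard deviation (from the example)
my_time     <- 56   # your observed time in minutes

z <- (my_time - friend_mean) / friend_sd

# Two-sided probability of a time at least this many standard deviations from the mean
p_two_sided <- 2 * pnorm(-abs(z))

# Interval expected to hold about 95 % of the friend's times: mean +/- 1.96 * sd
z_crit   <- qnorm(0.975)
interval <- friend_mean + c(-1, 1) * z_crit * friend_sd

print(z)
print(p_two_sided)
print(interval)
```

Under these assumptions the two-sided probability comes out at roughly 0.02, and 56 minutes falls above the upper limit of the interval, which is another way of seeing that the z-score exceeds 1.96.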

### How to do it in R

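```r
# Function for calculating the z-score:
# x is the observation, mean the mean and sd the standard deviation
z <- function(x, mean, sd) (x - mean) / sd

# Calculating the z-score for the example (56 minutes, mean 50, s = 2.6)
z(56, 50, 2.6)
# [1] 2.307692
```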