Z-score

As we saw in the article about the normal distribution, the number of standard deviations from the mean is an important aspect of the normal distribution. How can we express this? We call the number of standard deviations z. If we write the calculation of an interval around the mean with letters, we get the following equation when dealing with the whole population:

$\mu \pm z\sigma$

where $\mu$ is the population mean, $z$ is the number of standard deviations from the mean and $\sigma$ is the population standard deviation.

You get the following equation if you are dealing with a sample:

$\overline{x} \pm zs$

where $\overline{x}$ is the sample mean, $z$ is the number of standard deviations from the mean and $s$ is the sample standard deviation.

The only difference between the equations is the symbols used for the mean and the standard deviation.
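As a small illustration, the interval can be evaluated in R. This is a sketch rather than part of the original method: the helper name `interval` is hypothetical, and the inputs (a mean of 50 and a standard deviation of 2.6, borrowed from the running example further down, with z = 1.96) are only for illustration. The arithmetic is the same whether the mean and standard deviation come from a population or a sample.

```r
# Hypothetical helper: the interval from mean - z*sd to mean + z*sd
interval <- function(mean, sd, z) {
  c(lower = mean - z * sd, upper = mean + z * sd)
}

# Interval covering +/- 1.96 standard deviations around a mean of 50 (sd = 2.6)
interval(50, 2.6, 1.96)
# returns lower = 44.904, upper = 55.096
```

Returning a named vector keeps the two bounds readable when printed.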

An interesting property of this equation is that the mean and standard deviation are constant; there is only one mean and one standard deviation for a population. The z-value, however, varies along the x-axis of the normal distribution, because each z corresponds to a particular value x within the distribution. So a z-value can be calculated for every single value within the population: it is that value's distance to the mean, expressed in standard deviations. Writing the value as $x = \overline{x} + zs$ and rearranging, we get:

$z = \frac{x - \overline{x}}{s}$

where $z$ is the number of standard deviations from the mean, $x$ is a value within the distribution (an observation), $\overline{x}$ is the mean, and $s$ is the standard deviation.
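The rearrangement takes only a couple of steps. Written with the sample symbols (the population version is identical with $\mu$ and $\sigma$), start from an observation $x$ that lies $z$ standard deviations from the mean and solve for $z$:

$$
\begin{aligned}
x &= \overline{x} + z s \\
x - \overline{x} &= z s \\
z &= \frac{x - \overline{x}}{s}
\end{aligned}
$$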

This is also called the z-score. It is positive or negative depending on which side of the mean the observation lies. Why is this useful? Because, through the normal distribution, the z-score tells you how likely it is to observe a value this far from the mean if the observation really belongs to the population.
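To see the sign behaviour, here is a short R sketch. It reuses the `z` function from the R section at the end of this article and compares three hypothetical running times (46, 50 and 56 minutes, chosen only for illustration) with a mean of 50 and a standard deviation of 2.6. Because R arithmetic is vectorised, one call returns a z-score for every observation.

```r
# z-score function, same definition as in the R section below
z <- function(x, mean, sd) (x - mean) / sd

# Hypothetical observations: below, at, and above the mean
times <- c(46, 50, 56)
scores <- z(times, 50, 2.6)

# Negative below the mean, zero at the mean, positive above it
round(scores, 2)
# returns -1.54, 0.00, 2.31
```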

Example

You have a friend who has tracked his results on the running track for several years. He has lots of data on the time it takes him to run 10 km. These times differ between occasions and are normally distributed around a mean of 50 minutes (s = 2.6). You have also trained for a long time, but have not been as serious as your friend about keeping track of your times. Finally, you want to see whether you can keep up. As a true statistician, you decide to do this with a fun experiment: you go out for a run and record the time it takes you to cover 10 km. You cover the distance in 56 minutes. This is obviously slower than your friend's mean, but is it a time he could just as well have run, given the whole population of his 10 km times? The next step is to calculate the z-score of your time. How far from your friend's mean is it really? Using the equation for the z-score you get:

$z = \frac{56-50}{2.6} \approx 2.3$

From this you can say that your time lies 2.3 standard deviations from your friend's mean. About 95 % of a normal distribution lies within 1.96 standard deviations of the mean, so because your z-score is larger than 1.96, the probability that your friend runs a time at least this far from his mean (in either direction) is less than 5 %. But what can you conclude from this? You can say that it is not very likely that your friend is this slow. Perhaps he got this time on a bad day; perhaps he had a cold on the occasion(s) he recorded it. Did you? Were you at your best?

These questions rather point to the fact that you want to investigate this further by recording more times. But then you would be dealing with many times from two populations. How do you compare these? Have a look at the z-test whenever you feel ready for it, but I suggest you have a look at the standard error and confidence interval sections first.
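The 1.96 cut-off and the probability for the example can be checked with R's standard normal functions `qnorm` and `pnorm`. This is a sketch that treats your friend's times as exactly normal with mean 50 and standard deviation 2.6, as the example assumes; the variable names are only illustrative.

```r
# Critical value: 95 % of a standard normal distribution lies within +/- this many SDs
z_crit <- qnorm(0.975)

# z-score of the 56-minute run against the friend's times
z_run <- (56 - 50) / 2.6

# Probability of a time at least this far from the mean, in either direction
p_two_sided <- 2 * pnorm(-abs(z_run))

# Probability of a time at least this slow (upper tail only)
p_slower <- pnorm(56, mean = 50, sd = 2.6, lower.tail = FALSE)

round(c(z_crit = z_crit, z_run = z_run,
        p_two_sided = p_two_sided, p_slower = p_slower), 3)
```

The two-sided value matches the 1.96 rule used in the text; the one-sided value is the relevant one if you only ask whether your friend could be this slow, and it is half the two-sided probability.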

How to do it in R

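```r
# Function for calculating the z-score of an observation x,
# given the mean and standard deviation of the distribution
z <- function(x, mean, sd) (x - mean) / sd

# Calculating the z-score of the 56-minute run
z(56, 50, 2.6)
# [1] 2.307692
```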