## The standard error

In my experience teaching basic statistics, the standard error is one of the most confusing concepts for those who are new to the subject. That is because it is very easy to mix it up with the standard deviation. The confusion is understandable, since the standard error is itself a type of standard deviation.

Simply speaking, the standard error is a measure of how well you have estimated a population parameter such as the mean. A small value means that the estimate is of high precision, and a large value means that the estimate is of low precision. The standard deviation, on the other hand, is a measure of variability within the population or sample. Besides indicating how good your estimate is, the standard error is used to produce a confidence interval. The true value of the parameter is expected to lie within that interval with a specified level of confidence (usually 95%).
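As a rough illustration of how the standard error feeds into a confidence interval, here is a minimal R sketch. The sample values are invented for illustration, and the interval is built from the t distribution, which is one common choice when the population standard deviation is unknown; the text above does not prescribe a particular method.

```r
# Hypothetical sample (values invented purely for illustration)
x <- c(4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.1, 4.9)

n     <- length(x)
x_bar <- mean(x)
se    <- sd(x) / sqrt(n)   # standard error of the mean

# 95% confidence interval based on the t distribution (n - 1 degrees of freedom)
t_crit <- qt(0.975, df = n - 1)
ci     <- c(lower = x_bar - t_crit * se, upper = x_bar + t_crit * se)
ci
```

The interval should agree with the one reported by `t.test(x)$conf.int`, which uses the same calculation.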

Use this equation to calculate the standard error of a mean:

$$SE = \frac{s}{\sqrt{n}}$$

where $SE$ is the standard error, $s$ is the standard deviation of a sample and $n$ is the number of observations/units within the sample.

As you can see, the standard error is a function of sample size ($n$): larger samples result in smaller standard errors and thus higher precision of the mean. It's logical: the larger the set of observations you draw from a population, the more likely you are to come close to the true mean. Take a look at the plot below, where the standard deviation remains the same but the number of observations ($n$) varies:
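The same relationship can also be checked numerically. The sketch below assumes a fixed standard deviation of 2 (an arbitrary value chosen for illustration) and uses a helper called `se_mean`, a name made up for this example rather than a base R function.

```r
# Standard error of a mean from a standard deviation s and a sample size n.
# se_mean is a helper name made up for this sketch.
se_mean <- function(s, n) {
  s / sqrt(n)
}

s <- 2                      # assumed standard deviation, held constant
n <- c(5, 10, 20, 40, 80)   # increasing sample sizes

data.frame(n = n, se = round(se_mean(s, n), 3))
```

Because $n$ sits under a square root, the sample size has to be quadrupled to halve the standard error.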

**Example**

Calculate the standard error of a mean where s = 0.2 and the total number of observations (n) is 10.

Use the equation for the standard error of a mean:

$$SE = \frac{s}{\sqrt{n}} = \frac{0.2}{\sqrt{10}} \approx 0.063$$

Answer: The standard error is approximately 0.06.

### More in depth

**Central limit theorem**

Ok, now let’s confuse things. I want you to truly understand the theory behind the standard error and what it really is.

Think about the following: if you sample a population several times and calculate the mean each time, do you think you will get the same estimate every time? No, you will not. The estimates will vary due to something called **sampling error**. That is not an error on your part but error due to chance: because each sample contains only part of the population, you will get different estimates of the true mean by chance. Now I'll introduce an interesting phenomenon: the means you get from taking lots and lots of samples from a population follow an approximately normal distribution, provided the samples are reasonably large, even if the population itself is not normally distributed. This is called the **Central Limit Theorem**.
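Sampling error is easy to see in a small simulation. The following is a minimal sketch, assuming a simulated population of 1000 values with mean 50 and standard deviation 2 and samples of 20 observations; the seed and all of these numbers are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so that the run is reproducible

# Simulated population (mean and sd are assumed values)
population <- rnorm(1000, mean = 50, sd = 2)

# Draw five independent samples of n = 20 and estimate the mean each time
sample_means <- replicate(5, mean(sample(population, size = 20)))

round(sample_means, 2)            # five different estimates of the same mean
sample_means - mean(population)   # sampling error of each estimate
```

Each estimate lands near the population mean, but no two of them coincide.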

Say you sample a population (N = 1000) with µ = 50 and σ = 2 a million times and estimate the mean every single time. The sample size is n = 20 in every sample. When you plot the distribution of the estimated means you get:

The estimated means ($\bar{x}$) are centered around the true mean ($\mu$) of the population that we have drawn the samples from. The normal distribution above contains practically all possible estimates of the mean one can get from a sample with n = 20, since a million samples is, for practical purposes, close to an infinite number.

What happens if we alter the sample size? Consider the plots below where I have sampled the same population again, but with different sample sizes ($n$) in each plot:

Notice that the standard deviation of the estimated means ($sd_{MEANS}$) decreases with increasing sample size. In other words, the variability within the population of estimated means decreases with higher n. In the example above, an estimated mean from a sample with n = 40 will almost never fall outside the range of about 49 to 51. That is quite good precision. In contrast, an estimate can vary between about 47 and 53 when n = 5. So, you get an estimate of higher precision with a larger sample size. That leads us to the standard error:

**The standard deviation of the population of estimated means is the standard error.** As the sample size increases, the standard error decreases, i.e. the mean is estimated with higher precision.
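This equivalence can be checked with a small simulation. The sketch below draws repeated samples from a simulated population and compares the standard deviation of the sample means with the population standard deviation divided by $\sqrt{n}$. It uses 10,000 repetitions rather than a million to keep the run short, and `sd_of_means` is a helper name made up for this example.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Simulated population (mean and sd are assumed values)
population <- rnorm(1000, mean = 50, sd = 2)

# Standard deviation of the means of `reps` samples of size n.
# sd_of_means is a helper name made up for this sketch.
sd_of_means <- function(n, reps = 10000) {
  means <- replicate(reps, mean(sample(population, size = n, replace = TRUE)))
  sd(means)
}

sizes  <- c(5, 10, 20, 40)
result <- data.frame(
  n              = sizes,
  sd_means       = sapply(sizes, sd_of_means),
  sd_over_sqrt_n = sd(population) / sqrt(sizes)
)
round(result, 3)
```

The two right-hand columns should be close to each other for every sample size, and both shrink as n grows.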

You don’t have to draw an infinite number of samples from a population to calculate the standard error. Just use the equation for the standard error given at the beginning of this article.
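In practice that means working from a single sample. Here is a minimal sketch, assuming one simulated sample of 20 observations from a population with mean 50 and standard deviation 2; the theoretical value is available here only because the data are simulated.

```r
set.seed(7)  # arbitrary seed for reproducibility

# One sample of n = 20 (population mean and sd are assumed values)
x <- rnorm(20, mean = 50, sd = 2)

se_estimated   <- sd(x) / sqrt(length(x))  # estimated from the single sample
se_theoretical <- 2 / sqrt(20)             # known only because the data are simulated

c(estimated = se_estimated, theoretical = se_theoretical)
```

The estimate will not match the theoretical value exactly, because the sample standard deviation is itself subject to sampling error.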

### How to produce the graphs in this article in R

*The central limit theorem single graph*

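```r
# Function for the plot
# m = population mean (µ), s = standard deviation, n = sample size,
# N = population size, b = number of bootstraps (repeated samples)
H_fun <- function(m, s, n, N, b) {
  x <- rnorm(N, m, s)                        # simulate the population
  p <- matrix(ncol = b, nrow = n)
  for (i in 1:b) {
    p[, i] <- sample(x, n, replace = TRUE)   # one sample per column
  }
  a <- apply(p, MARGIN = 2, FUN = mean)      # mean of every sample
  hist(a, breaks = "Sturges", freq = FALSE, col = "#C20000", main = NULL,
       ylim = c(0, 1), xlim = c(47, 53),
       xlab = "Estimated Means", ylab = "Probability",
       bty = "l", las = 1, xaxt = "n", cex.lab = 1.2)
  axis(side = 1, at = seq(43, 57, 2), labels = seq(43, 57, 2),
       pos = 0, las = 1, tick = TRUE)
  # y = 1.3 lies above ylim, so xpd = NA is needed for the label to be drawn
  text(49.5, 1.3, "n =", cex = 1.2, xpd = NA)
  text(50, 1.3, n, cex = 1.2, xpd = NA)
}

# Using the function
H_fun(50, 2, 20, 1000, 1000000)
```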

*The central limit theorem multiple graphs*

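```r
# Function for a plot
# m = population mean (µ), s = standard deviation, n = sample size,
# N = population size, b = number of bootstraps (repeated samples)
H_fun <- function(m, s, n, N, b) {
  x <- rnorm(N, m, s)                        # simulate the population
  p <- matrix(ncol = b, nrow = n)
  for (i in 1:b) {
    p[, i] <- sample(x, n, replace = TRUE)   # one sample per column
  }
  a <- apply(p, MARGIN = 2, FUN = mean)      # mean of every sample
  SE <- round(sd(a), 2)                      # standard deviation of the means
  hist(a, breaks = "Sturges", freq = FALSE, col = "#C20000",
       main = bquote("n =" ~ .(n) ~ "|" ~ sd[MEANS] ~ "=" ~ .(SE)),
       ylim = c(0, 1), xlim = c(47, 53),
       xlab = "Estimated Means", ylab = "Probability",
       bty = "l", las = 1, xaxt = "n", cex.lab = 1.2)
  axis(side = 1, at = seq(43, 57, 2), labels = seq(43, 57, 2),
       pos = 0, las = 1, tick = TRUE)
  # These labels sit above ylim and are clipped under the default xpd = FALSE;
  # the plot title already shows n
  text(49.5, 1.3, "n =", cex = 1.2)
  text(50, 1.3, n, cex = 1.2)
}

# Plotting: four panels in a 2 x 2 grid
par(mfcol = c(2, 2))
H_fun(50, 2, 5, 1000, 1000000)
H_fun(50, 2, 10, 1000, 1000000)
H_fun(50, 2, 20, 1000, 1000000)
H_fun(50, 2, 40, 1000, 1000000)
```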