The confidence interval

The confidence interval is a basic concept that sets the foundation for statistical tests. With this tool in your hands, you can compare the means of two samples.

The true value of a parameter, such as the mean, is found within the confidence interval with a probability that you choose. The higher the probability that the true parameter value lies within the interval, the wider the interval needs to be. The width of the confidence interval is determined by the standard error (which gives you the precision of your estimate) and z:

CI = \overline{x} \pm z \times SE

where CI is the confidence interval, \overline{x} is the estimated mean, and z is the number of standard errors (SE) from the mean at which the limits of the confidence interval lie.

The probability that the true mean is found within this interval is determined by the value of z. For the true mean to be found within the interval with a probability of 0.95 or 0.99, the z-value is set to 1.96 or 2.58, respectively. Refer to the section about the normal distribution. These intervals are called the 95 % and 99 % confidence intervals. The 95 % interval is the most commonly used.
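The z-values quoted above do not have to be looked up in a table. As a minimal sketch, assuming a two-sided interval, they can be recovered with the base R function qnorm. The helper name z_for_level is made up for this illustration and is not part of the original article.

```r
# Hypothetical helper: z-value for a two-sided confidence interval
# at a given confidence level (for example 0.95)
z_for_level <- function(level) {
	# Split the remaining probability equally between the two tails
	qnorm(1 - (1 - level) / 2)
}

z95 <- z_for_level(0.95)
z99 <- z_for_level(0.99)
round(c(z95, z99), 2)   # 1.96 and 2.58
```

Splitting the leftover probability between both tails is what makes 0.95 correspond to the 0.975 quantile of the standard normal distribution.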

Example

Calculate the 95 % and 99 % confidence intervals where the mean is 50 mm and the standard error is 0.02 mm.

Use the equation for the confidence interval:

CI95 % = 50 ± 1.96 × 0.02 = 50 ± 0.039 mm

CI99 % = 50 ± 2.58 × 0.02 = 50 ± 0.052 mm
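The same arithmetic can be sketched in R. The function name conf_int and its argument names are assumptions made for this illustration; the numbers are the ones from the example above.

```r
# Hypothetical helper: lower and upper limits of the interval mean ± z * SE
conf_int <- function(mean_est, se, z = 1.96) {
	c(lower = mean_est - z * se, upper = mean_est + z * se)
}

ci95 <- conf_int(50, 0.02, z = 1.96)
ci99 <- conf_int(50, 0.02, z = 2.58)
ci95   # approximately 49.961 and 50.039
ci99   # approximately 49.948 and 50.052
```

Returning the limits rather than the half-width makes it easy to see that the 99 % interval encloses the 95 % interval.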

More in depth

Recall the Central Limit Theorem. It says that the means you would estimate by taking an endless number of samples from a population conform to a normal distribution, provided the samples are large enough. The standard deviation of this population of means is the standard error.

Remember that 95 % and 99 % of the units within a normally distributed population are found within 1.96 and 2.58 standard deviations of the mean, respectively. So, 95 % of all possible means that can be estimated from a particular population are found within 1.96 standard errors of the true mean of the population. Similarly, there is a 95 % chance that the true mean of the population is found within 1.96 standard errors of a mean that is estimated from a sample.
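This claim can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the population is normal with a mean of 50 and a standard deviation of 2, each sample has 30 units, and the standard error is treated as known. None of these values come from the article.

```r
set.seed(1)            # arbitrary seed, only for reproducibility
true_mean <- 50        # assumed population mean
true_sd <- 2           # assumed population standard deviation
n <- 30                # assumed sample size
n_sim <- 10000         # number of simulated samples

# Standard error of the mean for this population and sample size
se <- true_sd / sqrt(n)

# Estimate the mean from each simulated sample
means <- replicate(n_sim, mean(rnorm(n, true_mean, true_sd)))

# Share of samples whose interval mean ± 1.96 * SE contains the true mean
coverage <- mean(abs(means - true_mean) <= 1.96 * se)
coverage
```

The resulting share should be close to 0.95. It will not be exactly 0.95, because only a finite number of samples is simulated.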

This can be illustrated with the population of all possible means that can be estimated from samples, which was introduced in the standard error section:

confidence interval graph

In this figure, the estimated mean from a sample (black dot on the x-axis) is found within the region (closer than 1.96 SE from the true mean) of the most likely means that can be estimated from this population. Therefore, the true mean is found within the confidence interval.

confidence interval graph2

In the next figure, however, the estimated mean from the sample is found outside the region (more than 1.96 SE from the true mean). In this case, the true mean is not found within the confidence interval. The probability of this is 0.05 or 5 %, since the filled areas together constitute 5 % of all possible means that can be estimated based on samples from the population in question.
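The two situations can also be checked numerically. The sketch below uses the values passed to mnorm_plot at the end of this article (true mean 50, standard error 0.88, estimated means 49 and 48). The helper name covers_true_mean is hypothetical.

```r
# Hypothetical helper: TRUE if the interval EM ± z * sigma contains the true mean my
covers_true_mean <- function(EM, my, sigma, z = 1.96) {
	abs(EM - my) <= z * sigma
}

inside <- covers_true_mean(49, 50, 0.88)    # TRUE: 1 is less than 1.96 * 0.88
outside <- covers_true_mean(48, 50, 0.88)   # FALSE: 2 is more than 1.96 * 0.88

# Total area of the two filled tails beyond 1.96 standard errors
tail_area <- 2 * pnorm(-1.96)
tail_area   # approximately 0.05
```

The half-width of the interval is 1.96 × 0.88 ≈ 1.72, so an estimated mean of 49 gives an interval that reaches the true mean, while an estimated mean of 48 does not.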

How to produce the graphs in this article in R

# Plot the distribution of estimated means with its 5 % tails filled,
# together with an estimated mean and its 95 % confidence interval.
# my = true mean, sigma = standard deviation of the means (the standard error),
# EM = estimated mean
mnorm_plot<-function(my,sigma,EM){
	x<-seq(my-4*sigma,my+4*sigma,0.05)

	# Normal probability density; gives the same values as dnorm(x,my,sigma)
	mnorm<-function(my,sigma,x){
		(1/(sigma*sqrt(2*pi)))*exp(-0.5*((x-my)/sigma)^2)
	}

	# mnorm is vectorised, so no loop over x is needed
	p<-mnorm(my,sigma,x)

	# Empty plot, then the density curve, the baseline and the true mean
	plot(x,p,ylab="",xlab="Estimated Means",type="n",las=1,bty="l",yaxt="n")
	lines(x,p,lwd=1.5)
	abline(h=0,col="grey")
	segments(my,0,my,max(p),col="grey")
	mtext("µ",side=3,line=-0.5)

	# Limits of the central 95 % region around the true mean
	K.L<-my-(1.96*sigma)
	K.U<-my+(1.96*sigma)

	# Fill the two tails outside the limits
	cord.x<-c(min(x),seq(min(x),K.L,0.01),K.L)
	cord.y<-c(0,mnorm(my,sigma,seq(min(x),K.L,0.01)),0)
	cord.x2<-c(K.U,seq(K.U,max(x),0.01),max(x))
	cord.y2<-c(0,mnorm(my,sigma,seq(K.U,max(x),0.01)),0)
	polygon(cord.x,cord.y,col="#C20000",border="black")
	polygon(cord.x2,cord.y2,col="#C20000",border="black")

	# Estimated mean (dot) and its 95 % confidence interval (bar with end ticks)
	CI.L<-EM-(1.96*sigma)
	CI.U<-EM+(1.96*sigma)
	points(EM,0,col="black",pch=19,cex=2)
	segments(CI.L,0,CI.U,0,lwd=2,col="black")
	segments(c(CI.L,CI.U),c(-0.01,-0.01),c(CI.L,CI.U),c(0.01,0.01),col="black",lwd=2)
}

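# First figure: the confidence interval contains the true mean
mnorm_plot(50,0.88,49)
# Open a new graphics window (dev.new() works on all platforms, unlike x11())
dev.new()
# Second figure: the confidence interval misses the true mean
mnorm_plot(50,0.88,48)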