The confidence interval

The confidence interval is a basic concept that lays the foundation for statistical tests. With this tool in your hands, you can, for example, compare the means of two samples.

A confidence interval is a range around an estimate, such as the mean, that contains the true value of the parameter with a probability you choose. The higher you set this probability, the wider the interval needs to be. The width of the confidence interval is determined by the standard error (which gives the precision of your estimate) and z:

$$CI = \overline{x} \pm z \times SE$$

where CI is the confidence interval, $\overline{x}$ is the estimated mean, SE is the standard error, and z is the number of standard errors between the estimated mean and each limit of the confidence interval.

The value of z determines the probability that the true mean lies within this interval. For probabilities of 0.95 and 0.99, z is set to 1.96 and 2.58, respectively (refer to the section about the normal distribution). These intervals are called the 95 % and 99 % confidence intervals. The 95 % interval is the most commonly used.
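In R, the z-values for a chosen confidence level can be computed with the standard normal quantile function `qnorm()` instead of being looked up in a table. A minimal sketch; the helper name `z_for_level` is hypothetical, not part of the article's code:

```r
# Hypothetical helper: the z for which the central area under the
# standard normal curve between -z and +z equals `level`
z_for_level <- function(level) {
  qnorm(1 - (1 - level) / 2)
}

z95 <- z_for_level(0.95)
z99 <- z_for_level(0.99)
round(c(z95, z99), 2)  # 1.96 2.58
```

The unrounded values are 1.959964 and 2.575829; 1.96 and 2.58 are the usual rounded forms.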

Example

Calculate the 95 % and 99 % confidence intervals for an estimated mean of 50 mm with a standard error of 0.02 mm.

Use the equation for the confidence interval:

$$CI_{95\,\%} = 50 \pm 1.96 \times 0.02 = 50 \pm 0.039\ \text{mm}$$

$$CI_{99\,\%} = 50 \pm 2.58 \times 0.02 = 50 \pm 0.052\ \text{mm}$$
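The same arithmetic can be done in R. This is a small sketch with the example's values hard-coded; rounding the half-widths to three decimals gives the ±0.039 mm and ±0.052 mm above:

```r
# Values from the example (mm)
x_bar <- 50    # estimated mean
se <- 0.02     # standard error

z <- c(ci95 = 1.96, ci99 = 2.58)
margin <- z * se          # half-width: z standard errors
lower <- x_bar - margin   # lower limit of each interval
upper <- x_bar + margin   # upper limit of each interval
cbind(margin, lower, upper)
```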

More in depth

Recall the Central Limit Theorem. It states that if you took an endless number of samples of the same size from a population and estimated the mean of each, these means would (approximately, for sufficiently large samples) follow a normal distribution. The standard deviation of this distribution of means is the standard error.
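A quick simulation can make the theorem concrete. The sketch below draws many samples from a deliberately skewed exponential population (an arbitrary choice for illustration; the sample size and number of samples are assumptions too) and compares the standard deviation of the sample means with the standard error σ/√n:

```r
set.seed(1)        # reproducible random numbers
n <- 30            # sample size (assumed for illustration)
reps <- 10000      # number of samples drawn
rate <- 1          # exponential population: mean 1, SD 1

# Estimate the mean of each of `reps` samples of size n
means <- replicate(reps, mean(rexp(n, rate = rate)))

sd(means)          # spread of all the estimated means
1 / rate / sqrt(n) # standard error: population SD / sqrt(n)
# hist(means, breaks = 50) shows a roughly bell-shaped distribution
```

The two printed values should agree closely even though the population itself is far from normal.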

Remember that 95 % and 99 % of the units in a normally distributed population lie within 1.96 and 2.58 standard deviations of the mean, respectively. So, 95 % of all possible means that can be estimated from a particular population lie within 1.96 standard errors of the true mean of the population. Turned around, for 95 % of all possible samples, the true mean of the population lies within 1.96 standard errors of the mean estimated from the sample.
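The turned-around statement can be checked by simulation: draw many samples from a normal population with a known mean, build a 95 % interval around each sample mean, and count how often the interval contains the true mean. A sketch; the population parameters and sample size are arbitrary assumptions, and the standard error is computed from the known population SD to match the z-based interval used in this article:

```r
set.seed(42)
mu <- 50        # true population mean (assumed)
sigma <- 4      # true population SD (assumed)
n <- 20         # sample size
reps <- 20000   # number of samples

se <- sigma / sqrt(n)
x_bar <- replicate(reps, mean(rnorm(n, mean = mu, sd = sigma)))

# Does each interval x_bar +/- 1.96 SE contain the true mean?
covered <- abs(x_bar - mu) <= 1.96 * se
mean(covered)   # proportion of intervals that contain mu, close to 0.95
```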

This can be illustrated with the distribution of all possible means estimated from samples, shown in the section on the standard error:

In this figure, the mean estimated from a sample (black dot on the x-axis) lies within the region of the most likely estimated means, that is, closer than 1.96 SE to the true mean. Therefore, the confidence interval around the estimated mean contains the true mean.

In the next figure, however, the estimated mean lies outside this region, more than 1.96 SE from the true mean. In this case, the confidence interval does not contain the true mean. The probability of this happening is 0.05 (5 %), because the two filled tails together make up 5 % of all possible means that can be estimated from samples of this population.
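The 5 % and the two cases in the figures can also be reproduced numerically. `pnorm()` gives the area in the two tails, and a hypothetical helper `ci_contains()` checks whether the interval around an estimated mean reaches the true mean. The values 50, 0.88, 49 and 48 are the ones passed to `mnorm_plot()` in the plotting code at the end of this article:

```r
# Total area in the two tails beyond +/- 1.96 SE
2 * pnorm(-1.96)   # about 0.05

# Hypothetical helper: does est_mean +/- z * se contain true_mean?
ci_contains <- function(true_mean, se, est_mean, z = 1.96) {
  abs(est_mean - true_mean) <= z * se
}

ci_contains(50, 0.88, 49)   # TRUE:  |49 - 50| = 1 is less than 1.96 * 0.88 = 1.7248
ci_contains(50, 0.88, 48)   # FALSE: |48 - 50| = 2 is more than 1.7248
```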

How to produce the graphs in this article in R

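# Plot the distribution of all possible estimated means and the 95 % CI around one estimate
# my = true mean, sigma = standard error (SD of the estimated means), EM = estimated mean
mnorm_plot <- function(my, sigma, EM) {
	# Grid spanning 4 standard errors on either side of the true mean
	x <- seq(my - 4 * sigma, my + 4 * sigma, length.out = 400)

	# Normal density (same result as dnorm(x, mean = my, sd = sigma))
	mnorm <- function(my, sigma, x) {
		(1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - my) / sigma)^2)
	}
	p <- mnorm(my, sigma, x)  # vectorised, so no loop is needed

	# Empty frame, then the density curve, baseline and a line at the true mean
	plot(x, p, ylab = "", xlab = "Estimated Means", type = "n", las = 1, bty = "l", yaxt = "n")
	lines(x, p, lwd = 1.5)
	abline(h = 0, col = "grey")
	segments(my, 0, my, max(p), col = "grey")
	mtext("µ", side = 3, line = -0.5)

	# Fill the two tails beyond 1.96 SE (together 5 % of all possible estimated means)
	K.L <- my - 1.96 * sigma
	K.U <- my + 1.96 * sigma
	tail.L <- seq(min(x), K.L, length.out = 200)
	tail.U <- seq(K.U, max(x), length.out = 200)
	polygon(c(min(x), tail.L, K.L), c(0, mnorm(my, sigma, tail.L), 0), col = "#C20000", border = "black")
	polygon(c(K.U, tail.U, max(x)), c(0, mnorm(my, sigma, tail.U), 0), col = "#C20000", border = "black")

	# Estimated mean (dot) and its 95 % confidence interval (bar with end ticks)
	ci <- EM + c(-1.96, 1.96) * sigma
	tick <- 0.02 * max(p)  # tick height scaled to the height of the curve
	points(EM, 0, col = "black", pch = 19, cex = 2)
	segments(ci[1], 0, ci[2], 0, lwd = 2, col = "black")
	segments(ci, -tick, ci, tick, col = "black", lwd = 2)
}

mnorm_plot(50, 0.88, 49)  # estimate within 1.96 SE of µ: the CI contains µ
dev.new()                 # open a new graphics window (portable replacement for x11())
mnorm_plot(50, 0.88, 48)  # estimate more than 1.96 SE from µ: the CI misses µ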