The normal distribution

The normal distribution displays specific characteristics that form the basis for parametric tests. In the end it comes down to being able to calculate the standard error. If you understand how the mean and the standard error work together within the normal curve, you are well on your way to really understanding the theory behind hypothesis testing with parametric tests.

Mathematical characteristics

The normal curve is symmetric and centered around the mean. Because of this symmetry, the values of the mean, median and mode are exactly the same in a perfect normal distribution. The normal distribution possesses mathematical properties that are very useful in hypothesis testing using parametric tests:

(1) About 68 % of all values in the population or sample fall within 1 standard deviation of the mean. That means that if the mean is 35 and the standard deviation is 2, you’ll find 68 % of the values of the distribution within the interval 35 ± 2, in other words between 33 and 37.

(2) About 95 % of the values within the distribution are found within 1.96 standard deviations from the mean. That is, you’ll find 95 % of the values within the interval 35 ± 1.96 × 2 = 35 ± 3.92 = 31.08 to 38.92.

(3) About 99 % of the values are found within 2.58 standard deviations of the mean. That is, you’ll find 99 % of the values within the interval 35 ± 2.58 × 2 = 35 ± 5.16 = 29.84 to 40.16.

(4) Almost the entire population (99.7 %) lies within 3 standard deviations of the mean. I guess you know how to calculate this interval by now. You can also verify all four percentages directly in R, as shown below.
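To see where these percentages come from, you can compute them in R. pnorm() gives the cumulative probability of the standard normal distribution, so the share of values within k standard deviations of the mean is pnorm(k) - pnorm(-k). A minimal check of the four intervals above:

k <- c(1, 1.96, 2.58, 3)
# Share of values within k standard deviations of the mean
pnorm(k) - pnorm(-k)   # approx. 0.683 0.950 0.990 0.997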

[Figure: The normal distribution. The central 95 % of the values lie between −1.96 and 1.96 standard deviations from the mean µ; the two shaded tails each hold 2.5 %. The graph is generated by the R code at the end of this section.]

What this says is that the probability that a value of a normally distributed variable belongs to the population is only 0.05 if it is found more than 1.96 standard deviations from the mean, and only 0.01 if it is found more than 2.58 standard deviations away. Put differently, only a fraction of 0.05 and 0.01 of the values in the normal distribution lie more than 1.96 and 2.58 standard deviations from the mean, respectively.
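You can check these tail probabilities in R as well. The first line below computes the probability of landing more than 1.96 or 2.58 standard deviations from the mean; the second goes the other way and recovers the critical values from the tail probabilities:

# Probability of falling more than 1.96 or 2.58 sd from the mean
2 * (1 - pnorm(c(1.96, 2.58)))   # approx. 0.050 0.010

# And backwards: which z-scores cut off 5 % and 1 % in the two tails?
qnorm(c(0.975, 0.995))           # approx. 1.96 2.58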

Are you starting to grasp what this has to do with statistical tests? If not, you’ll get it soon enough. To really get the whole picture you need to read and understand the section about standard error and confidence intervals.

Important to remember

(1) The normal distribution is symmetric and possesses specific mathematical properties that enable you to calculate the probability that a specific value belongs to the population.

(2) Understanding the properties of the normal distribution is the first step in understanding the theory behind parametric tests.
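To make point (1) concrete, here is a small sketch reusing the mean of 35 and standard deviation of 2 from the examples above; the observed value of 40 is just a made-up number for illustration:

mu    <- 35   # mean, from the examples above
sigma <- 2    # standard deviation, from the examples above
value <- 40   # a hypothetical observation

z <- (value - mu) / sigma      # z-score: number of sd from the mean
2 * (1 - pnorm(abs(z)))        # two-tailed probability, approx. 0.012

A value 2.5 standard deviations from the mean is rare enough that, judged against the 0.05 threshold above, you would doubt that it belongs to this population.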

R code for the normal distribution graph

mnorm_plot <- function(my, sigma) {
	# my = mean (µ), sigma = standard deviation
	# x covers ±4 standard deviations around the mean
	x <- seq(my - 4 * sigma, my + 4 * sigma, 0.05)

	# Density of the normal distribution
	mnorm <- function(my, sigma, x) {
		(1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - my) / sigma)^2)
	}

	# Density at every x value (mnorm is vectorised, so no loop is needed)
	p <- mnorm(my, sigma, x)

	# Lower and upper limits of the central 95 % interval
	K.L <- my - 1.96 * sigma
	K.U <- my + 1.96 * sigma

	# Empty plot with the x axis labelled in z-scores
	plot(x, p, ylab = "", xlab = "Z-Score = Nr of sd from mean", type = "n",
	     las = 1, bty = "l", yaxt = "n", xaxt = "n")
	axis(side = 1, at = c(K.L, my, K.U), labels = c(-1.96, 0, 1.96),
	     pos = 0, las = 1, tick = FALSE)

	# The curve itself, a baseline and a vertical line at the mean
	lines(x, p, type = "l", lwd = 1.5)
	abline(0, 0, col = "grey")
	segments(my, 0, my, max(p), col = "grey")
	mtext("µ", side = 3, line = -0.5)

	# Shade the two 2.5 % tails outside ±1.96 standard deviations
	cord.x <- c(min(x), seq(min(x), K.L, 0.01), K.L)
	cord.y <- c(0, mnorm(my, sigma, seq(min(x), K.L, 0.01)), 0)
	cord.x2 <- c(K.U, seq(K.U, max(x), 0.01), max(x))
	cord.y2 <- c(0, mnorm(my, sigma, seq(K.U, max(x), 0.01)), 0)
	polygon(cord.x, cord.y, col = "#C20000", border = "black")
	polygon(cord.x2, cord.y2, col = "#C20000", border = "black")

	# Label the central region and the two tails
	text(my, max(p) / 2, "95 %", font = 2)
	text(my - 2.3 * sigma, max(p) / 18, "2.5 %", font = 2)
	text(my + 2.3 * sigma, max(p) / 18, "2.5 %", font = 2)
}

mnorm_plot(50,0.88)
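
As a side note, R already provides the normal density as the built-in dnorm(), so the same bell curve can be drawn without writing the density function yourself:

# Same curve using R's built-in density function
curve(dnorm(x, mean = 50, sd = 0.88),
      from = 50 - 4 * 0.88, to = 50 + 4 * 0.88,
      ylab = "", xlab = "Z-Score = Nr of sd from mean")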