The normal distribution

The normal distribution displays specific characteristics that form the basis for parametric tests. In the end it comes down to the ability to calculate the standard error. If you understand how the mean and the standard error work together within the normal curve, you are on your way to really understanding the theory behind hypothesis testing using parametric tests.

Mathematical characteristics

The normal curve is symmetric around the mean. Because of this symmetry, the mean, median and mode are exactly the same in a perfect normal distribution. The normal distribution also possesses mathematical properties that are very useful in hypothesis testing using parametric tests:
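The symmetry claim can be checked numerically. This is a minimal sketch using base R's dnorm(), qnorm() and optimize(), with the example values used in the list below (mean 35, standard deviation 2). The variable names my and sigma follow the plotting function at the end of this section; the rest are illustrative names of my own.

```r
# Example parameters (the same values as in the list below)
my <- 35
sigma <- 2

# Symmetry: the density is the same at equal distances below and above the mean
d <- c(0.5, 1, 2, 3)
below <- dnorm(my - d, mean = my, sd = sigma)
above <- dnorm(my + d, mean = my, sd = sigma)

# Median: the value with 50 % of the distribution below it
median_value <- qnorm(0.5, mean = my, sd = sigma)

# Mode: the value where the density curve peaks, found numerically
mode_value <- optimize(dnorm, interval = c(my - 4 * sigma, my + 4 * sigma),
                       mean = my, sd = sigma, maximum = TRUE)$maximum

print(rbind(below, above))
print(c(mean = my, median = median_value, mode = round(mode_value, 3)))
```

Because the curve is a mirror image around 35, the two rows of densities are identical, and both the median and the numerically found mode land on the mean.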

(1) About 68 % of all values in the population or sample fall within 1 standard deviation of the mean. That means that if the mean is 35 and the standard deviation is 2, you’ll find about 68 % of the values of the distribution within the interval 35 ± 2, in other words between 33 and 37.

(2) About 95 % of the values within the distribution are found within 1.96 standard deviations of the mean. That is, you’ll find 95 % of the values within the interval 35 ± 1.96 × 2 = 35 ± 3.92, i.e. between 31.08 and 38.92.

(3) About 99 % of the values are found within 2.58 standard deviations of the mean. That is, you’ll find 99 % of the values within the interval 35 ± 2.58 × 2 = 35 ± 5.16, i.e. between 29.84 and 40.16.

(4) Almost the entire population (99.7 %) lies within 3 standard deviations of the mean. I guess you know how to calculate this interval by now.
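The four intervals in the list above can be reproduced in a few lines. This is a sketch using base R's pnorm() (the cumulative normal distribution) with the example mean 35 and standard deviation 2; the names k, lower, upper and coverage are just illustrative.

```r
# Worked example from the list above: mean 35, standard deviation 2
my <- 35
sigma <- 2

# Number of standard deviations used in points (1) to (4)
k <- c(1, 1.96, 2.58, 3)

# Interval limits: mean ± k standard deviations
lower <- my - k * sigma
upper <- my + k * sigma

# Share of a normal distribution lying between the limits,
# from the cumulative distribution function pnorm()
coverage <- pnorm(upper, mean = my, sd = sigma) - pnorm(lower, mean = my, sd = sigma)

print(data.frame(k, lower, upper, coverage = round(coverage, 4)))
```

The coverage column comes out at roughly 0.683, 0.950, 0.990 and 0.997, which matches the rounded percentages in the text. Since 1.96 and 2.58 are themselves rounded values, the 95 % and 99 % figures are approximate.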

What this says is that the probability of observing a value of a normally distributed variable that lies 1.96 or more standard deviations from the mean (in either direction) is only 0.05, and only 0.01 for a value that lies 2.58 or more standard deviations from the mean. Put differently, only 5 % and 1 % of the values in a normal distribution are found at least 1.96 and 2.58 standard deviations from the mean, respectively.
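These tail probabilities can be computed directly from the standard normal distribution. The sketch below uses base R's pnorm() and qnorm(); the helper two_sided_p() and the observed value 39 are hypothetical, chosen only for illustration with the mean 35 and standard deviation 2 from the example.

```r
# Hypothetical helper: probability of a value at least |z| standard
# deviations from the mean, counting both tails of the normal curve
two_sided_p <- function(z) 2 * pnorm(-abs(z))

p_196 <- two_sided_p(1.96)   # about 0.05
p_258 <- two_sided_p(2.58)   # about 0.01

# Going the other way: the z-values that leave 5 % and 1 % in the two tails
z_crit <- qnorm(c(0.975, 0.995))  # about 1.96 and 2.58

# A hypothetical observed value of 39, with mean 35 and sd 2 as in the text
z_obs <- (39 - 35) / 2            # 2 standard deviations above the mean
p_obs <- two_sided_p(z_obs)       # about 0.046

print(c(p_196 = p_196, p_258 = p_258, p_obs = p_obs))
print(z_crit)
```

Converting a value to a z-score (its distance from the mean in standard deviations) is what makes one table or function, the standard normal, usable for any mean and standard deviation.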

Are you starting to grasp how this relates to statistical tests? If not, you’ll get it soon enough. To really get the whole picture you need to read and understand the section about standard error and confidence intervals.

Important to remember

(1) The normal distribution is symmetric and possesses specific mathematical properties that enable you to calculate the probability of observing a value at least as far from the mean as a specific value, which tells you how plausible it is that the value belongs to the population.

(2) Understanding the properties of the normal distribution is the first step in understanding the theory behind parametric tests.

R code for the normal distribution graph

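# Plot a normal curve with mean `my` and standard deviation `sigma`,
# shading the two 2.5 % tails beyond ±1.96 standard deviations.
mnorm_plot <- function(my, sigma) {
  # x values covering ±4 standard deviations around the mean
  x <- seq(my - 4 * sigma, my + 4 * sigma, length.out = 500)

  # Normal probability density function (gives the same result as dnorm(x, my, sigma))
  mnorm <- function(my, sigma, x) {
    (1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - my) / sigma)^2)
  }

  # The density formula is vectorised, so no loop is needed
  p <- mnorm(my, sigma, x)

  # Lower and upper critical values: mean ± 1.96 standard deviations
  K.L <- my - 1.96 * sigma
  K.U <- my + 1.96 * sigma

  # Empty plot frame without the default axes
  plot(x, p, type = "n", ylab = "", xlab = "Z-score = number of sd from mean",
       las = 1, bty = "l", yaxt = "n", xaxt = "n")
  axis(side = 1, at = c(K.L, my, K.U), labels = c(-1.96, 0, 1.96),
       pos = 0, las = 1, tick = FALSE)

  # Density curve, baseline and a vertical line at the mean
  lines(x, p, lwd = 1.5)
  abline(h = 0, col = "grey")
  segments(my, 0, my, max(p), col = "grey")
  mtext("µ", side = 3, line = -0.5)

  # Shade the lower and upper 2.5 % tails
  lower <- seq(min(x), K.L, length.out = 200)
  upper <- seq(K.U, max(x), length.out = 200)
  polygon(c(min(x), lower, K.L), c(0, mnorm(my, sigma, lower), 0),
          col = "#C20000", border = "black")
  polygon(c(K.U, upper, max(x)), c(0, mnorm(my, sigma, upper), 0),
          col = "#C20000", border = "black")

  # Label the central 95 % and the two 2.5 % tails
  text(my, max(p) / 2, "95 %", font = 2)
  text(my - 2.3 * sigma, max(p) / 18, "2.5 %", font = 2)
  text(my + 2.3 * sigma, max(p) / 18, "2.5 %", font = 2)
}

mnorm_plot(50, 0.88)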
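As a check on the graph, the red tails should each hold about 2.5 % of the area under the curve and the middle about 95 %. The sketch below copies the density formula from mnorm() into a hypothetical stand-alone function, normal_density(), confirms it agrees with base R's dnorm(), and integrates it with integrate() using the parameters of the example call (mean 50, sd 0.88). Using mean ± 10 sd as finite limits in place of infinity is my own assumption; the area beyond those limits is negligible.

```r
# Hypothetical stand-alone copy of the density formula used by mnorm()
normal_density <- function(x, my, sigma) {
  (1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - my) / sigma)^2)
}

# Parameters used in the example call above
my <- 50
sigma <- 0.88
K.L <- my - 1.96 * sigma
K.U <- my + 1.96 * sigma

# The hand-written formula agrees with R's built-in dnorm()
xs <- seq(my - 4 * sigma, my + 4 * sigma, length.out = 100)
same_as_dnorm <- isTRUE(all.equal(normal_density(xs, my, sigma), dnorm(xs, my, sigma)))

# Areas under the curve by numerical integration; my ± 10 * sigma
# stands in for ± infinity (the area beyond it is negligible)
left_tail  <- integrate(normal_density, my - 10 * sigma, K.L, my = my, sigma = sigma)$value
middle     <- integrate(normal_density, K.L, K.U, my = my, sigma = sigma)$value
right_tail <- integrate(normal_density, K.U, my + 10 * sigma, my = my, sigma = sigma)$value

print(same_as_dnorm)
print(round(c(left = left_tail, middle = middle, right = right_tail), 4))
```

Changing `my` and `sigma` moves and stretches the curve, but the shares stay the same, which is why the axis of the graph can be labelled in standard deviations rather than in the original units.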