## One-tailed Z-test

The Z-test is used when you want to compare the means of two large samples (more than 30 observations each). In a one-sided test, the null hypothesis (H0) is either µ1 ≥ µ2 or µ1 ≤ µ2. That is, the mean of pop1 is larger than or equal to (or smaller than or equal to) the mean of pop2.

The difference from a two-sided Z-test is that we have a reason to believe that the mean of one of the populations (e.g. pop1) is larger or smaller than the other. For example, you want to test if the mean of pop1 is larger than the mean of pop2. The alternative hypothesis (H1) is then µ1 > µ2: the mean of pop1 is larger than the mean of pop2. The null hypothesis covers all other possibilities, µ1 ≤ µ2: the mean of pop1 is equal to or smaller than the mean of pop2.

You need to check the following assumptions before proceeding with the Z-test:

1. The observations are independent
2. The samples have the same variance
3. The Central Limit Theorem applies (a common rule of thumb is that it does when the sample sizes are > 30)

The Z-test relies on the test statistic Z, which is calculated by:

$$Z = \frac{\overline{x}_1 - \overline{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

where $\overline{x}_1$ and $\overline{x}_2$ are the means, $s_1^2$ and $s_2^2$ are the variances, and $n_1$ and $n_2$ are the sample sizes of samples 1 and 2, respectively.

The null hypothesis is rejected when Z > 1.645 at a significance level of α = 0.05, and when Z > 2.33 at α = 0.01. In other words, if H0 were true, a Z value this large would occur in at most 5 % and 1 % of samples, respectively, so rejecting H0 supports H1, e.g. that the mean of pop1 is larger than the mean of pop2. If Z is negative, the null hypothesis can't be rejected, however large its absolute value: a negative Z indicates that the mean of pop2 is the larger one, which is consistent with H0.
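The critical values above are the 95th and 99th percentiles of the standard normal distribution (2.33 is 2.326 rounded), so in R they can be computed with `qnorm()` instead of being read from a table. The sketch below also contrasts them with the two-sided values and turns a Z value into a one-sided p-value with `pnorm()`; the helper name `one_sided_p` is our own and not part of any package.

```r
# Upper-tail critical values of the standard normal distribution
# for a one-sided test (H1: mu1 > mu2)
alpha <- c(0.05, 0.01)
z_crit <- qnorm(1 - alpha)
round(z_crit, 3)               # 1.645 2.326

# For comparison: a two-sided test splits alpha over both tails
round(qnorm(1 - alpha / 2), 3) # 1.960 2.576

# Hypothetical helper: one-sided p-value for H1: mu1 > mu2, i.e. the
# probability of a Z at least this large if the population means are equal
one_sided_p <- function(z) {
  pnorm(z, lower.tail = FALSE)
}
one_sided_p(1.645)             # about 0.05
```

Because the whole α sits in one tail, the one-sided critical value (1.645) is smaller than the two-sided one (1.96) at the same α.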

### Example

You want to test if the monthly salaries of industrial workers are higher in India than in China.

1. Construct the null hypothesis

H0: the mean salary in India is equal to or less than the mean salary in China (µIndia ≤ µChina). The alternative hypothesis is H1: µIndia > µChina.

2. Take a random sample of at least 30 salaries of industrial workers from China and India

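- Calculate the mean ($\overline{x}$) and variance ($s^2$) for each sample, in this case:

| Sample | Mean ($\overline{x}$) | Variance ($s^2$) | Sample size ($n$) |
|---|---|---|---|
| India | 3.5 | 2.4 | 32 |
| China | 7.8 | 2.1 | 45 |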

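3. Check that the variances are equal
- Perform an F-test: calculate the F statistic as the ratio of the larger to the smaller variance,

$$F = \frac{s_{\text{larger}}^2}{s_{\text{smaller}}^2} = \frac{2.4}{2.1} \approx 1.14$$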

- Calculate the degrees of freedom ($v$):

$v_{china} = 45 - 1 = 44$, $v_{india} = 32 - 1 = 31$

- Look up the critical value for F at α = 0.05 with v1 = 44 and v2 = 31 in a table of critical F values: $F_{\alpha=0.05}$ = 1.8–2.01

- Compare the calculated F statistic with $F_{\alpha=0.05}$

$F < F_{\alpha=0.05}$, since 1.14 < 1.8

- Reject H0 or H1

H0 can't be rejected: the data give no evidence against the assumption of equal variances, so we can proceed with the Z-test.
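The F-test in step 3 can be sketched in R from summary statistics alone, with `qf()` replacing the lookup in a printed F table. This sketch assumes the example's values (variance 2.4 for India with n = 32, variance 2.1 for China with n = 45) and the convention that the larger variance goes in the numerator together with its own degrees of freedom. Like the steps above, it compares F with the upper α = 0.05 critical value. The function name `f_test_summary` is hypothetical. With raw data, the built-in `var.test()` runs an F-test directly.

```r
# Hypothetical helper: F-test for equal variances from summary statistics.
# The larger variance goes in the numerator; each variance keeps its own
# degrees of freedom (n - 1).
f_test_summary <- function(var_a, n_a, var_b, n_b, alpha = 0.05) {
  if (var_a >= var_b) {
    f <- var_a / var_b
    df1 <- n_a - 1
    df2 <- n_b - 1
  } else {
    f <- var_b / var_a
    df1 <- n_b - 1
    df2 <- n_a - 1
  }
  # Upper critical value of the F distribution (replaces the table lookup)
  f_crit <- qf(1 - alpha, df1 = df1, df2 = df2)
  list(F = f, df1 = df1, df2 = df2, F_crit = f_crit,
       reject_H0 = f > f_crit)
}

# Assumed example values: India s^2 = 2.4, n = 32; China s^2 = 2.1, n = 45
res <- f_test_summary(2.4, 32, 2.1, 45)
round(res$F, 2)  # 1.14
res$reject_H0    # FALSE: equal variances are not contradicted
```

Swapping the argument order gives the same F and degrees of freedom, because the function always places the larger variance on top.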

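4. Calculate the Z statistic:

$$Z = \frac{\overline{x}_{India} - \overline{x}_{China}}{\sqrt{\dfrac{s_{India}^2}{n_{India}} + \dfrac{s_{China}^2}{n_{China}}}} = \frac{3.5 - 7.8}{\sqrt{\dfrac{2.4}{32} + \dfrac{2.1}{45}}} \approx -12.3$$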

- Look up the critical value for Z at α = 0.05

For Z there is no need to consult a table that depends on the sample sizes. Because the test relies on the Central Limit Theorem, the standard normal distribution applies, and the critical value for a one-sided Z-test is simply $Z_{\alpha=0.05}$ = 1.645, independent of the degrees of freedom of the samples.

- Compare the calculated Z statistic with $Z_{\alpha=0.05}$

$Z < Z_{\alpha=0.05}$, since -12.3 < 1.645

5. Reject H0 or H1

H0 can't be rejected: Z lies far into the opposite (lower) tail instead of beyond the critical value in the upper tail.

At α = 0.05, the data do not support the claim that the salaries of industrial workers are higher in India than in China.

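6. Interpret the result

The mean salary is not higher in India than in China. The large negative Z points the other way, since the sample mean for India is clearly lower, although showing that formally would require testing H1: µIndia < µChina.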

### How to do it in R


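```r
# Function to calculate the Z statistic for two independent samples.
# x1, x2: sample means; s1, s2: sample variances (not standard deviations);
# n1, n2: sample sizes
z <- function(x1, x2, s1, s2, n1, n2) {
  (x1 - x2) / sqrt(s1 / n1 + s2 / n2)
}

# India (x1 = 3.5, variance 2.4, n1 = 32) vs China (x2 = 7.8, variance 2.1, n2 = 45)
z(3.5, 7.8, 2.4, 2.1, 32, 45)   # about -12.3
```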
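The function `z()` returns only the test statistic. A small extension, sketched below, also reports the one-sided p-value and the decision at a chosen α. The name `z_test_one_sided` and its `alpha` argument are our own choices, and, as in `z()`, `s1` and `s2` are variances rather than standard deviations.

```r
# Hypothetical helper: one-sided two-sample Z-test from summary statistics,
# for H0: mu1 <= mu2 against H1: mu1 > mu2
z_test_one_sided <- function(x1, x2, s1, s2, n1, n2, alpha = 0.05) {
  z <- (x1 - x2) / sqrt(s1 / n1 + s2 / n2)  # s1, s2 are variances
  p <- pnorm(z, lower.tail = FALSE)         # upper-tail p-value
  z_crit <- qnorm(1 - alpha)                # 1.645 for alpha = 0.05
  list(z = z, p_value = p, z_crit = z_crit, reject_H0 = z > z_crit)
}

# India (mean 3.5, variance 2.4, n = 32) vs China (mean 7.8, variance 2.1, n = 45)
res <- z_test_one_sided(3.5, 7.8, 2.4, 2.1, 32, 45)
round(res$z, 1)  # -12.3
res$reject_H0    # FALSE
```

Here the p-value is essentially 1, which matches the conclusion that H0 can't be rejected.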


### One-tailed Z-test in depth

The one-tailed Z-test works just like the two-tailed one, except that the borders under the normal curve of differences (d) between the means are placed differently. The only interest is whether one of the means is larger (or smaller) than the other, depending on the alternative hypothesis (H1). Therefore, the null hypothesis is only rejected if the calculated Z value exceeds the critical Z value in one specific tail of the normal distribution of d's. In other words, the null hypothesis can't be rejected if the calculated Z value falls within the opposite tail. To keep α = 0.05, the whole 5 % rejection region must then be placed in that single tail, which moves the critical Z value from ±1.96 (two-tailed) to 1.645:

Counting from one tail across the centre of the distribution, 95 % and 99 % of the d values lie on one side of Z = 1.645 and Z = 2.33, respectively (or of Z = −1.645 and −2.33 when H1 points in the other direction). The remaining 5 % and 1 % of the d values lie beyond these borders, all gathered in one tail.

Now, the null hypothesis can, for example, be expressed as d ≤ 0. In this case, H0 can only be rejected if d is statistically significantly larger than 0 (d > 0), which is the case when Z > 1.645 or Z > 2.33, depending on α.
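The claim that, at the boundary of H0 (d = 0), only about 5 % of Z values land beyond 1.645 in the upper tail can be checked by simulation. The sketch below draws many pairs of samples from the same normal population and counts how often the one-sided test would reject H0. The population (mean 0, standard deviation 1), the sample sizes 32 and 45, and the number of repetitions are arbitrary assumptions.

```r
# Simulation sketch: rejection rate of the one-sided Z-test when the
# population means are equal (the boundary of H0: d <= 0)
set.seed(1)
n_rep <- 20000
n1 <- 32
n2 <- 45

z_values <- replicate(n_rep, {
  a <- rnorm(n1, mean = 0, sd = 1)
  b <- rnorm(n2, mean = 0, sd = 1)
  (mean(a) - mean(b)) / sqrt(var(a) / n1 + var(b) / n2)
})

# Share of Z values in the upper rejection region (close to 0.05)
mean(z_values > qnorm(0.95))
# Share in the opposite tail: also close to 0.05, but this test never
# rejects H0 there
mean(z_values < -qnorm(0.95))
```

The rate is only approximately 5 % because the variances are estimated from the samples, which is why the test leans on the Central Limit Theorem and reasonably large samples.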

### How to produce the graph in R

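```r
# Plot the normal curve of differences between means (d) and shade the
# one-sided rejection region (upper 5 %, alpha = 0.05).
# my: mean of the distribution of d; sigma: its standard deviation
mnorm_plot <- function(my, sigma) {
  x <- seq(my - 4 * sigma, my + 4 * sigma, 0.05)

  # Normal density (equivalent to dnorm(x, my, sigma))
  mnorm <- function(my, sigma, x) {
    (1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - my) / sigma)^2)
  }

  # The density formula is vectorized, so no loop is needed
  p <- mnorm(my, sigma, x)

  plot(x, p, ylab = "", xlab = "Difference between means (d)", type = "n",
       las = 1, bty = "l", yaxt = "n")
  lines(x, p, type = "l", lwd = 1.5)
  abline(h = 0, col = "grey")                # baseline
  segments(my, 0, my, max(p), col = "grey")  # centre of the distribution

  # Upper critical value for a one-sided test at alpha = 0.05
  K.U <- my + 1.645 * sigma

  # Polygon covering the area under the curve beyond K.U
  cord.x2 <- c(K.U, seq(K.U, max(x), 0.01), max(x))
  cord.y2 <- c(0, mnorm(my, sigma, seq(K.U, max(x), 0.01)), 0)
  polygon(cord.x2, cord.y2, col = "#C20000", border = "black")
}

mnorm_plot(0, 0.35)
```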