## One-tailed Z-test

The Z-test is used to compare the means of two large samples (more than 30 observations each). In a one-sided test, the null hypothesis (H0) is either µ1 ≥ µ2 or µ1 ≤ µ2; that is, the mean of pop1 is larger than or equal to, or smaller than or equal to, the mean of pop2.

The difference from a two-sided Z-test is thus that we have a reason to believe that the mean of one of the populations (e.g. pop1) is larger, or smaller, than the other. For example, suppose you want to test whether the mean of pop1 is larger than the mean of pop2. The alternative hypothesis (H1) is then µ1 > µ2: the mean of pop1 is larger than the mean of pop2. The null hypothesis covers all other possibilities, µ1 ≤ µ2: the mean of pop1 is equal to or smaller than the mean of pop2.

You need to check the following assumptions before proceeding with the Z-test:

1. The observations are independent
2. The samples have the same variance
3. The Central Limit Theorem holds (it does if both sample sizes are > 30)

The Z-test relies on the test statistic Z, which is calculated by:

$$Z = \frac{\overline{x}_1 - \overline{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

where $\overline{x}_1$ and $\overline{x}_2$ are the means, $s_1^2$ and $s_2^2$ are the variances, and $n_1$ and $n_2$ are the sample sizes of sample 1 and 2, respectively.

The null hypothesis is rejected when Z > 1.645 at a significance level of α = 0.05, or when Z > 2.33 at α = 0.01. That is, H0 can be rejected at the 95 % or 99 % confidence level, respectively, and you conclude, for example, that the mean of pop1 is larger than the mean of pop2. If Z is negative, the null hypothesis cannot be rejected, however far Z lies beyond the critical value, because the sample mean of pop2 is then the larger one.
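The critical values quoted above are quantiles of the standard normal distribution, so they can be computed in R instead of being read from a table. The following is a minimal sketch for the upper-tailed case (H1: µ1 > µ2); `reject_h0` is a hypothetical helper name introduced here for illustration only.

```r
# Critical values for an upper-tailed Z-test, from the standard normal
alpha <- c(0.05, 0.01)
z_crit <- qnorm(1 - alpha)   # approximately 1.645 and 2.326

# Hypothetical helper (not from the original text):
# TRUE means "reject H0" when H1 is mu1 > mu2
reject_h0 <- function(z, alpha = 0.05) {
  z > qnorm(1 - alpha)
}

reject_h0(2.0)    # TRUE: 2.0 exceeds 1.645
reject_h0(-2.0)   # FALSE: a negative Z never rejects H0 in this direction
```

For a lower-tailed test (H1: µ1 < µ2) the comparison would be mirrored, `z < qnorm(alpha)`.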

### Example

You want to test whether the monthly salaries of industrial workers are higher in India than in China.

1. Construct the null-hypothesis

H0: the mean salary in India is equal to or less than the mean salary in China (µIndia ≤ µChina)

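2. Take a random sample of at least 30 salaries of industrial workers from each of China and India

– Calculate the mean ($\overline{x}$) and variance ($s^2$) for each sample, in this case:

India: $\overline{x}_{india} = 3.5$, $s^2_{india} = 2.4$, $n_{india} = 32$
China: $\overline{x}_{china} = 7.8$, $s^2_{china} = 2.1$, $n_{china} = 45$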

3. Check that the variances are equal
– Perform an F-test; calculate the F statistic:

$F = \frac{s^2_{india}}{s^2_{china}} = \frac{2.4}{2.1} = 1.14$

– Calculate the degrees of freedom ($v$)
$v_{china} = 45 - 1 = 44$
$v_{india} = 32 - 1 = 31$

– Check the critical value for F at α = 0.05, where v1 = 44 and v2 = 31, in a table of critical F values: Fα=0.05 = 1.8-2.01

– Compare the calculated F statistic with Fα=0.05

F < Fα=0.05: 1.14 < 1.8

– Reject H0 or H1

H0 can’t be rejected; the assumption of equal variances holds true.

4. Calculate the Z statistic:

$$Z = \frac{3.5 - 7.8}{\sqrt{\frac{2.4}{32} + \frac{2.1}{45}}} = -12.3$$

– Look up the critical value for Z at α = 0.05

For Z there is no need to consult a table: the critical value is simply Zα=0.05 = 1.645 for a one-sided Z-test, independent of the degrees of freedom of the samples, because the test relies on the Central Limit Theorem.

– Compare the calculated Z statistic with Zα=0.05

Z < Zα=0.05: -12.3 < 1.645

5. Reject H0 or H1

H0 cannot be rejected; the mean salary is not higher in India than in China (µIndia ≤ µChina)

At the 95 % level, the data do not support the claim that the salaries of industrial workers are higher in India than in China.

6. Interpret the result

The mean salary is not higher in India than in China.
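The variance check in step 3 can also be reproduced in R, with the critical value taken from `qf()` instead of a printed table. This is a sketch under assumptions: the variances 2.4 and 2.1 are read as the India and China variances of the example, and the degrees of freedom are passed with the numerator's first, which may differ from the order used for the table lookup above.

```r
# Variances and sample sizes from the example
var_india <- 2.4; n_india <- 32
var_china <- 2.1; n_china <- 45

# Larger variance in the numerator, so that F >= 1
f_value <- var_india / var_china            # about 1.14

# Upper 5 % point of the F distribution (numerator df first; an assumption)
f_crit <- qf(0.95, df1 = n_india - 1, df2 = n_china - 1)

# TRUE means the hypothesis of equal variances cannot be rejected
equal_var_ok <- f_value < f_crit
equal_var_ok
```

With the raw observations at hand rather than summary statistics, base R's `var.test()` performs the same kind of comparison directly.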

### How to do it in R


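```r
# Function to calculate the Z statistic
# x1, x2: sample means; s1, s2: sample variances; n1, n2: sample sizes
z <- function(x1, x2, s1, s2, n1, n2) {
  (x1 - x2) / sqrt(s1 / n1 + s2 / n2)
}

# India (sample 1, n = 32) versus China (sample 2, n = 45)
z(3.5, 7.8, 2.4, 2.1, 32, 45)
```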
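The comparison with the critical value in steps 4 and 5 can be done in the same session. The sketch below assumes India is sample 1 (n = 32) and China is sample 2 (n = 45), matching the degrees of freedom used in step 3; the one-sided p-value from `pnorm()` is an addition for illustration and is not part of the worked example.

```r
# Summary statistics from the example (sample 1 = India, sample 2 = China)
x1 <- 3.5; s1 <- 2.4; n1 <- 32   # mean, variance, sample size
x2 <- 7.8; s2 <- 2.1; n2 <- 45

z_value <- (x1 - x2) / sqrt(s1 / n1 + s2 / n2)

# Upper-tailed test (H1: mu1 > mu2): p is the area to the right of z_value
p_value <- pnorm(z_value, lower.tail = FALSE)

z_crit <- qnorm(0.95)            # about 1.645
reject <- z_value > z_crit       # FALSE here, so H0 is kept

round(z_value, 1)                # -12.3
```

Because Z lies far into the lower tail, the upper-tail p-value is practically 1, which agrees with the conclusion that H0 cannot be rejected.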


### One-tailed Z-test in depth

The one-tailed Z-test works just like the two-tailed test, except that the borders under the normal curve of differences (d) between the means are moved. The only question is whether one of the means is larger (or smaller) than the other, depending on the alternative hypothesis (H1). Therefore, the null hypothesis is only rejected if the calculated Z value exceeds the critical Z value in one of the tails of the normal distribution of d values. In other words, the null hypothesis cannot be rejected if the calculated Z value falls within the opposite tail. Because the whole of α = 0.05 is now placed in a single tail, the border of the critical Z value has to be moved:

Counted from one tail across the centre of the distribution, 95 % and 99 % of the d values lie within Z = 1.645 and Z = 2.33, respectively (or -1.645 and -2.33 when the test concerns the lower tail). The remaining 5 % and 1 % of the d values lie beyond these borders, gathered in one tail.

Now, the null hypothesis can, for example, be expressed as d ≤ 0. In this case, H0 can only be rejected if d is statistically significantly larger than 0 (d > 0). This is the case when Z > 1.645 or Z > 2.33, depending on α.
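The tail areas behind these borders can be checked with a small simulation. This is an illustrative sketch only: it draws differences d from a normal distribution centred on 0 (the boundary of H0), uses a standard deviation of 0.35 as in the plotting call in the next section, and the seed and number of draws are arbitrary choices.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
sigma <- 0.35                    # standard deviation of d (assumed)
d <- rnorm(100000, mean = 0, sd = sigma)   # simulated differences with d = 0

z <- d / sigma                   # standardise each simulated difference

upper_05 <- mean(z > 1.645)      # share beyond the 5 % border, close to 0.05
upper_01 <- mean(z > 2.33)       # share beyond the 1 % border, close to 0.01

# Exact tail areas for comparison
exact_05 <- pnorm(1.645, lower.tail = FALSE)
exact_01 <- pnorm(2.33, lower.tail = FALSE)

c(upper_05, exact_05, upper_01, exact_01)
```

The simulated shares are close to the exact tail areas, which is what the critical values 1.645 and 2.33 are chosen to give.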

### How to produce the graph in R

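```r
# Plot the normal distribution of differences (d) and shade the upper 5 % tail
# my: mean of the distribution; sigma: its standard deviation
mnorm_plot <- function(my, sigma) {
  # x values covering four standard deviations on each side of the mean
  x <- seq(my - 4 * sigma, my + 4 * sigma, 0.05)

  # Normal probability density
  mnorm <- function(my, sigma, x) {
    (1 / (sigma * sqrt(2 * pi))) * exp(-0.5 * ((x - my) / sigma)^2)
  }

  # Density at each x (mnorm is vectorised, so no loop is needed)
  p <- mnorm(my, sigma, x)

  # Empty plot frame, then the curve, the baseline and the line at the mean
  plot(x, p, ylab = "", xlab = "Difference between means (d)", type = "n",
       las = 1, bty = "l", yaxt = "n")
  lines(x, p, type = "l", lwd = 1.5)
  abline(0, 0, col = "grey")
  segments(my, 0, my, max(p), col = "grey")

  # Critical border for a one-tailed test at alpha = 0.05
  K.U <- my + 1.645 * sigma

  # Polygon outlining the rejection region to the right of the border
  cord.x2 <- c(K.U, seq(K.U, max(x), 0.01), max(x))
  cord.y2 <- c(0, mnorm(my, sigma, seq(K.U, max(x), 0.01)), 0)
  polygon(cord.x2, cord.y2, col = "#C20000", border = "black")
}

mnorm_plot(0, 0.35)
```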