The Product Moment Correlation Coefficient

The product moment correlation coefficient is a measure of the correlation between two variables, for example Height and Shoe size. We will start by explaining covariance.

Covariance

I will use two variables, Height and Shoe size, to demonstrate what covariance really is. Say that we have the following observations of Height and Shoe size:

Height    Shoe size
168       38
170       40
172       42

Now, let’s plot these points:
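A minimal sketch of how such a plot could be drawn in base R, assuming the three observations from Example 1 at the end of this post (Height 168, 170, 172 and Shoe size 38, 40, 42). The styling choices here are only illustrative: solid grey lines mark the two means, and dotted segments show each point's deviation from them.

```r
# Observations assumed from Example 1 at the end of the post
Height <- c(168, 170, 172)
Shoe   <- c(38, 40, 42)

# Scatter plot with Shoe size on the x axis and Height on the y axis
plot(Shoe, Height, pch = 19, xlab = "Shoe size (x)", ylab = "Height (y)")

# Solid grey lines at the mean of each variable
abline(v = mean(Shoe), col = "grey")
abline(h = mean(Height), col = "grey")

# Dotted segments from each point to the two mean lines (the deviations)
segments(Shoe, Height, mean(Shoe), Height, lty = "dotted")
segments(Shoe, Height, Shoe, mean(Height), lty = "dotted")
```

With this data the middle point sits exactly on both means, so its deviations have zero length.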

The dotted lines illustrate the deviation of each point from the means of Height (y) and Shoe size (x). The next step is to see how much each point deviates from the means. Each point has a deviation in both y and x, and these two deviations are multiplied to get the product for that point. You can have three different types of outcomes when summing the products:

  1. Zero; there is no correlation
  2. Positive value; there is a positive correlation
  3. Negative value; there is a negative correlation

In this example, we have a positive correlation.
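The deviations and their products can be sketched in R as follows, assuming the observations from Example 1 at the end of this post. The names `dev_x`, `dev_y` and `products` are illustrative only.

```r
# Observations assumed from Example 1 at the end of the post
Height <- c(168, 170, 172)
Shoe   <- c(38, 40, 42)

# Deviation of each observation from the mean of its variable
dev_x <- Shoe - mean(Shoe)        # -2, 0, 2
dev_y <- Height - mean(Height)    # -2, 0, 2

# Product of the two deviations for each point, and their sum
products     <- dev_x * dev_y     # 4, 0, 4
sum_products <- sum(products)     # 8
sum_products
```

The sum is positive, which corresponds to outcome 2 in the list above.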

The covariance is the average product of the deviations of the points. To get it, divide the sum of the products by the degrees of freedom (n - 1). In this example the covariance is:

$$\mathrm{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{8}{2} = 4$$
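As a quick check, here is a small sketch that computes the covariance by hand and compares it with R's built-in `cov()`, which also divides by n - 1. The data are assumed to be those of Example 1 at the end of this post, and `cov_manual` is just an illustrative name.

```r
# Observations assumed from Example 1 at the end of the post
Height <- c(168, 170, 172)
Shoe   <- c(38, 40, 42)

n <- length(Shoe)

# Sum of the products of the deviations, divided by the degrees of freedom
cov_manual <- sum((Shoe - mean(Shoe)) * (Height - mean(Height))) / (n - 1)

cov_manual          # 4
cov(Shoe, Height)   # 4, the built-in function gives the same value
```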

One problem with the covariance is that the resulting value is on the scale of the variables. How can we decide if this is a strong or weak co-variation? To get a value that makes it possible to compare covariations irrespective of units, we need to standardize the value of the covariance:

$$r = \frac{\mathrm{cov}(x, y)}{S_x S_y} = \frac{4}{2 \cdot 2} = 1$$

where S_x and S_y are the standard deviations of the variables x and y. The value r is called the product moment correlation coefficient, and it can only take values between -1 and 1. The closer to -1 or 1, the stronger the correlation. Positive and negative values indicate positive or negative correlations, respectively. In the case above, there is a perfect correlation. But what if we alter one of the values of Shoe size:

Height    Shoe size
168       38
170       36
172       42

The sum of the products is the same, but the standard deviation of Shoe size (x) has changed:

$$r = \frac{\mathrm{cov}(x, y)}{S_x S_y} = \frac{4}{3.06 \cdot 2} \approx 0.65$$

Now, the correlation is weaker. You can now take the square of r to calculate the proportion of the variation in one of the variables that is explained by the variation in the other. That is: r^2 = 0.65^2 ≈ 0.42 (0.43 if r is not rounded first), which is called the coefficient of determination.
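The two cases can be reproduced by hand with a short sketch, assuming the data from Examples 1 and 2 at the end of this post. The helper `pm_cor()` is hypothetical and only spells out the standardization step; R's built-in `cor()` returns the same values.

```r
# Observations assumed from Examples 1 and 2 at the end of the post
Height <- c(168, 170, 172)
Shoe1  <- c(38, 40, 42)   # original Shoe sizes
Shoe2  <- c(38, 36, 42)   # one Shoe size altered

# Hypothetical helper: covariance divided by the product of the standard deviations
pm_cor <- function(x, y) cov(x, y) / (sd(x) * sd(y))

r_ex1 <- pm_cor(Shoe1, Height)   # 1, a perfect correlation
r_ex2 <- pm_cor(Shoe2, Height)   # about 0.65

# Coefficient of determination for the altered data
r_ex2^2                          # about 0.43 (0.42 if r is first rounded to 0.65)
```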

How to do it in R

#Example 1

	Height <- c(168, 170, 172)
	Shoe <- c(38, 40, 42)

	# Pearson's product moment correlation; r is 1 for these data
	cor.test(Shoe, Height, method = "pearson")

#Example 2

	Height <- c(168, 170, 172)
	Shoe <- c(38, 36, 42)

	# Same test with one Shoe size altered; r is about 0.65
	cor.test(Shoe, Height, method = "pearson")