The product moment correlation coefficient

The product moment correlation coefficient is a measure of the linear correlation between two variables, for example Height and Shoe size. We will start by explaining covariance.

Covariance

I will use two variables, Height and Shoe size, for the demonstration of what covariance really is. Say that we have the following observations of Height and Shoe size:

  Height   Shoe size
  168      38
  170      40
  172      42

Now, let’s plot these points:

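[Figure: scatter plot of Height (y) against Shoe size (x), with dotted lines from each point to the two means]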

The dotted lines illustrate the deviations of the points from the means of Height (y) and Shoe size (x). The next step is to see how much each point deviates from the means. Each point has a deviation in both the y and the x direction, and these two deviations are multiplied to get the product for that point. Summing the products over all points can give three different types of outcome:

  1. Zero; there is no correlation
  2. Positive value; there is a positive correlation
  3. Negative value; there is a negative correlation

  Height (y)   Shoe size (x)   y - mean(y)   x - mean(x)   Product
  168          38              -2            -2            4
  170          40               0             0            0
  172          42               2             2            4
                                                   Sum:    8

In this example, we have positive correlation.
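The deviations and products described above can be sketched in R. This is a minimal illustration that assumes the observations are the ones used in Example 1 of the R section (Height 168, 170, 172 and Shoe 38, 40, 42); the variable names `dev_height`, `dev_shoe`, `products` and `sum_products` are made up for this sketch.

```r
# Observations (assumed to match Example 1 in the R section)
Height <- c(168, 170, 172)  # y
Shoe   <- c(38, 40, 42)     # x

# Deviation of each point from the mean of its variable
dev_height <- Height - mean(Height)
dev_shoe   <- Shoe - mean(Shoe)

# Product of the two deviations for each point
products <- dev_shoe * dev_height

# Sum of the products; its sign gives the direction of the correlation
sum_products <- sum(products)
sum_products
```

A positive sum, as here, corresponds to outcome 2 in the list above.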

To get the covariance, which is the average product of the deviations, divide the sum of the products by the degrees of freedom (n - 1). In this example the covariance is:

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{8}{2} = 4$$
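As a rough check, the same calculation can be written out in R and compared with the built-in `cov()` function, which also divides by n - 1. The data are assumed to be those of Example 1 in the R section, and `cov_manual` is a name chosen only for this sketch.

```r
# Observations (assumed to match Example 1 in the R section)
Height <- c(168, 170, 172)
Shoe   <- c(38, 40, 42)

n <- length(Height)

# Sum of the products of deviations, divided by the degrees of freedom
cov_manual <- sum((Shoe - mean(Shoe)) * (Height - mean(Height))) / (n - 1)
cov_manual

# Built-in sample covariance, for comparison
cov(Shoe, Height)
```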

One problem with the covariance is that the resulting value is on the scale of the variables. How can we decide if this is a strong or weak co-variation? To get a value that makes it possible to compare covariations irrespective of units, we need to standardize the value of the covariance:

$$r = \frac{\mathrm{cov}(x, y)}{S_x S_y} = \frac{4}{2 \times 2} = 1$$

where S_x and S_y are the standard deviations of the variables x and y. The value r is called the product moment correlation coefficient, and it always lies between -1 and 1. The closer r is to -1 or 1, the stronger the correlation. Positive and negative values indicate positive and negative correlations, respectively. In the case above, r = 1, which is a perfect positive correlation. But what if we alter one of the values of Shoe size:

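  Height (y)   Shoe size (x)   y - mean(y)   x - mean(x)   Product
  168          38              -2            -0.67         1.33
  170          36               0            -2.67         0
  172          42               2             3.33         6.67
                                                   Sum:    8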

The sum of the products is the same, so the covariance is still 4, but the standard deviation of x (Shoe size) has changed:

$$r = \frac{\mathrm{cov}(x, y)}{S_x S_y} = \frac{4}{3.06 \times 2} = 0.65$$

Now, the correlation is weaker. Squaring r gives the proportion of the variation in one of the variables that is explained by the variation in the other: 0.65^2 ≈ 0.42 (or 0.43 if r is not rounded first). This value is called the coefficient of determination.
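The standardization and the coefficient of determination can be sketched in R as follows. The data are assumed to be those of Example 2 in the R section (Shoe 38, 36, 42), and `r_manual`, `r_builtin` and `r_squared` are names chosen only for this illustration.

```r
# Observations with one Shoe value altered (assumed to match Example 2)
Height <- c(168, 170, 172)
Shoe   <- c(38, 36, 42)

# Covariance divided by the product of the two standard deviations
r_manual <- cov(Shoe, Height) / (sd(Shoe) * sd(Height))

# Built-in Pearson correlation, for comparison
r_builtin <- cor(Shoe, Height, method = "pearson")

# Coefficient of determination
r_squared <- r_manual^2

round(c(r = r_manual, r_squared = r_squared), 2)
```

Because the unrounded r is used here, the squared value comes out slightly higher than the one obtained from the rounded 0.65.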

How to do it in R

# Example 1: perfect positive correlation

	Height <- c(168, 170, 172)
	Shoe <- c(38, 40, 42)

	cor.test(Shoe, Height, method = "pearson")

# Example 2: one value of Shoe altered

	Height <- c(168, 170, 172)
	Shoe <- c(38, 36, 42)

	cor.test(Shoe, Height, method = "pearson")
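If you want to reuse the numbers rather than just read the printed output, the object returned by `cor.test()` can be stored and its components extracted. The sketch below assumes the data of Example 2; `result`, `r`, `p_value` and `df` are names chosen for this illustration.

```r
Height <- c(168, 170, 172)
Shoe   <- c(38, 36, 42)

# Store the test result instead of only printing it
result <- cor.test(Shoe, Height, method = "pearson")

# Correlation coefficient, p-value and degrees of freedom (n - 2)
r       <- unname(result$estimate)
p_value <- result$p.value
df      <- unname(result$parameter)

# Coefficient of determination from the extracted r
r^2
```

With only three observations the test has a single degree of freedom, so the p-value says little; the example is meant to show the mechanics only.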