## The Poisson distribution

The Poisson distribution has many applications, but personally I have only used it in my research dealing with **generalized linear models**. These use maximum likelihood methods to compare the outcome of a model based on a specific distribution, such as the Poisson distribution, with the data you have obtained. You can also use the distribution to test whether your data are randomly distributed in space and, as you’ll see, to calculate the probability of scoring a specific number of goals in a soccer match.

**The Poisson distribution describes the number of times a rare random event occurs within a specified time period**, for example the number of goals during a soccer match that lasts about 90 minutes. You expect zero goals in a match more often than, say, five goals. So when you record the number of goals in many matches, the distribution of those counts tends to be skewed to the right: low numbers of goals are more frequent than high numbers. The highest number of goals ever recorded in a soccer match is 149! But this is very rare, so you can probably figure out that the probability of that score is extremely low.

**So the Poisson distribution deals with counts of things where a high number of a specific event is rarer than a low number.** Another important property is that the mean and variance of a Poisson distribution are equal. Moreover, the Poisson distribution is skewed to the right: note the tail at the right end of the distribution.
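Both properties can be checked numerically in R. The following is a minimal sketch, assuming a mean of 2 goals per match and cutting the possible counts off at 200; the variable names are chosen here only for illustration.

```r
# Assumed illustration values: mean of 2 goals, counts truncated at 200
lambda <- 2
g <- 0:200
p <- dpois(g, lambda)

m  <- sum(g * p)                    # mean of the distribution
v  <- sum((g - m)^2 * p)            # variance of the distribution
sk <- sum((g - m)^3 * p) / v^1.5    # skewness; positive means a tail to the right

c(mean = m, variance = v, skewness = sk)
```

The mean and the variance should both come out equal to lambda, and the positive skewness reflects the tail at the right end.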

The probability that a specific number of events occurs, for example that a specific number of goals is scored in a single soccer game, is calculated with the following equation:

$$P(k;\lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

where $k$ is the number of a specific event (e.g. 10 soccer goals), $\lambda$ is the mean number of a specific event, and $k!$ is the factorial of $k$.

**Example 1**

Calculate the probability of experiencing 5 goals in a soccer match when the average number of goals in a match is equal to 2.

That means that $k = 5$ and $\lambda = 2$.

Then the probability of 5 goals in a soccer game is:

$$P(5;2) = \frac{2^{5} e^{-2}}{5!} = \frac{32 \times 0.1353}{120} \approx 0.036$$

*How to do it in R*

dpois(5,2)
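As a sanity check, the same probability can be computed by writing out the Poisson formula and comparing it with `dpois`. This is only a sketch; the names `k`, `lambda` and `p_manual` are chosen here for illustration.

```r
k <- 5        # number of goals
lambda <- 2   # average number of goals per match

# Poisson formula written out: lambda^k * exp(-lambda) / k!
p_manual <- lambda^k * exp(-lambda) / factorial(k)
p_manual

# TRUE if the manual calculation agrees with dpois
all.equal(p_manual, dpois(k, lambda))
```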

**Example 2**

Calculate the probability of making more than 2 goals in a soccer game where the average number of goals in a game is 2.

What you have to do here is to bring out your skills in probability theory. First use the equation to calculate the entire probability distribution, i.e. the probability of 0, 1, 2, …, ∞ goals. That is:

*P*(0;2) = 0.14

*P*(1;2) = 0.27

*P*(2;2) = 0.27

⋮

*P*(∞;2) = 0
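These values do not have to be computed by hand. A minimal sketch, assuming the same mean of 2 goals per match (the variable name `probs` is only for illustration):

```r
# Probabilities of 0 to 5 goals when the mean is 2 goals per match
probs <- dpois(0:5, 2)
round(probs, 2)

# The probability of 21 goals is tiny, but not exactly zero
dpois(21, 2)
```

Rounded to two decimals, the first three values match the ones listed above.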

You’ll notice that the probabilities become really small above about 5 goals and are practically 0 by 21 goals. OK, now you have the distribution. What you want is the probability of making more than 2 goals in a match, which means you have to sum the probabilities for every number of goals that exceeds 2. No problem. Since the probabilities get very small quickly, you don’t have to sum to infinity. Even though the probabilities never become exactly zero, you can sum them until they reach very small values (which happens quickly). So, what do we get? The sum of the probabilities of making more than two goals in a single soccer match is 0.32.

*How to do it in R*

    # Setting the vector of all possible goals
    g <- 0:200
    # Calculating the probability of each number of goals, where k = nr of goals and lambda = 2
    p <- dpois(g, 2)
    # Calculating the probability of making more than 2 goals in a game
    d <- data.frame(Goals = g, Prob = p)
    sum(d$Prob[which(d$Goals > 2)])
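A shorter route to the same answer uses the complement rule: the probability of more than 2 goals is 1 minus the probability of 0, 1 or 2 goals. The sketch below also uses R's cumulative function `ppois` with `lower.tail = FALSE`, which returns the upper tail directly; the variable names are chosen here for illustration.

```r
lambda <- 2   # average number of goals per match

# P(more than 2 goals) = 1 - P(0, 1 or 2 goals)
p_complement <- 1 - sum(dpois(0:2, lambda))

# Upper tail P(X > 2) taken directly from the cumulative distribution
p_upper <- ppois(2, lambda, lower.tail = FALSE)

round(c(complement = p_complement, ppois = p_upper), 2)
```

Both approaches avoid having to decide where to cut off the sum.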

**Important to remember**

(1) The Poisson distribution describes the number of times a rare random event occurs within a specified time period. The distribution is therefore skewed to the right.

(2) The Poisson distribution deals with counts of things.

(3) Make sure you understand the probability theory section in order to understand how to calculate probabilities using any distribution.

*How to produce the graph in this article in R*

    # Setting the vector of all possible goals
    g <- 0:10
    # Calculating the probability of each number of goals, where k = nr of goals and lambda = 2
    p <- dpois(g, 2)
    # Plotting
    barplot(p, names.arg = g, col = "#C20000", main = "Poisson distribution",
            xlab = "Nr of goals", ylab = "Probability", bty = "l", las = 1, cex.lab = 1.2)
    abline(h = 0)