The Poisson distribution

The Poisson distribution has many applications, but personally I have only used it in my research with generalized linear models. These use maximum likelihood methods to compare the outcome of a model based on a specific distribution, such as the Poisson distribution, with the data you have obtained. You can also use the distribution to test whether your data are randomly distributed in space and, as you’ll see, to calculate the probability of a specific number of goals in a soccer match.

The Poisson distribution describes the number of times a rare random event occurs within a specified time period, for example the number of goals during a soccer match that lasts about 90 minutes. You expect zero goals in a soccer match more often than, say, five goals. So when recording the number of goals in matches, the distribution of those goals will tend to be skewed to the right: low numbers of goals are frequent and high numbers are rare. The highest number of goals ever recorded in a soccer match is 149! But this is very rare, so you can probably figure out that the probability of that score is extremely low.

So the Poisson distribution deals with counts of things where a high number of a specific event is rarer than a low number. Another important property is that the mean and variance of a Poisson distribution are equal. Moreover, the Poisson distribution is skewed to the right, with a long tail at the right end of the distribution (the R code that produces the graph is given at the end of this article).
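The equal mean and variance can be checked with a small simulation. This is only a sketch: the mean of 2 goals, the number of simulated matches and the seed are assumptions chosen for illustration.

```r
# Simulate goal counts for many matches (lambda = 2 is an assumed mean)
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- rpois(100000, lambda = 2)

mean(x)                          # close to 2
var(x)                           # also close to 2

# Share of simulated matches with each number of goals;
# the shares shrink towards the right, which is the long right tail
round(table(x) / length(x), 3)
```

With a sample this large, both values should land near 2, and the table should show most matches at 0 to 3 goals.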

The probability that a specific number of events occurs, for example that a specific number of goals is scored in a single soccer game, is calculated with the following equation:

$$P(k;\lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

where $k$ is the number of a specific event (e.g. 10 soccer goals), $\lambda$ is the mean number of events, $e$ is the base of the natural logarithm, and $k!$ is the factorial of $k$.

Example 1

Calculate the probability of seeing 5 goals in a soccer match when the average number of goals in a match is 2.

That means that $k = 5$ and $\lambda = 2$.

Then the probability of 5 goals in a soccer game is:

$$P(5;2) = \frac{2^{5} e^{-2}}{5!} = \frac{32 \times 0.1353}{120} \approx 0.036$$

How to do it in R

dpois(5,2)
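To see that dpois() returns the same value as the equation, the formula can be written out by hand and compared. This is a minimal sketch; pois_prob is a hypothetical helper name used only for illustration and is not part of R.

```r
# Poisson probability written out by hand: lambda^k * exp(-lambda) / k!
# (pois_prob is a hypothetical helper name, not an R function)
pois_prob <- function(k, lambda) {
  lambda^k * exp(-lambda) / factorial(k)
}

by_hand  <- pois_prob(5, 2)      # formula evaluated directly
built_in <- dpois(5, 2)          # R's built-in Poisson probability

round(by_hand, 4)                # 0.0361
all.equal(by_hand, built_in)     # TRUE
```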

Example 2

Calculate the probability of more than 2 goals in a soccer game where the average number of goals in a game is 2.

What you have to do here is to bring out your skills in probability theory. First use the equation to calculate the entire probability distribution, i.e. the probability of 0, 1, 2, …, ∞ goals. That is:

P (0;2) = 0.14

P (1;2) = 0.27

P (2;2) = 0.27

.

.

.

P (∞;2) = 0

You’ll notice that the probabilities become really small above about 5 goals and are practically 0 by 21 goals. OK, now you have the distribution. What you want to do is to calculate the probability of more than 2 goals in a match. That means you have to sum the probabilities for all numbers of goals that exceed 2. No problem. Since the probabilities get very small quickly, you don’t have to sum to infinity. Strictly speaking the probabilities never reach exactly zero, but you can sum them until they reach very small values (which they do quickly). So, what do we get? The sum of the probabilities of more than two goals in a single soccer match is 0.32.
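The same result can also be reached with the complement rule, which is a shortcut rather than the summation described above: because all probabilities add up to 1, the probability of more than 2 goals is 1 minus the probability of 0, 1 or 2 goals. Using the values rounded to three decimals:

$$P(k > 2) = 1 - \left[ P(0;2) + P(1;2) + P(2;2) \right] \approx 1 - (0.135 + 0.271 + 0.271) = 0.323$$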

How to do it in R

#Setting the vector of all possible goals

	g<-0:200

#Calculating the probability of each goal where k = nr of goals and lambda = 2

	p<-dpois(g,2)

#Calculating the probability for making more than 2 goals in a game

	d<-data.frame(Goals=g,Prob=p)

	sum(d$Prob[which(d$Goals>2)])
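As an alternative to summing a long vector of probabilities, R's cumulative Poisson function ppois() can return the upper tail directly. The sketch below assumes the same mean of 2 goals; upper_tail and complement are illustrative variable names.

```r
# P(more than 2 goals) when lambda = 2, taken from the upper tail
# of the cumulative distribution
upper_tail <- ppois(2, lambda = 2, lower.tail = FALSE)

# The same quantity as 1 minus P(0, 1 or 2 goals)
complement <- 1 - sum(dpois(0:2, lambda = 2))

round(upper_tail, 2)                 # 0.32
all.equal(upper_tail, complement)    # TRUE
```

This avoids having to decide where to cut off the vector of possible goals.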

Important to remember

(1) The Poisson distribution describes the number of times a rare random event occurs within a specified time period. The distribution is therefore skewed to the right.

(2) The Poisson distribution deals with counts of things.

(3) Make sure you understand the probability theory section in order to understand how to calculate probabilities using any distribution.

How to produce the graph in this article in R

 


#Setting the vector of all possible goals

	g<-0:10

#Calculating the probability of each goal where k = nr of goals and lambda = 2

	p<-dpois(g,2)

#Plotting
	
	barplot(p,names.arg=g,col = "#C20000", main = "Poisson distribution",
    			xlab = "Nr of goals", ylab="Probability",bty="l",las=1,cex.lab=1.2)

	abline(h=0)