## Population vs sample

*Population* and *sample* are two fundamental concepts of statistical theory. In every statistical test, you deal with at least one population and an associated sample.

Before you even think about collecting data, you need to define the population(s) involved in the test. **A population is the group you want to make generalizations about.** You want to make some sort of statement about this group, such as: “men are on average x meters tall”. When performing a statistical test you often want to see whether there is a difference between two potential populations, such as: “men are on average taller than women”. If the test detects no difference, you have no evidence that the heights of men and women come from different populations, although this does not prove that they come from the same one. You also need to decide whether you want to generalize about all men in the world or, for example, only men in Sweden. That is, you need to be sure which group you are actually making generalizations about. How specific you should be is determined by the purpose of your study.

It is in practice impossible to gather information about the heights of all men in the world, or even in Sweden. Therefore you need to collect a subset of all the heights in the population. This is called a sample. **The sample is a random subset that represents the population.** The sample needs to be random; otherwise it does not really represent the population. Let’s say you are interested in describing the height of all men in the world. Besides statistics, you are also very interested in basketball and play internationally. To save time, you ask the members of the opposing team about their heights and use those in the study. The problem with this study is that the heights are not randomly drawn from the population of all men in the world. They in fact represent the heights of professional basketball players.
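The effect of a non-random sample can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population is simulated from a normal distribution with an invented mean and spread, and the basketball team is imitated by simply picking the tallest individuals. None of the numbers are real measurements.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 heights in meters.
# The mean (1.75) and standard deviation (0.07) are invented for illustration.
population = [random.gauss(1.75, 0.07) for _ in range(100_000)]

# Random sample: every unit in the population has the same chance of being drawn.
random_sample = random.sample(population, 200)

# Convenience sample: only the 200 tallest individuals,
# standing in for the basketball players in the example above.
convenience_sample = sorted(population)[-200:]

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
convenience_mean = statistics.mean(convenience_sample)

print(f"population mean:         {population_mean:.3f} m")
print(f"random sample mean:      {random_mean:.3f} m")
print(f"convenience sample mean: {convenience_mean:.3f} m")
```

The mean of the random sample should land close to the population mean, while the mean of the convenience sample is clearly too high. Collecting more basketball players would not help, because the bias comes from how the units were selected, not from how many there are.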

When you are working with a data set, it is important to know whether you are dealing with a population or a sample. In most cases the data come from a sample, but sometimes it is actually possible to collect data from the entire population. The equations used to describe the data differ depending on whether you have observations from all the units in the population or only from a random sample.
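The variance is a common example of an equation that differs between the two cases: for a whole population the sum of squared deviations is divided by the number of units N, while for a sample it is divided by n − 1. The sketch below uses Python's standard `statistics` module, where `pvariance` is the population version and `variance` is the sample version. The five heights are invented for illustration.

```python
import statistics

# Five hypothetical heights in meters (invented values, not real data).
heights = [1.70, 1.80, 1.75, 1.85, 1.65]

# Treat the five values as the ENTIRE population: divide by N.
population_variance = statistics.pvariance(heights)

# Treat the five values as a SAMPLE from a larger population: divide by n - 1.
sample_variance = statistics.variance(heights)

print(f"population variance (divide by N):   {population_variance:.5f}")
print(f"sample variance (divide by n - 1):   {sample_variance:.5f}")
```

The same numbers give a larger variance when they are treated as a sample, which is why you need to know which of the two situations you are in before you calculate anything.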

The most important points from this section:

- Be sure to define the population that you want to make generalizations about.
- The sample needs to be representative of the population, which means it has to be randomly drawn from it.
- Be sure that you know whether your data come from the entire population or from a sample.