Variables and scale

In statistics you can get your data by measuring things directly (such as height), by collecting information with questionnaires, as the output of a process, or by any other means. Whatever type of data you have, you are dealing with one or several variables, each measured on a specific scale. To describe the data and select an appropriate statistical test, you need to know which variables are dependent and which are independent, and which scales they are measured on.

Dependent and independent variables

A variable contains values that vary between the units within a population or sample. A variable could be length, salary, the number of people with headaches, the proportion of females, speed, or client satisfaction. I call each value you collect for a variable an observation. It is important to distinguish between dependent and independent variables: the dependent variable is the one that may be determined by the independent variable. The two can be on the same scale, as in regression, or on different scales, as in the t-test or ANOVA. For example, the dependent variable salary (continuous scale) may be determined by the independent variable gender (discrete scale). But if you don't find a difference in salary between genders, you have no evidence that salary is determined by gender; it may instead be determined by some other variable that was not measured, for example age. So what you do in a statistical test is find out whether the independent variable actually determines the dependent variable (but check out the section about collinearity).
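To make the dependent/independent distinction concrete, here is a minimal Python sketch using only the standard library. It groups the dependent variable (salary) by the independent variable (gender) and computes Welch's t statistic for the difference between the two group means. The salary figures are invented for illustration, Welch's t is just one example of a two-group test (the article does not prescribe it), and the helper names `group_by_independent` and `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical observations: each unit has a value on the independent
# variable (gender, nominal scale) and on the dependent variable
# (salary, continuous scale). The numbers are made up for illustration only.
observations = [
    ("female", 41000.0), ("female", 43500.0), ("female", 39800.0), ("female", 42200.0),
    ("male", 42800.0), ("male", 40900.0), ("male", 44100.0), ("male", 41600.0),
]


def group_by_independent(obs):
    """Collect the dependent values for each level of the independent variable."""
    groups = {}
    for level, value in obs:
        groups.setdefault(level, []).append(value)
    return groups


def welch_t(a, b):
    """Welch's t statistic for the difference between two group means."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


groups = group_by_independent(observations)
t = welch_t(groups["female"], groups["male"])
print(f"mean female = {mean(groups['female']):.0f}, "
      f"mean male = {mean(groups['male']):.0f}, t = {t:.2f}")
```

With these invented numbers the statistic comes out at roughly -0.68, which is small, so this toy sample gives no clear evidence that gender determines salary. A complete test would also compare the statistic against a t distribution to obtain a p-value, for example with a statistics library.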

Scales

The dependent variable can be on either a continuous or a discrete scale. Continuous scales consist of values without gaps that can include decimals, such as lengths, weights, salaries and speeds. The intervals between the values on the scale are exactly equal: for example, the interval between 344 and 345 is exactly the same as the interval between 345 and 346.

Discrete scales, on the other hand, consist of separate values and do not need to involve numbers at all, except as codes for a category such as a color (nominal scale) or a degree of client satisfaction (ordinal scale). The values on the nominal scale have no internal order; red can just as well come before blue. The colors can be assigned codes that represent the categories, such as 1 for red and 2 for blue, and these codes can be arranged in any order (1, 3, 2, 4 just as well as 1, 2, 3, 4).

The ordinal scale, by contrast, has an internal order between its values, for example not satisfied, satisfied and very satisfied. However, the gap between satisfied and very satisfied could be smaller than the gap between not satisfied and satisfied; that is, the intervals between the values on the scale are not equal. So, unlike on the nominal scale, the order matters, but its direction does not: you can code it from the lowest to the highest (1, 2, 3, 4) or vice versa (4, 3, 2, 1), as long as you keep track of what each value represents.

The independent variable, rather than the dependent variable, is usually the one on the nominal scale (but see logistic regression). The independent variable is, however, also often on a continuous scale (see regression).

And what about counts of things? Counts can't have decimals the way continuous variables can; they are always whole numbers. On the other hand, the intervals between counts are exactly equal, which is not the case for the ordinal scale. I'd put counts somewhere in between.
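One practical consequence of the scale is which summary value is meaningful when you describe the data. The Python sketch below follows one common convention, which is an assumption rather than something stated above: the mode for nominal data, a median for ordinal data, and the mean where intervals are equal. The `Scale` enum and the `central_value` function are hypothetical names, and averaging counts like continuous values is likewise an assumption.

```python
from enum import Enum
from statistics import mean, median_low, mode


class Scale(Enum):
    NOMINAL = "nominal"        # unordered categories, e.g. color
    ORDINAL = "ordinal"        # ordered categories with unequal gaps, e.g. satisfaction
    COUNT = "count"            # whole numbers with equal gaps, e.g. people with headaches
    CONTINUOUS = "continuous"  # values that may have decimals, e.g. length or salary


def central_value(values, scale):
    """Return a measure of central tendency suited to the variable's scale."""
    if scale is Scale.NOMINAL:
        # Categories have no order, so only "the most common value" is meaningful.
        return mode(values)
    if scale is Scale.ORDINAL:
        # Order is meaningful but distances are not; median_low always returns
        # one of the observed codes rather than a value between two categories.
        return median_low(values)
    # Equal intervals (counts and continuous values) make averaging meaningful.
    return mean(values)


# Hypothetical observations; ordinal codes: 1 = not satisfied,
# 2 = satisfied, 3 = very satisfied.
colors = ["red", "blue", "red", "green"]
satisfaction = [1, 2, 2, 3, 3, 3]
headaches = [0, 2, 1, 3]
lengths = [172.5, 168.0, 180.2]

print(central_value(colors, Scale.NOMINAL))         # red
print(central_value(satisfaction, Scale.ORDINAL))   # 2
print(central_value(headaches, Scale.COUNT))        # 1.5
print(central_value(lengths, Scale.CONTINUOUS))
```

The mean of the counts (1.5) is not itself a possible count, which fits the point that counts sit somewhere between the ordinal and the continuous scale.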

The most important points from this article:

(1) A variable is measured on a specific scale, and its values (observations) vary among the units within a population or sample

(2) The dependent variable is the one that may be determined by the independent variable; a statistical test assesses whether it actually is

(3) Variables are either on a continuous or discrete scale