Variables and scale

In statistics you can get your data by measuring things directly (such as height), by collecting information with questionnaires, as the output of a process, or by any other means. No matter which type of data you have, you are dealing with one or several variables that are measured on specific scales. You need to know which variables are dependent and which are independent, and the scales they are measured on, in order to describe the data and select an appropriate statistical test.

Dependent and independent variables

A variable contains values that vary between the units within a population or sample. A variable could be length, salary, the number of people with headaches, the proportion of females, speed or client satisfaction. I usually call each value you collect for a variable an observation. It is important to distinguish between dependent and independent variables. The dependent variable is the one assumed to be determined by the independent variable. They can be on the same scale, as in regression, or on different scales, as in the t-test or ANOVA. The dependent variable salary (continuous scale) may, for example, be determined by the independent variable gender (discrete scale). But if you don't find a difference in salary between genders, salary is not determined by gender but perhaps by some other variable that was not measured. It could, for example, be age. So what you do in a statistical test is find out whether the independent variable actually determines the dependent variable (but see the section on collinearity).
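As a minimal sketch of the salary example, the snippet below groups a dependent variable (salary) by the levels of an independent variable (gender) and compares the group means. The figures are invented for illustration, and the helper name `group_means` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations as (gender, salary) pairs; the values are invented.
observations = [
    ("female", 3100.0),
    ("male", 3000.0),
    ("female", 2900.0),
    ("male", 3200.0),
    ("female", 3000.0),
    ("male", 3100.0),
]


def group_means(rows):
    """Mean of the dependent variable for each level of the independent variable."""
    groups = defaultdict(list)
    for level, value in rows:
        groups[level].append(value)
    return {level: mean(values) for level, values in groups.items()}


means = group_means(observations)
# Difference in mean salary between the two levels of the independent variable.
difference = means["male"] - means["female"]
print(means, difference)
```

A statistical test such as the t-test then asks whether a difference of this size is larger than you would expect by chance alone.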

Scales

The dependent variable can be on either a continuous or a discrete scale. Continuous scales consist of values without gaps and can include decimals, such as lengths, weights, salaries and speed. The intervals between the values on the scale are exactly the same. For example, the interval between 344 and 345 is exactly the same as the interval between 345 and 346.

Discrete scales, on the other hand, have gaps between the values and need not involve numbers at all, other than as codes that represent, say, a color (nominal scale) or the degree of client satisfaction (ordinal scale). The values on the nominal scale have no internal order; red can just as well come before blue. The colors can also be assigned values that represent the categories, such as 1 for red and 2 for blue. These values can be arranged in any order (1, 3, 2, 4 just as well as 1, 2, 3, 4). The ordinal scale, on the other hand, has an internal order between the values; for example not satisfied, satisfied and very satisfied. However, the gap on the scale between satisfied and very satisfied could be smaller than the gap between not satisfied and satisfied. That is, the intervals between the values on the scale are not exactly the same. As I said, unlike on the nominal scale the order matters, but the direction in which you arrange it does not. You can just as well order it from the lowest to the highest (1, 2, 3, 4) or vice versa (4, 3, 2, 1). You just have to keep track of what each value represents.

It is usually the independent variable, rather than the dependent variable, that is on the nominal scale (but see logistic regression). The independent variable is, however, often on a continuous scale as well (see regression).

And what about counts of things? Counts cannot have decimals as continuous variables can; they are whole numbers. On the other hand, the intervals between the values on a count scale are exactly the same, which is not the case for discrete variables on the ordinal scale. I'd like to put counts somewhere in between.
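The differences between the scales can be sketched in a few lines of Python. This is only an illustration under assumptions: the color codes, satisfaction responses, measurements and counts are all invented, and the variable names are hypothetical.

```python
# Nominal scale: the numeric codes are labels only, so any assignment works.
color_codes = {"red": 1, "blue": 2, "green": 3}

# Ordinal scale: the levels have an internal order, but the direction is free.
levels = ["not satisfied", "satisfied", "very satisfied"]
rank_ascending = {level: i + 1 for i, level in enumerate(levels)}
rank_descending = {level: len(levels) - i for i, level in enumerate(levels)}

responses = ["satisfied", "not satisfied", "very satisfied", "satisfied"]
low_to_high = sorted(responses, key=rank_ascending.get)
high_to_low = sorted(responses, key=rank_descending.get)

# Continuous scale: decimals are allowed and the intervals are equal.
lengths = [344.0, 345.0, 346.0]
equal_intervals = (lengths[1] - lengths[0]) == (lengths[2] - lengths[1])

# Counts: whole numbers only, but still with equal intervals between values.
headache_counts = [0, 2, 5]
all_whole_numbers = all(isinstance(count, int) for count in headache_counts)

print(low_to_high)
print(high_to_low)
print(equal_intervals, all_whole_numbers)
```

Both orderings of the satisfaction responses carry the same information; only the direction differs, which is why you need to keep track of what each value represents.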

Most important from this article:

(1) A variable is a set of observations on a specific scale of measurement whose values vary among the units within a population or sample

(2) The dependent variable is determined by the independent variable

(3) Variables are on either a continuous or a discrete scale