Spearman's Rank Correlation Coefficient for the hp 39g+ graphical calculator

Introduction

For paired two-variable statistical data, the closeness of the relationship between the two sets of figures is usually judged by their correlation coefficient.
For data with precise numerical values, Pearson's correlation coefficient can be calculated. The hp 39g+ calculator performs this calculation when 2VAR is selected in the Statistics aplet menu: press the STATS key and scroll down to the value of CORR.

Sometimes, though, exact numerical values are not available, or are considered too complicated, and the data is simply arranged in order, e.g. from best to worst or from lowest to highest. The task is then to judge how similar the two orderings are.

The first step is to number the items in each of the two orderings sequentially, starting at 1 and going up to the number of items, i.e. to put them into rank order. A correlation coefficient between the two sets of ranks is then calculated.

Spearman's Rank Correlation Coefficient is commonly used. It is found by subtracting each rank number in one set from the corresponding rank number in the other set to form the difference, d.
Spearman's correlation coefficient is then calculated as:
rs = 1 - 6*Σd²/n/(n² - 1)
where Σd² is the sum of the squares of the differences between the rankings and n is the number of data pairs.
See the worked example later, or a statistics textbook, for more details.
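
As a cross-check away from the calculator, the formula can be sketched in a few lines of Python. This is only an illustration and not part of the Spearman program; the function name spearman_from_ranks is an assumption made for this example, and it expects both lists to hold rank numbers already.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman's rs from two equal-length lists of rank numbers."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two lists of the same size with at least 2 pairs")
    # Sum of the squared rank differences (the sum-of-d-squared term).
    sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / n / (n ** 2 - 1)


# Identical rankings give +1, fully reversed rankings give -1.
print(spearman_from_ranks([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
print(spearman_from_ranks([1, 2, 3, 4], [4, 3, 2, 1]))  # -1.0
```

The two calls show the extreme cases: perfect agreement and perfect disagreement between the orderings.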


Spearman's rank correlation coefficient on the hp 39g+ Graphing Calculator

The paired data must first be entered into columns C1 and C2 of the Statistics aplet. Alternatively a copy of the statistics aplet can be made and the data put into that – useful if the data needs to be kept for later review.
Note that only columns C1 and C2 may be used, that there must be at least two pairs of data items, and that C1 and C2 must have the same number of elements so that there is no unpaired data.

It does not matter if the original data has not been converted to rank order, e.g. it could be the raw results from a test. The program will sort the data into numerical order on column C1 and then convert both columns to the equivalent rank order from 1 to n.
Notice that this overwrites the original data. If the data needs to be kept, either copy it to two other columns first (e.g. using C1 sto C8 & C2 sto C9 from the home screen), or make a copy of the Statistics aplet containing the data and start that copy.

Running the program Spearman then checks the data, sorts it, converts it to rank order, and calculates and displays Spearman's correlation coefficient.


Loading Spearman

Spearman is a single program and should be transferred to the hp 39g+ in the usual way, i.e.:

  • Put the files HP39DIR.000, HP39DIR.CUR and SPEARMAN.000 from this zip file into an empty directory on a PC. (The .htm and .gif files can be left in - they will be ignored by the calculator.)
  • Connect the calculator to the computer via the USB cable then start the HP39G Connectivity program and point it to the directory holding the files.
  • Go to the PROGRAM catalog on the calculator, choose RECV and transfer Spearman from the computer to the hp 39g+'s memory by selecting Spearman as the file to download.

Alternatively it is possible to type the program directly into the calculator from the listing at the end.


Running Spearman

As a simple example, suppose the marks obtained by the same group of six students in French and German tests were:
Student   French Mark   German Mark
A         12            6
B         8             5
C         16            7
D         12            7
E         7             4
F         10            8

We want to know whether there is any significant correlation between the rankings of these six students' marks in the two tests. For instance, is it likely that a student placed near the top in a French test will also be near the top in a German test?
Notice that there are two 'ties' in the test scores. The normal procedure for dealing with equal rank positions is to give the tied data the mean of the rankings they would otherwise have. E.g. if two students were ranked as equal third, they would both be given the rank of 3.5 = (3+4)/2, and the next student would be assigned a rank of 5.
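
The averaging rule for ties can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method used by the calculator program: the function name mean_ranks is hypothetical, and rank 1 is given to the smallest value.

```python
def mean_ranks(values):
    """Rank values from 1 (smallest) upward, giving ties their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Find the end of the run of equal values beginning at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end are 0-based, so the ranks are start+1..end+1.
        mean_rank = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


print(mean_ranks([12, 8, 16, 12, 7, 10]))  # French marks A..F -> [4.5, 2.0, 6.0, 4.5, 1.0, 3.0]
print(mean_ranks([6, 5, 7, 7, 4, 8]))      # German marks A..F -> [3.0, 2.0, 4.5, 4.5, 1.0, 6.0]
```

The ranks are returned in the original student order, so the two tied French marks of 12 (students A and D) both receive 4.5.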

To calculate the rank correlation coefficient manually we would perform the following steps:

  1. Sort the first set of data into numerical order, keeping the paired data together. This is not essential but makes it easier to 'see' the level of agreement.
  2. Convert the raw numbers into rank order, from 1 to 6 in this example, applying the averaging rule for any tied values.
  3. Subtract the rank positions in the second column (German) from those in the first column (French).
  4. Square these differences, so that they all become positive.
  5. Add up the squared differences.
  6. Apply the Spearman's formula to this total, for n = 6 items.

The working could be set out in a table thus:
Student (ordered   French   German   French   German   Difference d   d²
by French mark)    Mark     Mark     Rank     Rank     in ranks
E                  7        4        1        1        0              0
B                  8        5        2        2        0              0
F                  10       8        3        6        -3             9
A                  12       6        4.5      3        1.5            2.25
D                  12       7        4.5      4.5      0              0
C                  16       7        6        4.5      1.5            2.25
                                                       Total Σ        13.5

Then Spearman's rank correlation coefficient is given by
rs = 1 - 6*13.5/6/(6² - 1) = 0.614


To perform the same calculations on the hp 39g+, START the Statistics aplet, or a copy of it, from the calculator's aplet catalog.

Choose the NUM view and enter the pairs of test marks into columns C1 and C2.
When this is done the display should show:

[Screenshot: raw data entered into Statistics aplet]

Next change to the PROGRAM catalog and START program Spearman. This brings up the following warning screen to remind you that the original data will be overwritten:
[Screenshot: confirmation to proceed]
Press Y or ENTER to proceed, or any other key to cancel the program.

Assuming you pressed Y, the program checks that columns C1 and C2 are the same size and that there are at least two data pairs. If this is not the case an appropriate error message is shown and the program terminates.

Otherwise the data is sorted into numerical order by column C1 and a progress screen is shown:
[Screenshot: sorting progress]
The percentage complete is only an estimate and may not reach 100%, depending on how disordered the original list is.

Next the two columns are converted to rank order numbers, and again the progress is shown:
[Screenshot: rank ordering progress]
For a small number of data points the sorting and ranking processes take just a few seconds, but they can unfortunately take several minutes for longer data lists.

Finally Spearman's Rank Correlation Coefficient is calculated and displayed thus:
[Screenshot: correlation coefficient display]
Spearman's rank correlation coefficient = 0.614 to 3 figures.
For further use the value of the correlation coefficient is also stored in variable S (for Spearman).

Pressing the NUM key returns you to a view of the data, which you will see is sorted and ranked:
[Screenshot: sorted and ranked data]

If there are many 'ties' in the rankings then statistics manuals generally recommend that the (more complicated) formula for the Pearson correlation coefficient is used rather than Spearman's formula, even for ranked data. (If there are no ties then both formulas will give the same answer.)

By making sure that STAT2 is selected and pressing the STATS function key, the calculator will compute Pearson's correlation coefficient for the ranked data:
[Screenshot: Pearson's correlation coefficient]
Pearson's correlation coefficient = 0.603

It can be seen that even in the present example, with ties, the two correlation coefficients are very similar. (For interest, calculating the Pearson coefficient on the raw data before converting to rank order gives a value of 0.634.)
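
The two Pearson values quoted above can be reproduced off the calculator with a short Python sketch. The helper name pearson is an assumption for this example; it uses the usual product-moment formula, and the lists are the worked example's data ordered by French mark.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Students in the order E, B, F, A, D, C (sorted by French mark).
french_rank = [1, 2, 3, 4.5, 4.5, 6]
german_rank = [1, 2, 6, 3, 4.5, 4.5]
french_mark = [7, 8, 10, 12, 12, 16]
german_mark = [4, 5, 8, 6, 7, 7]

print(round(pearson(french_rank, german_rank), 3))  # 0.603 (ranked data)
print(round(pearson(french_mark, german_mark), 3))  # 0.634 (raw marks)
```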

Of course it is possible to apply the other features of the Statistics aplet to the ranked data, such as drawing a scatter plot after setting suitable axes:
[Screenshot: scatter plot of ranked data]

The scatter diagram seems to show some relationship between the rankings, but is the Spearman's correlation coefficient value of 0.614 statistically significant?

There are tables of critical values of the correlation coefficient (i.e. the minimum value which is significant) published in statistical handbooks and a graph of these critical values for up to 100 data pairs is shown here:

[Figure: graph of critical correlation values]

It can be read off that for six data pairs the minimum value of rs at 90% confidence is approximately 0.65 (the exact value is 0.657). Since our value is a little less than this, there is a more than 10% probability that the apparent correlation between the two sets of test marks is just due to chance. Hence we cannot be particularly sure that a good score in French will generally mean a good score in German.

If the trend were for a high mark in French to be associated with a low mark in German then the calculated correlation coefficient would of course be negative. As with other definitions of correlation coefficient, Spearman's always lies between -1 and +1.


Hardware Requirements

Spearman runs on a Hewlett-Packard 39g+ calculator. It should also work on the 39g, 39gs and 40gs, and may work on the 38g, 48g, 49g and 50g but has not been tried.
The program takes up 3.7 kilobytes of RAM.


Variables Used

Spearman uses the HOME variables I, N, S, X and Y, and will therefore overwrite any existing information stored in them. It also changes the values in columns C1 and C2 of the current Statistics aplet.


Known Bugs

None known other than that sorting and ranking can be very slow once the number of data pairs exceeds a few tens.




Program Listing

Note: the character Þ represents the 'STO' arrow.
True line breaks in the program are shown by ¿, other line breaks are just to fit the listing into the table more neatly.


Program Spearman
Converts statistical data pairs into ranked order and computes Spearman's correlation coefficient. The listing comes first, followed by an explanation of each stage.
ERASE:DISP 1;"This program will":
DISP 2;"convert the data in":
DISP 3;"C1 & C2 of the current":
DISP 4;"statistics aplet to":
DISP 5;"rank order.":
DISP 7;"OK to proceed Y/N?":¿
GETKEY X:INT(X)ÞX:
IF X≠93 AND X≠105 THEN STOP:END:
ERASE:¿
SIZE(C1)ÞS:IF S≠SIZE(C2) THEN
MSGBOX "C1 & C2 must be the same size!":
STOP:END:¿
IF S<2 THEN MSGBOX "At least 2 paired values needed!":STOP:END:¿
S-1ÞN:
DO DISP 1;"Sorting "INT(100-(N/SIZE(C1))²*100)"%":
0ÞX:FOR I=1 TO N;
IF C1(I)>C1(I+1)
THEN C1(I)ÞY:C1(I+1)ÞC1(I):
YÞC1(I+1):C2(I)ÞY:
C2(I+1)ÞC2(I):YÞC2(I+1):
1ÞX:END:END:N-1ÞN:¿
UNTIL X==0 END: ¿
1ÞX:¿
DO XÞY:
WHILE C1(Y)==C1(X) REPEAT Y+1ÞY:
IF Y>S THEN BREAK:END:
END:
(Y+X-1)/2ÞN:
FOR I=X TO Y-1;NÞC1(I):END:
YÞX: ¿
UNTIL X>S END: ¿
C2(1)ÞX:FOR I=1 TO S;
MIN(X,C2(I))ÞX:END:¿
C2-XÞC2:¿
1ÞY:¿
DO DISP 3;"Ranking "INT(Y/S*100)"%":MAXREALÞX:
FOR I=1 TO S;
IF C2(I)≥0 AND C2(I)<X
THEN C2(I)ÞX:END:END:¿
0ÞN:FOR I=1 TO S;
(C2(I)==X)+NÞN:END:¿
Y+NÞY: (2*Y-N-1)/2ÞN:
FOR I=1 TO S;IF C2(I)==X
THEN -NÞC2(I):END:END:¿
UNTIL Y>S END:-C2ÞC2:¿
DISP 5;"Spearman's corr coef =":
1-6*ΣLIST((C1-C2)²)/S/(S²-1)ÞS:
DISP 6;S:FREEZE:

Explanation

Display a reminder that the current data will be overwritten and prompt for confirmation to proceed.
If key Y or ENTER is pressed then continue, otherwise exit the program.
Clear the display.
Check that the two data lists are the same size and contain at least two data points before proceeding.
If not, display a message box and exit.
Perform a standard bubblesort on the data in column C1, moving C2 as well to keep the paired data together. Show a rough progress report and finish when no swaps were made on the last pass.
The columns can now be converted to rank numbers.
For C1 this is quite easy as it is already in order so we just find how many times the current value occurs consecutively (1 or more times), work out what the mean rank number is, then go back and set all equal values to this mean.
Converting the second, unsorted, column to rank order is a little more complicated.
First the smallest value is found and subtracted from each list element, so they are all greater than or equal to zero.
Then a loop is begun where first the smallest non-negative value is found.
Next the number of times this smallest value occurs is counted.
The mean rank number which must be assigned to all these tied values is calculated from the previous highest rank used and the number of ties, and the negative of this is assigned to each list element equal to the smallest value previously found.
The reason for initially assigning negative ranks is to distinguish unprocessed numbers from those which have already been ranked.
Finally the loop terminates and C2 is replaced by minus C2 so the ranks become positive.
Spearman's rank correlation coefficient is calculated via the standard formula, displayed and stored in variable S.
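
For readers who want to trace the algorithm off the calculator, the stages described above can be mirrored in Python. This is a hedged sketch rather than a line-for-line port: the function name spearman_program is an assumption, lists are 0-based, the key prompt and progress displays are omitted, and the input lists are copied rather than overwritten.

```python
def spearman_program(c1, c2):
    """Mirror the stages of the listing; returns (ranked c1, ranked c2, rs)."""
    c1, c2 = list(c1), list(c2)          # work on copies, unlike the calculator
    s = len(c1)
    if s != len(c2):
        raise ValueError("C1 & C2 must be the same size!")
    if s < 2:
        raise ValueError("At least 2 paired values needed!")

    # Bubble sort on c1, swapping c2 in step; stop after a pass with no swaps.
    n = s - 1
    swapped = True
    while swapped:
        swapped = False
        for i in range(n):
            if c1[i] > c1[i + 1]:
                c1[i], c1[i + 1] = c1[i + 1], c1[i]
                c2[i], c2[i + 1] = c2[i + 1], c2[i]
                swapped = True
        n -= 1

    # Rank sorted c1: each run of equal values gets the mean of its positions.
    x = 0
    while x < s:
        y = x
        while y < s and c1[y] == c1[x]:
            y += 1
        mean_rank = (x + 1 + y) / 2      # 1-based ranks x+1 .. y
        for i in range(x, y):
            c1[i] = mean_rank
        x = y

    # Rank unsorted c2: shift so every value is >= 0, then repeatedly take the
    # smallest non-negative value and overwrite it with minus its mean rank.
    lowest = min(c2)
    c2 = [v - lowest for v in c2]
    used = 0                             # how many ranks have been assigned
    while used < s:
        smallest = min(v for v in c2 if v >= 0)
        count = sum(1 for v in c2 if v == smallest)
        mean_rank = used + (count + 1) / 2
        c2 = [-mean_rank if v == smallest else v for v in c2]
        used += count
    c2 = [-v for v in c2]                # make the ranks positive again

    sum_d2 = sum((a - b) ** 2 for a, b in zip(c1, c2))
    rs = 1 - 6 * sum_d2 / s / (s ** 2 - 1)
    return c1, c2, rs


ranks1, ranks2, rs = spearman_program([12, 8, 16, 12, 7, 10], [6, 5, 7, 7, 4, 8])
print(ranks1)        # [1.0, 2.0, 3.0, 4.5, 4.5, 6.0]
print(ranks2)        # [1.0, 2.0, 6.0, 3.0, 4.5, 4.5]
print(round(rs, 3))  # 0.614
```

Run on the French and German marks from the worked example, the sketch reproduces the ranked columns and the coefficient of 0.614 shown earlier.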