Introduction

For paired data in two variables, the closeness of the relationship between the two sets of figures is usually judged by their correlation coefficient. Sometimes, though, exact numerical values may not be available, or may be considered too complicated, and the data is instead simply arranged in order from best to worst, lowest to highest, and so on. It is then required to judge how similar the two orderings are. The first step is to number the items in each of the two orderings sequentially, starting at 1 and going up to the number of items, i.e. to put them into rank order. A correlation coefficient between the two sets of ranks is then calculated. Commonly Spearman's Rank Correlation Coefficient is used; it is found by subtracting each rank number in one set from the corresponding rank number in the other set to form the difference, d, and then combining the squares of these differences.

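Written out, the formula that the worked example and the program listing later in this document both apply is shown below. Here n is the number of data pairs and d is the difference between the two ranks of one item; the symbol r_s corresponds to the "rs" used in the text.

$$
r_s = 1 - \frac{6 \sum d^2}{n\,(n^2 - 1)}
$$

When the two rankings agree exactly every d is zero, so the coefficient equals +1.
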
Spearman's rank correlation coefficient on the hp 39g+ Graphing Calculator

The paired data must first be entered into columns C1 and C2 of the Statistics aplet. Alternatively, a copy of the Statistics aplet can be made and the data put into that, which is useful if the data needs to be kept for later review. It does not matter if the original data has not been converted to rank order; it could, for example, be the raw results from a test. Running program Spearman checks the data, sorts it into numerical order on column C1, converts both columns to the equivalent rank order from 1 to n, and then calculates and displays Spearman's correlation coefficient.

Loading Spearman

Spearman is a single program and should be transferred to the hp 39g+ in the usual way. Alternatively, the program can be typed directly into the calculator from the listing at the end.

Running Spearman

As a simple example, suppose the marks obtained by the same group of six students in French and German tests were:

We want to know whether there is any significant correlation between the rankings of the marks of these six students in the two tests. For instance, is it likely that a student placed near the top in a French test will also be near the top in a German test? To calculate the rank correlation coefficient manually we would perform the following steps:
The working could be set out in a table thus:
Then, with the squared differences summing to 13.5 over the six data pairs, Spearman's rank correlation coefficient is given by

rs = 1 - 6 × 13.5 / (6 × (6² - 1)) = 0.614

To perform the same calculations on the hp 39g+, START the Statistics aplet, or a copy of it, from the calculator's aplet catalog. Next change to the PROGRAM catalog and START program Spearman. This brings up a warning screen to remind you that the original data will be overwritten.

Assuming you pressed Y, the program checks that columns C1 and C2 are the same size and that there are at least two data pairs. If this is not the case, an appropriate error message is shown and the program terminates. Next the data is sorted and the two columns are converted to rank order numbers, with the progress shown on screen at each stage. Finally Spearman's Rank Correlation Coefficient is calculated and displayed. Pressing the NUM key returns you to a view of the data, which you will see is now sorted and ranked.

If there are many 'ties' in the rankings then statistics manuals generally recommend that the (more complicated) formula for the Pearson correlation coefficient is used rather than Spearman's formula, even for ranked data. (If there are no ties then both formulas will give the same answer.)

Of course it is possible to apply the other features of the Statistics aplet to the ranked data, such as drawing a scatter plot after setting suitable axes. The scatter diagram seems to show some relationship between the rankings, but is the Spearman's correlation coefficient value of 0.614 statistically significant?

If the trend were for a high mark in French to be associated with a low mark in German then the calculated correlation coefficient would of course be negative. As with other definitions of correlation coefficient, Spearman's always lies between -1 and +1.

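The manual calculation described above can also be checked away from the calculator. The following is a minimal Python sketch, not the calculator program: the function names are hypothetical, tied values are given the mean of the ranks they span (as the text describes), and a plain Pearson formula is included only to illustrate the remark that the two formulas agree when there are no ties.

```python
def mean_ranks(values):
    """Rank values from 1 (smallest) to n; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end + 2) / 2  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_from_sum_d2(sum_d2, n):
    """Spearman's formula: 1 - 6 * (sum of d squared) / (n * (n^2 - 1))."""
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def spearman(xs, ys):
    """Rank both lists, form the differences d, and apply Spearman's formula."""
    rx, ry = mean_ranks(xs), mean_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return spearman_from_sum_d2(sum_d2, len(xs))


def pearson(xs, ys):
    """Pearson correlation coefficient (assumes neither list is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5
```

With the figures from the worked example, `spearman_from_sum_d2(13.5, 6)` evaluates to about 0.614. To try the no-ties remark, apply `pearson` to the output of `mean_ranks` for two lists without repeated values and compare the result with `spearman` on the same lists.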
Hardware Requirements

Spearman runs on a Hewlett-Packard 39g+ calculator. It should also work on the 39g, 39gs and 40gs, and may work on the 38g, 48g, 49g and 50g, but this has not been tried.

Variables Used

Spearman uses the HOME variables I, N, S, X and Y, and will therefore overwrite any existing information stored in them. It also changes the values in columns C1 and C2 of the current Statistics aplet.

Known Bugs

None known, other than that sorting and ranking can be very slow once the number of data pairs exceeds a few tens.

Program Listing

Note: the character Þ represents the 'STO' arrow.

Program Spearman | Explanation |
---|---|
Converts statistical data pairs into ranked order and computes Spearman's correlation coefficient | |
ERASE:DISP 1;"This program will": DISP 2;"convert the data in": DISP 3;"C1 & C2 of the current": DISP 4;"statistics aplet to": DISP 5;"rank order.": DISP 7;"OK to proceed Y/N?": | Display a reminder that current data will be overwritten and prompt for confirmation to proceed. |
GETKEY X:INT(X)ÞX: IF X≠93 AND X≠105 THEN STOP:END: ERASE: | If key Y or ENTER is pressed then continue, otherwise exit the program. Clear the display. |
SIZE(C1)ÞS:IF S≠SIZE(C2) THEN MSGBOX "C1 & C2 must be the same size!": STOP:END: IF S<2 THEN MSGBOX "At least 2 paired values needed!":STOP:END: | Check that the two data lists are the same size and contain at least two data points before proceeding. If not, display a message box then exit. |
S-1ÞN: DO DISP 1;"Sorting "INT(100-(N/SIZE(C1))²*100)"%": 0ÞX:FOR I=1 TO N; IF C1(I)>C1(I+1) THEN C1(I)ÞY:C1(I+1)ÞC1(I): YÞC1(I+1):C2(I)ÞY: C2(I+1)ÞC2(I):YÞC2(I+1): 1ÞX:END:END:N-1ÞN: UNTIL X==0 END: | Perform a standard bubblesort on the data in column C1, moving C2 as well to keep the paired data together. Show a rough progress report and finish when no swaps were made on the last pass. |
1ÞX: DO XÞY: WHILE C1(Y)==C1(X) REPEAT Y+1ÞY: IF Y>S THEN BREAK:END: END: (Y+X-1)/2ÞN: FOR I=X TO Y-1;NÞC1(I):END: YÞX: UNTIL X>S END: | The columns can now be converted to rank numbers. For C1 this is quite easy as it is already in order, so we just find how many times the current value occurs consecutively (1 or more times), work out what the mean rank number is, then go back and set all equal values to this mean. |
C2(1)ÞX:FOR I=1 TO S; MIN(X,C2(I))ÞX:END: C2-XÞC2: | Converting the second, unsorted, column to rank order is a little more complicated. First the smallest value is found and subtracted from each list element, so they are all greater than or equal to zero. |
1ÞY: DO DISP 3;"Ranking "INT(Y/S*100)"%":MAXREALÞX: FOR I=1 TO S; IF C2(I)≥0 AND C2(I)<X THEN C2(I)ÞX:END:END: | Then a loop is begun where first the smallest non-negative value is found. |
0ÞN:FOR I=1 TO S; (C2(I)==X)+NÞN:END: | Next the number of times this smallest value occurs is counted. |
Y+NÞY: (2*Y-N-1)/2ÞN: FOR I=1 TO S;IF C2(I)==X THEN -NÞC2(I):END:END: | The mean rank number which must be assigned to all these tied values is calculated from the previous highest rank used and the number of ties, and the negative of this is assigned to each list element equal to the smallest value previously found. The reason for initially assigning negative ranks is to distinguish unprocessed numbers from those which have already been ranked. |
UNTIL Y>S END:-C2ÞC2: | Finally the loop terminates and C2 is replaced by minus C2 so the ranks become positive. |
DISP 5;"Spearman's corr coef =": 1-6*ΣLIST((C1-C2)²)/S/(S²-1)ÞS: DISP 6;S:FREEZE: | Spearman's rank correlation coefficient is calculated via the standard formula, displayed and stored in variable S. |
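
For readers who want to cross-check the calculator's results on a computer, the steps explained above can be mirrored in Python. This is only a sketch under stated assumptions: it is a hypothetical port written for illustration, not the calculator program itself. The function name is invented, lists are indexed from 0 rather than 1, and errors are raised as exceptions instead of message boxes.

```python
def spearman_39g_style(c1, c2):
    """Follow the listing's steps: sort on c1, rank c1, rank c2, apply the formula."""
    c1, c2 = list(c1), list(c2)
    s = len(c1)
    if s != len(c2):
        raise ValueError("C1 & C2 must be the same size!")
    if s < 2:
        raise ValueError("At least 2 paired values needed!")

    # Bubble sort on c1, moving c2 in step; stop after a pass with no swaps.
    n = s - 1
    swapped = True
    while swapped:
        swapped = False
        for i in range(n):
            if c1[i] > c1[i + 1]:
                c1[i], c1[i + 1] = c1[i + 1], c1[i]
                c2[i], c2[i + 1] = c2[i + 1], c2[i]
                swapped = True
        n -= 1

    # Rank the sorted column: each run of equal values gets the mean of its positions.
    x = 0
    while x < s:
        y = x
        while y < s and c1[y] == c1[x]:
            y += 1
        mean_rank = (x + 1 + y) / 2  # mean of the 1-based positions x+1..y
        for i in range(x, y):
            c1[i] = mean_rank
        x = y

    # Rank the unsorted column: shift values to be >= 0, then repeatedly take the
    # smallest non-negative value and overwrite it with minus its mean rank, so
    # that already-ranked entries (negative) are never picked again.
    low = min(c2)
    c2 = [v - low for v in c2]
    next_rank = 1
    while next_rank <= s:
        smallest = min(v for v in c2 if v >= 0)
        ties = sum(1 for v in c2 if v == smallest)
        mean_rank = next_rank + (ties - 1) / 2
        c2 = [-mean_rank if v == smallest else v for v in c2]
        next_rank += ties
    c2 = [-v for v in c2]  # make the ranks positive again

    sum_d2 = sum((a - b) ** 2 for a, b in zip(c1, c2))
    r_s = 1 - 6 * sum_d2 / s / (s * s - 1)
    return c1, c2, r_s
```

The function returns the two ranked columns as well as the coefficient, which corresponds to what the NUM view shows on the calculator after the program has run. Python's built-in sort would be much faster than the bubble sort; the slower method is kept here only so that the sketch follows the listing step by step.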