ANOVA Calculator for the hp 39g+

Introduction

ANOVA is an abbreviation for ANalysis Of VAriance. It is a statistical technique for comparing the means of several sets of data to decide whether the differences between the means are significant or whether they are just random sampling variations.

The basic idea is that you have several sets, or groups, of measurements, and you compare the variation of the individual measurements within each group with the variation between the group means. If the ratio of the between-groups variation to the within-groups variation is greater than a critical 'F' value then the conclusion, at a given confidence level, is that at least one of the means is genuinely different from the rest.
If the ratio is smaller than the critical 'F' value the conclusion is that there is no evidence that the means are different.

The arithmetic involved in ANOVA testing is quite simple but there is a lot of it and it is easy to make mistakes when doing the calculations manually, hence the utility of an automated program.



A Worked Example of One-Way ANOVA

(See a statistical textbook for a full explanation of the principles of ANOVA.)

Three different analytical laboratories determine the fat content of 'Turkey Twirls', each performing four replicate determinations. Their results are:
Laboratory A    Laboratory B    Laboratory C
45.4%           47.0%           46.0%
46.1%           46.8%           46.2%
45.8%           47.3%           45.9%
45.2%           47.2%           46.3%
mean = 45.63%   mean = 47.08%   mean = 46.10%
Clearly the means are slightly different, but is this just the normal slight variation you would expect between different laboratories, or is one of them (probably Laboratory B) producing results that are consistently different from the others?
Our 'null hypothesis', H0, for this investigation is that there is no bias in any of the laboratories' results and that the true mean results from each, if they were to carry out a large number of replicates, would be the same.

There are three groups, each containing four values, making 12 values in all.

First we calculate the 'Total Sum of Squares' as
Square all the individual values and add them together. Add all the individual values together, square the result, and divide by the number of values. Finally subtract the second number from the first. I.e.
(45.4² + 46.1² + 45.8² + ...) - (45.4 + 46.1 + 45.8 + ...)²/12 = 5.1067
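The Total Sum of Squares arithmetic can be checked with a short Python sketch. Python is used here purely for illustration and is not part of the calculator program. The variable names are hypothetical. The subtracted term, (sum of all values)²/N, is often called the correction term.

```python
# Worked-example data: three laboratories, four replicates each (fat content, %).
groups = [
    [45.4, 46.1, 45.8, 45.2],  # Laboratory A
    [47.0, 46.8, 47.3, 47.2],  # Laboratory B
    [46.0, 46.2, 45.9, 46.3],  # Laboratory C
]

values = [x for g in groups for x in g]  # all 12 individual values
n_total = len(values)

sum_of_squares = sum(x * x for x in values)       # 45.4² + 46.1² + 45.8² + ...
correction_term = sum(values) ** 2 / n_total      # (45.4 + 46.1 + 45.8 + ...)² / 12
total_ss = sum_of_squares - correction_term

print(round(total_ss, 4))  # 5.1067
```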

Next calculate the 'Between Groups Sum of Squares' as
Add up the individual values for one laboratory, square the total and divide by the number of values per laboratory. Add the results for the three laboratories together. Add all the individual values together, square the result, and divide by the number of values. Finally subtract the second total from the first total. I.e.
((45.4 + 46.1 + 45.8 + 45.2)²/4 + (47.0 + ...)²/4 + (46.0 + ...)²/4) - (45.4 + 46.1 + 45.8 + ...)²/12 = 4.372
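Here is a short, self-contained Python sketch of the Between Groups step. It is illustrative only, and the names are hypothetical. Each squared group total is divided by that group's own size, len(g), rather than a fixed 4. This mirrors the SIZE(L1(I)) term in the program listing, which is what lets groups differ in size.

```python
groups = [
    [45.4, 46.1, 45.8, 45.2],  # Laboratory A
    [47.0, 46.8, 47.3, 47.2],  # Laboratory B
    [46.0, 46.2, 45.9, 46.3],  # Laboratory C
]

values = [x for g in groups for x in g]
correction_term = sum(values) ** 2 / len(values)  # (sum of all values)² / 12

# Square each group's total, divide by that group's size, and add the results.
group_term = sum(sum(g) ** 2 / len(g) for g in groups)
between_ss = group_term - correction_term

print(round(between_ss, 4))  # 4.3717
```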

Calculate the 'Within Groups Sum of Squares' by subtracting the Between Sum of Squares from the Total Sum of Squares
5.1067 - 4.372 = 0.735
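As a cross-check, the Within Groups Sum of Squares can also be computed directly. It is the sum of the squared deviations of each value from its own group mean. This relies on the standard identity Total = Between + Within. It is only a check, not the method the program uses; the program subtracts, as described above. The names in this Python sketch are hypothetical.

```python
groups = [
    [45.4, 46.1, 45.8, 45.2],  # Laboratory A
    [47.0, 46.8, 47.3, 47.2],  # Laboratory B
    [46.0, 46.2, 45.9, 46.3],  # Laboratory C
]

within_ss = 0.0
for g in groups:
    group_mean = sum(g) / len(g)
    # Squared deviations of each value from its own group's mean.
    within_ss += sum((x - group_mean) ** 2 for x in g)

print(round(within_ss, 4))  # 0.735
```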

The 'Between Groups Degrees of Freedom' is one less than the number of groups, or 2 in this example.
The 'Within Groups Degrees of Freedom' is the total number of values minus 1 minus the Between Degrees of Freedom, or 12 - 1 - 2 = 9 in this case.

The 'Mean Square Between Groups' is the Between Sum of Squares divided by the Between Degrees of Freedom, or 4.372/2 = 2.186.
The 'Mean Square Within Groups' is the Within Sum of Squares divided by the Within Degrees of Freedom, or 0.735/9 = 0.0817.

The 'F-value' for the data is the Mean Square Between divided by the Mean Square Within, or 2.186/0.0817 = 26.8.
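The degrees of freedom, mean squares and F-value follow directly from the sums of squares. In this Python sketch those sums are entered as literals taken from the calculation above; 4.371667 is the Between value to six decimal places. The variable names are hypothetical.

```python
n_groups = 3           # laboratories
n_total = 12           # individual values
between_ss = 4.371667  # Between Groups Sum of Squares
within_ss = 0.735      # Within Groups Sum of Squares

df_between = n_groups - 1                # 3 - 1 = 2
df_within = n_total - 1 - df_between     # 12 - 1 - 2 = 9

ms_between = between_ss / df_between
ms_within = within_ss / df_within
f_value = ms_between / ms_within

print(df_between, df_within, round(ms_between, 3), round(ms_within, 4), round(f_value, 1))
# 2 9 2.186 0.0817 26.8
```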
We then find, from tables of the F-distribution or by using the UTPF() function of the hp39g, the probability that an F-value at least this large would occur purely by chance if the null hypothesis were correct, using 2 degrees of freedom for the numerator and 9 for the denominator.
UTPF(2,9,26.8) = 1.620 × 10⁻⁴.
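UTPF(n, d, F) returns the upper-tail probability of the F-distribution. In general this requires the incomplete beta function. However, when the numerator has exactly 2 degrees of freedom, as here, there is a simple closed form: P = (1 + 2F/d)^(-d/2). That makes the calculator's figure easy to check. The Python sketch below is valid only for 2 numerator degrees of freedom, and the function name is hypothetical. For other cases, use tables or a statistics library (for example scipy.stats.f.sf).

```python
def upper_tail_f_df2(d: int, f: float) -> float:
    """P(F > f) for an F-distribution with 2 numerator and d denominator
    degrees of freedom. The closed form is valid ONLY for 2 numerator d.f."""
    return (1.0 + 2.0 * f / d) ** (-d / 2.0)


p = upper_tail_f_df2(9, 26.8)
print(f"{p:.3e}")  # 1.620e-04
```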
This can be converted into a percentage confidence that the null hypothesis is false by multiplying it by 100 and subtracting the result from 100:
100 - (1.620 × 10⁻⁴ × 100) = 99.984%

If this probability is greater than 95% we can be reasonably confident that there is some genuine bias in one or more of the laboratories' results, and so we reject the null hypothesis. Since it is actually greater than 99.9% we can be virtually certain that the three laboratories are not all producing the same results.
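The conversion to a percentage and the 95% decision can be sketched in Python as follows. The names are hypothetical. The comparison mirrors the IF C>95 test in the program listing, which is equivalent to requiring the probability to be below 0.05.

```python
p = 1.620e-4                  # UTPF(2, 9, 26.8) from the calculation above

confidence = 100 - 100 * p    # 100 - (1.620 × 10⁻⁴ × 100)
reject_h0 = confidence > 95   # same threshold as IF C>95 in the listing

print(round(confidence, 3), reject_h0)  # 99.984 True
```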
Putting this data into the ANOVA program gives the following output:

[Screenshot: results from the worked example, agreeing (apart from rounding errors) with our calculation above.]



ANOVA on the hp 39g+ Graphing Calculator

This program does all the calculations for you, given a list of the data values, and presents the results in a reasonably understandable format.
It only performs 'one-way ANOVA', i.e. it examines the effect of a single factor (the grouping) on the measured values.
There is no limit to the number of groups or the number of items per group (except for the memory capacity of the calculator), and, unlike some other ANOVA programs, there is no need for all groups to contain the same number of items.


Loading ANOVA

Copy the files ANOVA000.000, HP39DIR.000 and HP39DIR.CUR to an empty directory on a PC, start the HP39G Connectivity program and point it to the directory holding the three files.
The ANOVA program consists of a single file, ANOVA, which should be downloaded from the computer into the PROGRAM catalogue part of the hp 39g+'s memory using the RECV option on the calculator.


Entering Data

Data needs to be entered before running the program.
Data is entered into List 1 (L1) as a list of lists. That is, the values in each group are entered as a list enclosed in { } and the group data are elements of a larger list, also enclosed in { } and with each sublist separated by a comma.
For example if your data consisted of:
Group 1 values     1, 2, 3
Group 2 values     4, 5, 6
you could enter them in the HOME screen using:
{{1,2,3},{4,5,6}} STO L1
as shown in the screenshot.

[Screenshot: entering a list of lists in the HOME screen]

This method is cumbersome, though, when there are several groups and many values per group. In that case it is better to use the List Editor.
Press SHIFT LIST for the list catalogue, highlight L1 and press EDIT.
Each row in the list view represents one group's data, the individual items of which must be enclosed in { } and separated by commas.
For example, if the data consists of:
Group 1 values    23.2, 24.7, 23.8
Group 2 values    22.8, 22.5, 22.3
Group 3 values    22.7, 22.2, 22.6
you would enter element 1 of L1 as {23.2,24.7,23.8}, element 2 as {22.8,22.5,22.3} and element 3 as {22.7,22.2,22.6}.
If there are too many values in a group to view on one line of the display you can press EDIT on the highlighted row after entering it to scroll along and edit it if necessary.

[Screenshot: entering group items in the List Editor view]


Running ANOVA

Once all the data has been entered into L1, run ANOVA from the PROGRAM catalogue or the HOME screen.
After a few seconds it should show a display similar to this:

[Screenshot: ANOVA Results display - means are different.]

(The display will be 'frozen' at this point; press any key to remove the results and return to the previous display.)
In this case the means of the data sets are not all the same at the 95% confidence level. What the results screen shows is:

  • Means differ at 95% - At the 95% confidence level at least one of the means is different from the others.
  • F   =... - The F value calculated for the data.
  • Prob=... - The probability, on a scale from 0 to 1, that this F-value is just due to random variations if the null hypothesis were in fact true.
  • (Chance H0 is false =... - The probability that H0, the 'null hypothesis', is not true, i.e. how certain you can be that the means are not all the same. Note that this is given as a percentage since it is effectively the confidence level of the result.

If the calculated F-value is less than the critical F-value at the 95% confidence level then it is not proven that the means are different and so they are assumed to be the same. A slightly different results display is shown:

[Screenshot: ANOVA Results display - means are assumed to be the same.]

  • Means same at 95% - At the 95% confidence level there is no statistically significant difference between any of the means.
  • F   =... - The F value calculated for the data.
  • Prob=... - The probability, on a scale from 0 to 1, that this F-value is just due to random variations if the null hypothesis is true. In this case it will be 0.05 or more.
  • (Chance H0 is false =... - The probability that H0, the null hypothesis that all the means are the same, is not true, given as a percentage. This is how certain you can be that there is a genuine difference between some of the means, and will be less than 95%, and therefore not sufficient to be conclusive.

On returning to the HOME screen the following variables contain possibly useful values:

  • F = the calculated F-value
  • P = the probability from 0 to 1 that this F-value would occur by chance if all the means were identical
  • C = the confidence level that there is a real difference between the means
  • B = the 'between groups sum of squares'
  • W = the 'within groups sum of squares'


Error Messages

If any of the elements of L1 are single values rather than lists, such as {{1,2,3},4,{5,6,7}}, then the message 'L1 must be a list of lists' is shown, since each element must be a list of values.

If L1 contains only a single group of data, such as {{10,11,12,13}}, or is empty, then the message 'Minimum 2 data sets in L1' is given, since ANOVA requires at least two sets (groups) of values. (In practice, if you have only two groups you are better off doing a Student's t-test to compare their means, though ANOVA is possible.)
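The two input checks can be mimicked in Python as below. This is only an analogue. The calculator detects a non-list element by trapping an error with IFERR, whereas this sketch tests each element explicitly. The function name check_groups is hypothetical. The checks run in the same order as in the listing: the group count first, then the list-of-lists test.

```python
def check_groups(data):
    """Return None if data is acceptable, otherwise the program's error message."""
    if len(data) < 2:
        return "Minimum 2 data sets in L1"
    if not all(isinstance(g, list) for g in data):
        return "L1 must be a list of lists"
    return None


print(check_groups([[1, 2, 3], 4, [5, 6, 7]]))  # L1 must be a list of lists
print(check_groups([[10, 11, 12, 13]]))         # Minimum 2 data sets in L1
print(check_groups([]))                         # Minimum 2 data sets in L1
print(check_groups([[1, 2, 3], [4, 5, 6]]))     # None
```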



Hardware Requirements

ANOVA runs on a Hewlett-Packard 39g+ calculator. It should also work on the 39g and 40g, and possibly the 38g, but it has not been tested on them.

The program takes up 0.56 kilobytes of RAM when first loaded, which expands to 1.8 kilobytes when it is run. There must, of course, also be enough free memory to store the data in L1.



Variables Used

ANOVA uses the following HOME variables, and will therefore overwrite any existing information stored in them:

  • Real variables B, C, F, N, P, W
  • List L1 to hold the data


Possible Improvements

When a genuine difference between the means has been found, include a test to determine which mean or means are different.
Add 'multi-way' ANOVA capability.



Program Listing

SIZE(L1)®C:
IF C<2 THEN MSGBOX "Minimum 2 data
sets in L1":STOP:END:
Σ(I=1,C,SIZE(L1(I)))®N:
IFERR Σ(I=1,C,ΣLIST(L1(I)))²/N®F THEN
MSGBOX "L1 must be a list of lists":
STOP:END:
Σ(I=1,C,ΣLIST(L1(I))²/SIZE(L1(I)))-F®B:
Σ(I=1,C,ΣLIST(L1(I)²))-F-B®W:
(B/(C-1))/(W/(N-C))®F:
UTPF(C-1,N-C,F)®P:
100-100*P®C:
ERASE:
DISP 1;">>>>ANOVA Results<<<<":
IF C>95 THEN DISP 3;"Means differ
at 95%":
ELSE DISP 3;"Means same at 95%":END:
DISP 4;"F = "ROUND(F,-5):
DISP 5;"Prob= "ROUND(P,-5):
DISP 6;"(Chance H0 is false":
DISP 7;"= "ROUND(C,3)"%)":
FREEZE:
(Note: ® means 'STO')
Comments

Store number of groups.
If there are fewer than 2 groups then
print error message and stop.
Count total number of values.
Calculate correction term (sum of all values)²/N.
If any groups are not a list this causes an error
so trap it and show error message.
Calculate Between Groups Sum of Squares.
Calculate Within Groups Sum of Squares.
Calculate F-value for data.
Find probability of this being due to chance.
Convert to percentage confidence.
Clear display.
Show title.
If confidence of real difference is >95%
report that means are different,
otherwise report they are the same.
Display rounded F-value.
Display rounded probability.
Report confidence level that all the
means are not the same.
Pause until a key is pressed.
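For checking results away from the calculator, here is a hedged Python transliteration of the listing. It uses the same formulas and variable names, but two parts are assumptions rather than the author's code. First, the input checks run before the values are counted, because Python's len() fails on a plain number. Second, UTPF is replaced by a hypothetical helper, upper_tail_f. It evaluates the F-distribution upper tail through the regularized incomplete beta function, using the standard continued-fraction (modified Lentz) method.

```python
import math


def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny, eps = 1e-300, 1e-15
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # Even-numbered term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered term.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def upper_tail_f(d1, d2, f):
    """P(F > f) for an F(d1, d2) distribution, playing the role of UTPF(d1, d2, f)."""
    if f <= 0:
        return 1.0
    x = d2 / (d2 + d1 * f)          # P(F > f) = I_x(d2/2, d1/2)
    a, b = d2 / 2.0, d1 / 2.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def anova(l1):
    """One-way ANOVA on a list of groups; returns (F, P, C, B, W) like the program."""
    c = len(l1)                                          # SIZE(L1) -> C
    if c < 2:
        raise ValueError("Minimum 2 data sets in L1")
    if not all(isinstance(g, list) for g in l1):
        raise ValueError("L1 must be a list of lists")
    n = sum(len(g) for g in l1)                          # total number of values -> N
    f = sum(sum(g) for g in l1) ** 2 / n                 # correction term (sum)²/N -> F
    b = sum(sum(g) ** 2 / len(g) for g in l1) - f        # between-groups SS -> B
    w = sum(sum(x * x for x in g) for g in l1) - f - b   # within-groups SS -> W
    f = (b / (c - 1)) / (w / (n - c))                    # F-value -> F
    p = upper_tail_f(c - 1, n - c, f)                    # UTPF(C-1, N-C, F) -> P
    conf = 100 - 100 * p                                 # the program stores this in C
    return f, p, conf, b, w


F, P, C, B, W = anova([
    [45.4, 46.1, 45.8, 45.2],  # Laboratory A
    [47.0, 46.8, 47.3, 47.2],  # Laboratory B
    [46.0, 46.2, 45.9, 46.3],  # Laboratory C
])
print("Means differ at 95%" if C > 95 else "Means same at 95%")  # Means differ at 95%
print(round(F, 3), f"{P:.3e}", round(C, 3))                       # 26.765 1.628e-04 99.984
```

Each group's size is taken from the group itself, so lists of different lengths are handled as they are on the calculator.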

Program written by Peter Ochocki, December 2005.