Comparing two methods

Suppose we wish to compare how well two methods attain a quality characteristic x.

Let us call the old method A and the new method B. We carry out n_A tests with method A and n_B tests with method B. Of course, these experiments should be randomized. That is, we should not run all the tests with A first and then all the tests with B, or vice versa. Instead, we could toss a coin repeatedly and test with A whenever it shows heads and with B whenever it shows tails, or use some other random order that mixes up the As and Bs. Thus the actual run order could be A A B A B B A, etc. Even if it is possible to carry out all the tests simultaneously, we should still randomize to break up the effect of any lurking factors.
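A randomized run order can be sketched in a few lines of Python. Note that this sketch shuffles a fixed list of labels rather than tossing a coin for each run, so that exactly n_A tests use A and n_B tests use B; the function name randomized_run_order and the seed argument are illustrative choices, not part of the original procedure.

```python
import random


def randomized_run_order(n_a, n_b, seed=None):
    """Return a list of 'A'/'B' labels in random order.

    The list always contains exactly n_a As and n_b Bs; only the
    order is random. Passing a seed makes the order reproducible.
    """
    rng = random.Random(seed)
    order = ["A"] * n_a + ["B"] * n_b
    rng.shuffle(order)  # in-place random permutation of the labels
    return order


if __name__ == "__main__":
    # Example: 10 tests with each method, run in the printed order.
    print(" ".join(randomized_run_order(10, 10, seed=1)))
```

Recording the seed along with the results lets the run order be reconstructed later.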

Let x_A be the average of the n_A readings of x (the usual notation is xbar for the average, but the bar has been omitted for easy coding in html) and x_B the average of the n_B readings of x. Let S_A² be the sample variance of the n_A readings of method A and S_B² the sample variance of the n_B readings of method B. It is preferable to have n_A = n_B >= 8, i.e., the number of tests should be the same for each method, and there should be at least 8 tests with each method.

Our aim is to improve x, which here means increasing it. If x_B is lower than x_A, the decision is easy: we cannot conclude that method B is better, and we give the benefit of the doubt to the old method. But if x_B is greater than x_A, we are in a dilemma: is the new method better, or is the increase just due to chance?

So we need to compare the increase in x due to change in method with the experimental error.

We can prepare boxplots for method A and method B side by side on the x axis, with a common scale on the y axis. By comparing the difference in means with the variability in the data, we can reach a decision. If a quantitative decision is desired, we have to carry out further analysis.

To obtain a measure of the experimental error, assuming that the experimental error variation does not depend on the method (see the page on residual analysis for how to test this assumption), we calculate the pooled variance S_p² = [(n_A-1)S_A² + (n_B-1)S_B²] / [n_A+n_B-2].
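The pooled variance is a weighted average of the two sample variances, each weighted by its own degrees of freedom (n_A - 1 and n_B - 1). A minimal Python sketch, with a hypothetical function name and made-up input values chosen only for illustration:

```python
def pooled_variance(n_a, var_a, n_b, var_b):
    """Pooled variance S_p^2 of two samples.

    n_a, n_b     : number of readings with method A and method B
    var_a, var_b : sample variances S_A^2 and S_B^2
    """
    if n_a < 2 or n_b < 2:
        raise ValueError("each method needs at least 2 readings")
    # Weight each variance by its degrees of freedom.
    return ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)


if __name__ == "__main__":
    # Made-up values: 8 readings with variance 2.0, 12 with variance 3.0.
    print(round(pooled_variance(8, 2.0, 12, 3.0), 3))  # prints 2.611
```

When n_A = n_B, the pooled variance reduces to the simple average of S_A² and S_B².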

By dividing the observed effect x_B - x_A by the standard error of the difference in means, we obtain the standardized effect as t_o = (x_B - x_A) / [S_p sqrt(1/n_A + 1/n_B)].

We compare this t_o with t_crit, the critical value of the t distribution from tables, choosing an area in the right tail corresponding to the level of significance alpha (commonly 0.05) and the appropriate degrees of freedom n_A+n_B-2.

If t_o > t_crit, we can conclude that the new method is significantly better than the old.
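The whole decision rule can be sketched in Python from the summary statistics. The standard error of the difference in means is taken here as S_p sqrt(1/n_A + 1/n_B), the usual pooled two-sample form; the function names and the input values in the example are hypothetical, and t_crit is still looked up in a t table by the user.

```python
import math


def standardized_effect(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Return t_o = (x_B - x_A) / (S_p * sqrt(1/n_A + 1/n_B))."""
    # Pooled variance, assuming equal error variance for both methods.
    s_p2 = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    # Standard error of the difference between the two averages.
    std_err = math.sqrt(s_p2 * (1.0 / n_a + 1.0 / n_b))
    return (mean_b - mean_a) / std_err


def new_method_is_better(t_o, t_crit):
    """One-sided test: True only if t_o exceeds the critical value."""
    return t_o > t_crit


if __name__ == "__main__":
    # Made-up summary statistics: 8 readings per method.
    t_o = standardized_effect(50.0, 4.0, 8, 53.0, 4.0, 8)
    # 1.761 is the tabulated t value for a right tail area of 0.05
    # and 8 + 8 - 2 = 14 degrees of freedom.
    print(t_o, new_method_is_better(t_o, 1.761))  # prints 3.0 True
```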

Example:
A company has formulated a new gasoline. We would like to test whether the octane number of the new gasoline is better than that of the old. An experiment was conducted, and the new gasoline gave the results 89.5, 91.5, 91.0, 89.0, 91.5, 92.0, 92.0, 90.5, 90.0 and 91.0. The old gasoline gave the results 89.5, 90.0, 91.0, 91.5, 92.5, 91.0, 89.0, 89.5, 91.0 and 92.0. Of course, these results were not obtained in this order, but in a random order.

We have x_A = 90.70, S_A² = 1.34, x_B = 90.80, S_B² = 1.07, n_A = 10, n_B = 10.

S_p² = 1.21; S_p = 1.10; t_o = 0.20; t_crit = 1.734 for a right tail area of 0.05 and 18 degrees of freedom. Since t_o < t_crit, we cannot conclude that the new gasoline is a significant improvement.
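The figures in this example can be checked from the raw readings with Python's standard statistics module. This is only a verification sketch: the seventh new-gasoline reading is taken as 92.0, the names OLD_A, NEW_B and analyze are illustrative, and t_crit = 1.734 is the table value quoted in the text rather than something the code computes.

```python
import math
import statistics

# Octane readings from the example (A = old gasoline, B = new gasoline).
OLD_A = [89.5, 90.0, 91.0, 91.5, 92.5, 91.0, 89.0, 89.5, 91.0, 92.0]
NEW_B = [89.5, 91.5, 91.0, 89.0, 91.5, 92.0, 92.0, 90.5, 90.0, 91.0]
T_CRIT = 1.734  # right tail area 0.05, 18 degrees of freedom (from tables)


def analyze(a, b):
    """Return the summary statistics and standardized effect t_o."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # statistics.variance uses the n - 1 divisor (sample variance).
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    s_p2 = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    s_p = math.sqrt(s_p2)
    t_o = (mean_b - mean_a) / (s_p * math.sqrt(1.0 / n_a + 1.0 / n_b))
    return {"mean_a": mean_a, "var_a": var_a,
            "mean_b": mean_b, "var_b": var_b,
            "s_p2": s_p2, "s_p": s_p, "t_o": t_o}


if __name__ == "__main__":
    result = analyze(OLD_A, NEW_B)
    for name, value in result.items():
        print(f"{name} = {value:.2f}")
    print("significant improvement:", result["t_o"] > T_CRIT)
```

Rounded to two decimals, the printed values should agree with the figures quoted above.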

Reference:

Montgomery, Douglas C., "Introduction to Statistical Quality Control", Third Edition, John Wiley & Sons Inc, 2001, pp. 101-103.
