# Clustering and Classification methods for Biologists

### Intended Learning Outcomes

At the end of this section students should be able to complete the following tasks.

• Recognise when discriminant analysis is a suitable classifier.
• Explain the structure of a discriminant function.
• Interpret class differences using the structure matrix.
• Interpret the accuracy of an analysis and recognise any limitations to the accuracy.
• Decide how many predictors should be used.

### Background

A common biological problem is identifying the features responsible for splitting a set of observations into two or more groups. For example, we may wish to distinguish between

• insecticide-resistant and susceptible individuals;
• active and inactive molecules;
• disease-prone and healthy people.

If there is information about individuals (cases), obtained from a number of variables, it is reasonable to ask whether these variables can be used to define groups and/or predict the group to which an individual belongs. Discriminant analysis and logistic regression are two methods that achieve these aims.

Discriminant analysis is one of the simplest and most widely used classification methods. It was once common in biology, but became less popular after a number of papers, particularly in ecological journals, questioned its validity for most analyses and suggested that logistic regression was a better alternative. However, in empirical tests discriminant analysis often emerges as one of the better classifiers.

Discriminant analysis works by creating a new variable, the discriminant function, which is a weighted combination of the original predictors. The weights are chosen so that the differences between the predefined groups, with respect to the new variable, are maximized. Group membership must therefore be known before using discriminant analysis. The most comprehensive text dealing with all aspects of discriminant analysis is Huberty's (1994) book.
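
As a rough illustration of this idea, the sketch below computes a two-group linear discriminant (Fisher's approach) for two predictors in plain Python. The measurements, group labels and function names are invented for the example and do not come from any real study. The midpoint cut-off used to classify a new case is also an assumption: it treats both groups as equally likely and both kinds of error as equally costly.

```python
# Minimal sketch: Fisher's linear discriminant for two groups, two predictors.
# All data and names here are hypothetical, for illustration only.

def mean_vector(rows):
    """Mean of each of the two predictors for one group."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, mean):
    """2x2 sums of squares and cross-products about the group mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def discriminant_weights(first, second):
    """Weights w = inverse(pooled covariance) * (difference in group means)."""
    m1, m2 = mean_vector(first), mean_vector(second)
    s1, s2 = scatter(first, m1), scatter(second, m2)
    dof = len(first) + len(second) - 2
    # Pooled within-group covariance matrix.
    sw = [[(s1[i][j] + s2[i][j]) / dof for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    return w, m1, m2

def score(w, x):
    """Discriminant score: the new variable built from the two predictors."""
    return w[0] * x[0] + w[1] * x[1]

def classify(w, m1, m2, x, labels=("resistant", "susceptible")):
    """Assign x to the group whose mean score lies on the same side of the midpoint."""
    cutoff = (score(w, m1) + score(w, m2)) / 2
    return labels[0] if score(w, x) > cutoff else labels[1]

# Hypothetical measurements (two predictors per individual).
resistant = [(4.0, 2.0), (5.0, 3.0), (6.0, 2.5), (5.5, 3.5)]
susceptible = [(1.0, 1.0), (2.0, 2.0), (1.5, 0.5), (2.5, 1.5)]

weights, mean_res, mean_sus = discriminant_weights(resistant, susceptible)
print("weights:", weights)
print("new case:", classify(weights, mean_res, mean_sus, (4.5, 2.0)))
```

The size and sign of each weight depend on the units of the predictors and on the correlations between them, so raw weights alone are a poor guide to which variables separate the groups.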

In summary:

• Discriminant analysis is used to distinguish between two or more predefined groups.
• The analysis identifies those variables that contribute most to the differences between groups.
• Discriminant analysis can also be used as a classification technique to place an unknown case into one of the groups.

### Resources

Huberty, C. J. 1994. Applied discriminant analysis. Wiley.
