Statistical Sampling Page

How To Use Statistical Sampling

Introduction

The following sections describe the uses and applications of basic sampling plans and sample selection techniques. Anyone interested in survey sampling, quality control, accuracy of records, or any other situation in which you must draw conclusions based on an inspection of part of a population should seriously consider using statistical sampling techniques. Any form of sampling, whether statistical or judgmental, is an application of a procedure to less than 100% of the population.

Sampling and testing are commonly used terms to describe the process of obtaining information about an entire population by examining only a part of it. The following sections briefly explain the basic concepts of sampling theory to give the user the background needed to apply sampling methods. This presentation assumes the reader has taken at least one formal statistics course. For further detail, the reader should consult available reference works such as Arkin's Handbook of Sampling for Auditing and Accounting. An Excel-based statistical sampling tool is available for download; it can be used to calculate sample size, precision, confidence level, and standard deviation and to generate lists of random numbers. (To obtain a copy of the program, click on the link and save the Excel file to the directory of your choice.)

Advantages of Sampling with Statistical Measurements

An effective sampling method often requires more than objectivity. It requires some means for establishing sample sizes and appraising sample results mathematically. Such a sample will have a behavior which is measurable in terms of the rules of the theory of probability. When a sample is obtained statistically, it is possible to state, with a stipulated degree of confidence, that the number of errors in the sample applies proportionately to the unsampled portion of the universe as well. Statistical sampling provides the user with the following advantages:

The sample result is objective and defensible. It is not subject to questions of bias that might be raised relative to a judgment sample.
The method provides a means of knowing, in advance, the size of the maximum sample needed. Sample size and justification for expense or time spent are defensible as reasonable when confidence level desired is reasonable for the risk being evaluated.
The method provides an estimate of the degree of risk that the sample may not be representative of the entire population. This limits deviation due to sampling variations.
Statistical sampling can be more accurate than an examination of every item in a large population. This is certainly true where the volume and tediousness of the data under review can lead to errors of omission or fact by the user.
Statistical sampling may save time and money. Frequently, a statistical sample may include fewer items than a fixed percentage sample. Also, one sample may be used to test several characteristics of a given record.
Objective evaluation of test results is possible. Statistical sampling provides a means of projecting test results within known limits of reliability.
Data may be combined and evaluated, even though obtained by different users.
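
The claim that the sample size can be known in advance can be illustrated with a short sketch. It uses the standard normal-approximation formula for estimating a proportion, with a finite population correction; the function name, parameter names, and figures are hypothetical and are not taken from the Excel tool mentioned above.

```python
import math
from statistics import NormalDist


def attribute_sample_size(population_size, expected_rate, precision, confidence):
    """Sample size for estimating a proportion (normal approximation).

    Illustrative helper only; the name and parameters are assumptions.
    """
    # Two-sided z value for the requested confidence level (0.95 -> about 1.96).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Size needed if the population were effectively infinite.
    n0 = (z ** 2) * expected_rate * (1 - expected_rate) / precision ** 2
    # The finite population correction reduces the size for smaller populations.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


# Hypothetical figures: 5,000 records, 5% expected error rate,
# +/- 2% precision, 95% reliability.
print(attribute_sample_size(5000, 0.05, 0.02, 0.95))
```

Setting the expected rate to 0.5 gives the largest size the formula can return for a given precision and reliability, which is one way to bound the maximum sample in advance.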

Basic Statistical Terms

The following are some of the more basic statistical terms that users should be familiar with in the application and discussion of statistical measurement:

Average. The average or mean is the primary measurement of the central tendency of a variable. The average is calculated by summing all of the values for the variable and dividing the total by the number of occurrences.
Range. The range of a variable is the difference between the most extreme values for the variable. For example, if the number of items for a particular stock number in each of five storerooms was 2, 4, 3, 7 and 4, then the values run from 2 to 7 and the range would be 5.
Standard Deviation. The standard deviation is a measure of how far the values typically lie from the arithmetic mean. It is calculated as the square root of the average squared deviation from the mean. It is the most useful measure of dispersion.
Reliability (also known as Confidence Level). This is a common-sense notion of accuracy. It is meaningless unless used in conjunction with the concept of precision. Reliability is the probability that the confidence interval built around the statistic measured by the sample (generally the mean) will contain the true value being estimated for the entire population.
Standard deviation is the key to determining reliability. For example, under a normal distribution, if the confidence interval spans plus and minus one standard deviation of the sample statistic (its standard error), the reliability would be about 68.27%; two standard deviations, about 95.45%.
Precision. This is also a common-sense notion of accuracy. It is meaningless unless used in conjunction with the concept of reliability. Precision is another way of describing the confidence interval: the range of values about a statistic measured by a sample (generally the mean) that has a given probability of containing the true value of the population's statistic. Precision is expressed as a plus and minus value about the sample mean. For example, if the confidence interval is $1.00 wide, precision would be shown as +/- $.50.
Confidence Interval. The confidence interval is the plus and minus interval about the sample statistic. It is another way of expressing the concept of precision.
Frequency Distribution. A frequency distribution is the classification of the elements of a set of data by a quantitative characteristic. The more classes in a frequency distribution, the more detail is shown. Too much detail makes summarization difficult.
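
The terms above can be worked through on the storeroom figures from the range example. The following is a minimal sketch using the Python standard library. Treating the five counts as a simple random sample and using a 1.96 multiplier are assumptions made only for illustration; with so few items a t multiplier would normally be more appropriate.

```python
import math
import statistics

# Item counts for one stock number in five storerooms (the example in the text).
counts = [2, 4, 3, 7, 4]

mean = statistics.mean(counts)            # sum of values / number of values
value_range = max(counts) - min(counts)   # largest value minus smallest value
std_dev = statistics.stdev(counts)        # sample standard deviation (n - 1 divisor)


def normal_coverage(k):
    """Probability that a normal value lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


# Precision of the mean at roughly 95% reliability (illustrative assumption:
# simple random sample from a large population, normal approximation).
standard_error = std_dev / math.sqrt(len(counts))
precision = 1.96 * standard_error

print(f"mean={mean}, range={value_range}, std dev={std_dev:.3f}")
print(f"within 1 sd: {normal_coverage(1):.4f}, within 2 sd: {normal_coverage(2):.4f}")
print(f"confidence interval: {mean - precision:.2f} to {mean + precision:.2f}")
```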

Sampling Plans and Selection Techniques

The manner in which the population is filed or distributed will determine the kind of selection techniques to be used to select the sample. The specific plan and selection technique used should be precisely documented.

The sampling approaches (plan) most often used are described briefly as follows:

Estimation Sampling. This is the most widely used approach. There are two types of estimation sampling.
- Attributes Sampling. Should be used when the question of "how many?" is pertinent. It is used to determine the characteristics or "attributes" of a population. The results are expressed as a percent of the type of event specified. Each observation is mutually exclusive; i.e., it can only fall in one category.
  - Stop or Go Sampling. An extension of attributes sampling. Used to reach a conclusion about the upper precision limit of an attributes sample. May allow the objective to be attained with a smaller sample size than is possible using the classical attributes model.
- Variables Sampling. Used to answer the question "how much?" Applied to populations made up of dollars, pounds, days, etc. Can provide an estimate of an average or total value of a population.
Acceptance Sampling. A sample of a given size is drawn by random sampling methods, and if not more than a given number of errors is found, the field examined is acceptable. This type of sample allows for only an accept or reject decision. The various types of acceptance sampling plans are discussed in detail in the Arkin text.
Discovery Sampling. Sometimes referred to as exploratory sampling, this plan is used where evidence of a single error or instance of irregularity would call for intensive investigation. Discovery sampling is frequently of value when fraud, avoidance of internal controls, evasion of regulation or other critical performance and quality control measures are in question.
Dollar Unit Sampling. Uses a combined-attributes-and-variables method of statistical inference. It can be used simultaneously for both variables and attributes sampling. It differs from most sampling techniques in that the sampling units are defined as individual dollars rather than as physical units (such as inventory items). The procedures are performed on the individual accounts or inventory items containing the dollars selected.
Judgment Sampling. Applies to situations in which the user relies on personal judgment, in place of a statistical sample, to determine sample sizes or methods of selection. To employ judgment sampling well, the user must understand the basic principles of statistical sampling so as to know when one or the other is most appropriate. It is important, however, to remember that judgment sampling does not support an inference to the population as a whole. Only the above-mentioned statistical plans provide that latitude and advantage.
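
Two of the plans above lend themselves to a short sketch: the sample size for discovery sampling, and the selection step of dollar unit sampling. Both functions below are illustrative assumptions rather than the procedures in the Arkin text. The discovery calculation treats the population as large enough that each draw is independent, and the dollar unit selection uses a fixed dollar interval after a random start, which is one common way to give every dollar an equal chance of selection.

```python
import math
import random


def discovery_sample_size(critical_rate, confidence):
    """Smallest n giving at least `confidence` chance of finding one or more errors
    when errors occur at `critical_rate` (large-population assumption)."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - critical_rate))


def dollar_unit_sample(amounts, n, rng):
    """Select items by stepping through cumulative dollars at a fixed interval.

    `amounts` maps a hypothetical item id to a positive dollar amount.
    An item is selected when one of the sampled dollars falls inside it,
    so larger items are more likely to be chosen.
    """
    total = sum(amounts.values())
    interval = total / n
    start = rng.uniform(0, interval)          # random start within the first interval
    targets = [start + i * interval for i in range(n)]

    selected = []
    cumulative = 0.0
    t = 0
    for item_id, amount in amounts.items():
        cumulative += amount
        # Consume every sampled dollar that lands inside this item.
        while t < n and targets[t] < cumulative:
            if not selected or selected[-1] != item_id:
                selected.append(item_id)      # record each item only once
            t += 1
    return selected


# Hypothetical figures: 95% reliability of finding an error if 1% of items are wrong.
print(discovery_sample_size(0.01, 0.95))

# Hypothetical account balances; B is larger than the interval, so it is always drawn.
balances = {"A": 100, "B": 5000, "C": 50, "D": 850}
print(dollar_unit_sample(balances, 3, random.Random(1)))
```

Because an item larger than the sampling interval always contains at least one sampled dollar, the selection may return fewer distinct items than dollars sampled.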

The more commonly used sampling selection techniques are briefly described as follows:

Unrestricted Random Numbers. Each item in the population has an equal chance of being included in the sample. The most common method of sampling.
Interval Sampling. The sample items are selected from within the universe in such a way that there is a uniform interval between each sample item selected after a random start.
Stratified Sampling. The items in the population are segregated into two or more classes or strata. Each stratum is then sampled independently. The results for the several strata may be combined to give an overall figure for the universe or may be considered separately, depending on the circumstances of the test.
Cluster Sampling. The universe is formed into groups or clusters of items. Then the items within the selected clusters may be sampled or examined in their entirety.
Multistage Samples. Involves sampling on several levels. As an example, the user takes a sample from several locations and then, in turn, takes another sample from within the sampled items.
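
The selection techniques above can be sketched with Python's standard `random` module. The function names and the data are hypothetical, and each function is a minimal illustration of the idea rather than a complete sampling procedure.

```python
import random


def unrestricted_random(items, n, rng):
    """Every item has an equal chance of selection (drawn without replacement)."""
    return rng.sample(items, n)


def interval_sample(items, n, rng):
    """Uniform interval between selections after a random start.

    Uses a whole-number interval, so when len(items) is not a multiple of n
    the last few items can never be selected.
    """
    interval = len(items) // n
    start = rng.randrange(interval)
    return [items[start + i * interval] for i in range(n)]


def stratified_sample(strata, sizes, rng):
    """Sample each stratum independently; `sizes` gives the draw for each stratum."""
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random and keep every item in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Choose clusters at random, then sample items within each chosen cluster."""
    chosen = cluster_sample(clusters, n_clusters, rng)
    return {name: rng.sample(members, n_per_cluster) for name, members in chosen.items()}


rng = random.Random(42)                      # fixed seed so the selection can be documented
vouchers = list(range(1, 101))               # hypothetical voucher numbers 1..100
print(unrestricted_random(vouchers, 5, rng))
print(interval_sample(vouchers, 5, rng))
```

Recording the seed alongside the plan is one way to meet the requirement that the selection technique be precisely documented, since the same seed reproduces the same selection.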

A checklist for sampling is provided at the end of this page. The checklist shows the relationship between test objectives and sampling plans and population characteristics and sampling selection techniques.

Evaluation of Results

No matter what form of sampling plan or selection technique you might use, you’ll be faced with the task of evaluating test results. If your tests are based upon statistical measurement, you’ll find statistical methods available to aid in re-evaluating the premises used in selecting your sample sizes initially. The user should keep in mind that the following rules will make for more meaningful evaluations of test results:

Findings for each characteristic being tested should be evaluated separately; each characteristic represents a distinct and independent sample.
What is an "acceptable error rate" will depend upon the user's judgment of the significance of the errors, after a full study of the surrounding circumstances.
The user must always be on the alert to take a fresh look at the sample when significant matters are disclosed by the test. When a sample reveals a critical exception, the user should consider whether to stop testing and apply other procedures to attempt to determine the cause and effect of the exception.
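
As one illustration of evaluating an attributes test against an acceptable error rate, the sketch below computes an upper limit on the population error rate from the number of errors found. It assumes a binomial model (a large population relative to the sample) and finds the exact one-sided limit by bisection; the function names and figures are hypothetical, and the acceptable rate remains a matter for the user's judgment.

```python
import math


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_error_limit(errors, sample_size, confidence):
    """One-sided exact upper limit on the population error rate.

    Found by bisection: the rate at which seeing `errors` or fewer errors
    in the sample has probability 1 - confidence.
    """
    if errors >= sample_size:
        return 1.0
    low, high = 0.0, 1.0
    for _ in range(60):                      # 60 halvings is ample for float precision
        mid = (low + high) / 2
        if binomial_cdf(errors, sample_size, mid) > 1 - confidence:
            low = mid                        # rate still plausible, so look higher
        else:
            high = mid
    return high


# Hypothetical test: 2 errors in a sample of 150, evaluated at 95% reliability
# against an acceptable error rate of 5% chosen by the user.
limit = upper_error_limit(2, 150, 0.95)
print(f"sample rate {2 / 150:.1%}, upper limit {limit:.1%}")
print("within tolerance" if limit <= 0.05 else "exceeds tolerance")
```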

CHECKLIST FOR SAMPLING

For each test, determine (1) the test objectives to
establish the sampling plan and (2) the composition
and the location of the population to establish the 
sample selection techniques. 

TEST OBJECTIVES                               SAMPLING PLAN

1. To understand the characteristics          Judgment
   of the population or to determine
   whether a system is in operation.
   (No attempt is made to project
   sample results to the entire
   population.)

2. To estimate, with a specified degree       Attributes/
   of reliability, the characteristics        Dollar Unit
   of a population (error rates, etc.)
   and to determine "how many".

3. To estimate, with a specified degree       Variables/
   of reliability, the value of a             Dollar Unit
   population or of one of its
   characteristics (dollar value of
   inventories, dollar value of
   improper travel vouchers, etc.) and
   to determine "how much".

4. To obtain reasonable information           Stop-or-Go
   from a sample about the
   characteristics (such as minimal
   error rates) of a population by
   selecting the lowest sample size.

5. To distinguish good lots of items          Acceptance
   from bad lots, where a bad lot is
   one that contains more than a
   specified percentage of defective
   items.

6. To find evidence of at least one           Discovery
   improper transaction in the
   population.

POPULATION                                    SELECTION TECHNIQUE

1. When the population items are              Random numbers
   numbered or are listed in a
   register or tab run.

2. When random-number sampling                Intervals
   would be too burdensome and when
   it is established (a) that no
   pattern in the population will bias
   the sample and (b) that items
   missing from the population can be
   identified.

3. When there is a considerable               Stratification
   variation in the population and
   when it is believed that more
   reliability would be achieved by
   breaking the population down into
   groups of comparable or similar
   items.

4. When the population is dispersed           Cluster or
   geographically and when it would be        Multistage
   very burdensome to make selections
   randomly from the entire
   population.
