Chapter 1 - Introduction to Statistics

(This information is a brief summary of Chapter 1. It is intended to help the student get started in this course even though the student has not yet obtained the textbook.)

Statistics is a collection of methods for planning experiments, obtaining data and then organizing, summarizing, presenting, analyzing, interpreting and drawing conclusions based on the data.

A parameter is a numerical measurement describing some characteristic of a population. A population is the complete collection of all elements (scores, people, measurements, etc.) to be studied. The collection is complete in the sense that it includes all subjects to be studied.

Example of a parameter. When Lincoln was first elected to the presidency, he received 39.82% of the 1,865,908 votes cast. If we consider the collection of all those votes to be the population being considered, then the 39.82% is a parameter, not a statistic.

A statistic is a numerical measurement describing some characteristic of a sample.

Example of a statistic. Based on a sample of 877 surveyed executives, it was found that 45% of them would not hire anyone whose job application contained a typographical error. The figure of 45% is a statistic because it is based on a sample, not the entire population of all executives.
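
The distinction between a parameter and a statistic can be sketched in a few lines of Python. The population below is made up for illustration (1,000 hypothetical executives, 450 of whom would reject an application with a typo); it is not the survey data quoted above. The proportion computed from every element is a parameter, while the proportion computed from a random sample is a statistic that will usually land near the parameter without matching it exactly.

```python
import random

# Hypothetical population (an assumption for illustration, not real survey data):
# 1,000 executives, 450 of whom would reject an application containing a typo.
population = [True] * 450 + [False] * 550


def proportion(values):
    """Return the fraction of True entries in a collection."""
    return sum(values) / len(values)


# Parameter: computed from every element of the population.
parameter = proportion(population)

# Statistic: computed from a sample drawn from that population.
rng = random.Random(1)  # fixed seed so the run is repeatable
sample = rng.sample(population, 100)
statistic = proportion(sample)

print(f"parameter = {parameter:.3f}")
print(f"statistic = {statistic:.3f}")
```

Changing the seed or the sample size changes the statistic, but the parameter stays fixed because the population itself has not changed.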

Discrete and Continuous Data

Numerical data falls into 1 of 2 categories: discrete and continuous.

Data is discrete if there are only a finite (countable) number of values possible or if there is a space on the number line between any 2 possible values.

Ex. A 5-question quiz is given in a Math class. The number of correct answers on a student's quiz is an example of discrete data. The number of correct answers would have to be one of the following: 0, 1, 2, 3, 4, or 5. There is not an infinite number of values; therefore, this data is discrete. Also, if we were to draw a number line and place each possible value on it, we would see a space between each pair of values.

Ex. In order to obtain a taxi license in Las Vegas, a person must pass a written exam regarding different locations in the city. How many times it would take a person to pass this test is also an example of discrete data. A person could take it once, or twice, or 3 times, or 4 times, etc. So, the possible values are 1, 2, 3, ... There are infinitely many possible values, but if we were to put them on a number line, we would see a space between each pair of values.

Discrete data usually occurs in a case where there are only a certain number of values, or when we are counting something (using whole numbers).

Continuous data makes up the rest of numerical data. This is a type of data that is usually associated with some sort of physical measurement. It results from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions or jumps.

Ex. The height of trees at a nursery is an example of continuous data. Is it possible for a tree to be 76.2" tall? Sure. How about 76.29"? Yes. How about 76.2914563782"? You betcha! The possibilities depend upon the accuracy of our measuring device.

One general way to tell if data is continuous is to ask yourself if it is possible for the data to take on values that are fractions or decimals. If your answer is yes, this is usually continuous data.

Ex. The length of time it takes for a light bulb to burn out is an example of continuous data. Could it take 800 hours? How about 800.7? 800.7354? The answer to all 3 is yes.

Levels of Measurement of Data

Nominal data consists of categories only. Data cannot be arranged in an ordering scheme. Examples include: red, yellow, green and Texas, Oklahoma, California.

Ordinal data consists of categories which are ordered, but differences cannot be determined or they are meaningless. Examples include: compact cars, mid-sized cars, full-sized cars - an order is determined but the difference between compact and mid-sized is not the same as the difference between mid-sized and full-sized.

Interval data consists of data where the differences between the values are meaningful, but there is no natural starting point. Ratios are meaningless. For example, data measured in 'years' is interval. The years 1000, 2000, 1776 and 1492 can be ordered, and the difference between 1000 and 1001 is the same as the difference between 2000 and 2001; however, the ratio of 1000/2000 has no meaning. As another example, temperatures measured on the Celsius and Fahrenheit scales are interval data, while temperatures measured on the Kelvin scale are 'ratio' data. (0 degrees Kelvin indicates absence of heat, while 0 degrees Fahrenheit and 0 degrees Celsius still have heat, as evidenced by the fact that negative temperatures exist on these scales.)

Ratio data is similar to interval data with the exception that there is a natural zero. Ratios of ratio data have meaning. For example, mileage is ratio data since 10 miles is twice as far as 5 miles.
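
The difference between interval and ratio data can be checked with a little arithmetic. The Python sketch below uses two illustrative temperatures (40 and 80 degrees Fahrenheit, chosen here and not taken from the text) and the standard Fahrenheit-to-Kelvin conversion. It suggests why "80 degrees is twice as hot as 40 degrees" is not a meaningful statement, while "10 miles is twice as far as 5 miles" is.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit temperature to Kelvin."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


# Illustrative temperatures (an assumption, not from the text).
cool_f, warm_f = 40.0, 80.0

# Interval data: differences are meaningful.
difference_f = warm_f - cool_f  # 40 Fahrenheit degrees

# The Fahrenheit readings have a ratio of 2, but the zero point is arbitrary,
# so this ratio does not describe how much heat is present.
ratio_f = warm_f / cool_f

# On the Kelvin scale (natural zero), the same two temperatures
# have a ratio of only about 1.08.
ratio_k = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

# Ratio data: distance has a natural zero, so the ratio is the same
# whether we measure in miles or in kilometers.
ratio_miles = 10.0 / 5.0
ratio_km = (10.0 * 1.609344) / (5.0 * 1.609344)

print(f"Fahrenheit ratio: {ratio_f:.2f}, Kelvin ratio: {ratio_k:.2f}")
print(f"Miles ratio: {ratio_miles:.2f}, kilometers ratio: {ratio_km:.2f}")
```

A ratio that changes when the unit changes is a sign that the scale has no natural zero, which is the mark of interval data.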

Questions

1. A sample of students is selected and the average (mean) age is 20.7 years. Is this a statistic or a parameter?
3. All of the state governors are surveyed and 30 of them are found to be Democrats. Is this a statistic or a parameter?
5. A statistics professor counts 3 absent students. Is this discrete or continuous data?
7. In a survey of 1068 Americans, 673 state that they own answering machines. Is this discrete or continuous data?
In the following, determine which of the four levels of measurement is most appropriate:
9. Heights of women basketball players in the WNBA.
11. Noon temperatures (in degrees Fahrenheit) in Death Valley this week.
13. The years in which new editions of a book are published.
15. CONSUMER REPORTS magazine ratings of 'best buy, recommended, not recommended'.
17. The actual contents (in ounces) of cola in Coke cans labeled 12 oz.

Uses and Abuses of Statistics

Abuses of statistics are abundant. Some abusers of statistics are simply ignorant or careless, while others have ulterior motives. Entire books have been devoted to 'lying with statistics'. Understanding the ways that statistics can be abused is helpful in evaluating the statistical data found in everyday situations.

QUESTIONS
1. In an ABC 'Nightline' poll, 186,000 viewers each paid 50 cents to call a '900' number with their opinion about keeping the United Nations headquarters in the US. The results showed that 67% of those who called were in favor of moving the UN headquarters out of the US. Interpret the results by identifying what we can conclude about the way the general population feels about keeping the UN headquarters in the US.
3. The Consumer Price Index (CPI) is based on the cost of goods and services purchased by typical consumers. Assume that the cost is $500 per year.
a. If there is inflation so that all costs rise 5% next year, what is next year's cost?
b. If all costs drop by 5% in the year after next year, what is that year's cost?
c. When the 5% increase is followed by the 5% decrease, do costs return to the original $500 level?
5. The NEWPORT CHRONICLE claims that pregnant mothers can increase their chances of having healthy babies by eating lobsters. That claim is based on a study showing that babies born to lobster-eating mothers have fewer health problems than babies born to mothers who don't eat lobster. What is wrong with the claim?
7. The Hawaii State Senate held hearings when it was considering a law requiring that motorcyclists wear helmets. Some motorcyclists testified that they had been in crashes in which helmets would not have been helpful. What important group was not able to testify?
9. A survey includes this item: "Enter your height in inches." It is expected that actual heights of respondents can be obtained and analyzed, but there are two different major problems with this item. Identify them.
11. Is a 10% price cut the same as two consecutive 5% price cuts? Why or why not?
13. A researcher at the Sloan-Kettering Cancer Research Center was once criticized for falsifying data. Among his data were figures obtained from 6 groups of mice, with 20 individual mice in each group. These values were given for the percentage of successes in each group: 53%, 58%, 63%, 46%, 48% and 67%. What is the major flaw?
15. A NEW YORK TIMES editorial criticized a chart caption that described a dental rinse as one that "reduces plaque on teeth by over 300%."
a. If you remove 100% of some quantity, how much is left?
b. What does it mean to reduce plaque by over 300%?


Design of Experiments

Sometimes, we will want to explore data. Sometimes, we have a particular objective in mind and we want to analyze the data to help us investigate that objective. Data can come from observation or experiment.
The basic steps followed in designing an experiment are:
CONTROLLING EFFECTS OF VARIABLES. When conducting an experiment, it is all too easy to receive interference from variable factors that are not relevant to the issue being studied. A common technique is to conduct a DOUBLE-BLIND study, in which the subjects being treated do not know which treatment they are receiving and the people administering the treatment do not know which treatment they are administering. This is easy to do when medicines are administered: one can be the real treatment and the other can be a placebo.
Confounding occurs in an experiment when the effects from two or more variables cannot be distinguished from each other. For example, suppose a researcher is studying whether people stay away from meetings because of snow. It could be that people stay away from meetings because of cold weather, which occurs at the same time as snow.
SAMPLE SIZE. The sample size must be large enough so that the erratic behavior of very small samples will not produce misleading results.
RANDOMIZATION. Data carelessly collected may be so completely useless that no amount of statistical torturing can salvage them. In a RANDOM SAMPLE, members of the population are selected in such a way that each has an equal chance of being selected. With random sampling, we expect all groups of the population to be (approximately) proportionally represented. Other methods of sampling are systematic, convenience, stratified and cluster sampling. No matter how well the sample is selected, there is likely to be some error in the results. By properly selecting the sample, the nonsampling error is minimized.
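
The sampling methods named in the questions below (random, systematic, stratified and cluster) can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions: the population is a hypothetical list of 1,000 student ID numbers, and the strata and clusters are arbitrary slices of that list made up for the example. Convenience sampling is left out because it involves no selection procedure; the researcher simply uses whoever is easiest to reach.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical sampling frame (an assumption): student ID numbers 1 to 1000.
population = list(range(1, 1001))

# Random sample: every member has an equal chance of being selected.
random_sample = rng.sample(population, 50)

# Systematic sample: pick a random starting point, then take every k-th member.
k = len(population) // 50  # k = 20
start = rng.randrange(k)
systematic_sample = population[start::k]

# Stratified sample: split the population into groups (strata), then draw
# a random sample from each group. These two strata are made up.
strata = {
    "first_half": population[:500],
    "second_half": population[500:],
}
stratified_sample = {
    name: rng.sample(members, 25) for name, members in strata.items()
}

# Cluster sample: split the population into clusters, randomly choose
# some clusters, and keep every member of the chosen clusters.
clusters = [population[i:i + 100] for i in range(0, len(population), 100)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [member for cluster in chosen_clusters for member in cluster]

print(len(random_sample), len(systematic_sample), len(cluster_sample))
```

The key contrast is between the last two methods: a stratified sample takes some members from every group, while a cluster sample takes all members from some groups.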

QUESTIONS
1. Cans of Coke are opened and the volumes (in ounces) of the contents are measured. Is this an observational study or experiment?
3. The effectiveness of multimedia teaching is tested with a sample of students who complete a course of study using the multimedia approach. Is this an observational study or experiment?
In questions 5, 7, 9, 11, 13 and 15, identify which type of sampling is used: random, systematic, convenience, stratified or cluster.
5. The Gallup Organization plans to conduct a poll of New York residents with the '212' area code. Computers are used to randomly generate telephone numbers that are automatically dialed.
7. An ABC news reporter polls people as they pass him on the street.
9. A General Motors researcher has partitioned all registered cars into categories of subcompact, compact, mid-size, intermediate and full-size. She is surveying 200 randomly selected car owners from each category.
11. The College of Newport conducts a study of student drinking by randomly selecting 10 different classes and interviewing all of the students in each of those classes.
13. An economist is studying the effect of education on salary and conducts a survey of 150 randomly selected workers from each of these categories: less than a high school diploma, high school diploma, more than a high school diploma.
15. A police sobriety checkpoint where every fifth driver is stopped and interviewed.
17. Describe a procedure for obtaining a simple random sample of 200 students from the population of full-time students at your college.
19. Describe a procedure for obtaining a simple random sample of 250 college textbooks from the population of all college textbooks used in your state.
21. Two categories of survey questions are OPEN and CLOSED. An open question allows a free response, while a closed question allows only a fixed response. For example:
Open question: What do you think can be done to reduce crime?
Closed question: Which of the following approaches would be most effective in reducing crime?

a. What are the advantages and disadvantages of open questions?
b. What are the advantages and disadvantages of closed questions?
c. Which type is easier to analyze with formal statistical procedures, and why is that type easier?

The material in this brief review was taken from the text. It is not intended to replace the text; the student will still need to read Chapter 1 in the textbook when it arrives. This is simply designed to help the student get started on the statistics course.