Regression Analysis is the study of relationships between variables. Because of its generality and applicability, Regression Analysis is one of the most pervasive of all statistical methods in the business world.
In the given problem the data presents three variables for 32 recently auctioned comparable items.
Objective: Determine the validity of an antique collector’s assumption regarding the variables influencing the price of goods sold at auction by building a regression model and providing adequate analysis.
To examine the relationships between the dependent variable and the independent variables (in the given case an item’s price, and its age and the number of people bidding) we use scatterplots. The higher the correlation between an explanatory variable and the response variable, the more closely the points follow a straight line, and the more reliably a change in the independent variable is reflected in the dependent variable. In this particular case the correlations between an item’s price and its age and number of bidders were 0.73 and 0.43 respectively; they are presented graphically below.

It can be concluded that the age of an item plays a more decisive role in its price than the number of people bidding for it. Furthermore, for more than one explanatory variable to be useful in the same model (in this particular case two, age and number of bidders), the independent variables should have low correlation with each other. From the scatterplot, the correlation between items’ age and number of bidders was determined to be -0.24, low enough to consider both variables while building the regression model.
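The correlations quoted above are Pearson correlation coefficients. As a minimal sketch of how such a value is computed, the function below works on two plain lists. The function name `pearson_r` and the toy numbers are hypothetical and are not the auction data, which is not reproduced in this report.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its mean
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of cross-products and sums of squares
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical toy values for illustration only (NOT the auction data)
toy_age = [100, 120, 150, 170, 190]
toy_price = [900, 1100, 1500, 1400, 1900]
print(round(pearson_r(toy_age, toy_price), 2))
```

A value near +1 or -1 means the scatterplot points lie close to a straight line; a value near 0, such as the -0.24 between age and number of bidders, means there is little linear relationship.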
StatPro’s multiple regression procedure is used to estimate the equation for the price of items as a function of items’ age and the number of people bidding for them.
It uses the least squares method to estimate the regression.
Results of multiple regression for Auction_Price

Summary measures

| Measure | Value |
|---|---|
| Multiple R | 0.9448 |
| R-Square | 0.8927 |
| Adj R-Square | 0.8853 |
| StErr of Est | 133.1365 |

ANOVA Table

| Source | df | SS | MS | F | p-value |
|---|---|---|---|---|---|
| Explained | 2 | 4277159.7188 | 2138579.8594 | 120.6511 | 0.0000 |
| Unexplained | 29 | 514034.5000 | 17725.3276 | | |

Regression coefficients

| | Coefficient | Std Err | t-value | p-value | Lower limit | Upper limit |
|---|---|---|---|---|---|---|
| Constant | -1336.7220 | 173.3561 | -7.7108 | 0.0000 | -1691.2753 | -982.1688 |
| Age_of_Item | 12.7362 | 0.9024 | 14.1140 | 0.0000 | 10.8906 | 14.5818 |
| Number_Bidders | 85.8151 | 8.7058 | 9.8573 | 0.0000 | 68.0098 | 103.6204 |

The multiple regression output for the recently auctioned comparable items is presented above.
Estimated Price of an Item = -1336.72 + 12.74 × Age of Item + 85.82 × Number of Bidders
The interpretation of the equation above is that if the number of bidders is held constant, the price of an item is expected to increase by 12.74 for each additional year of age, and if the age is held constant, the price is expected to rise by 85.82 for each additional bidder. The constant is negative (its confidence limits in the output, -1691.28 and -982.17, are both below zero), so it cannot be read as a fixed component of an item’s price. It is the extrapolated price of an item with zero age and no bidders and has no practical meaning on its own.
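As a small sketch of how the estimated equation is used for prediction, the function below plugs an age and a number of bidders into the coefficients from the regression output. The constant is taken as -1336.7220, the sign implied by its negative confidence limits in the output. The function name and the example item (150 years old, 10 bidders) are hypothetical and are not taken from the data.

```python
# Coefficients from the regression output; the constant is taken as negative,
# as implied by its confidence limits (-1691.2753, -982.1688).
CONSTANT = -1336.7220
COEF_AGE = 12.7362
COEF_BIDDERS = 85.8151

def estimated_price(age_of_item, number_bidders):
    """Predicted auction price from the fitted linear equation."""
    return CONSTANT + COEF_AGE * age_of_item + COEF_BIDDERS * number_bidders

# Hypothetical item for illustration: 150 years old with 10 bidders
base = estimated_price(150, 10)
print(round(base, 2))

# Holding bidders constant, one extra year adds the age coefficient
print(round(estimated_price(151, 10) - base, 4))

# Holding age constant, one extra bidder adds the bidders coefficient
print(round(estimated_price(150, 11) - base, 4))
```

Predictions of this kind are only trustworthy for ages and bidder counts similar to those of the 32 items used to fit the model.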
Summary Measures:
R² (R-Square) measures the goodness of linear fit. It is the percentage of variation of the response variable explained by the combined set of explanatory variables. As can be seen from the table above, R² is almost 0.9, which means that an item’s age and the number of bidders together explain 89% of the variation in price. The square root of R² (Multiple R) is the correlation between the fitted values and the observed values of the response variable; for the given model it is 0.94. The scatterplot of fitted values versus observed values below presents this high correlation graphically.

Since adding an additional variable to the equation never decreases R², R² alone cannot tell us whether the additional variable really improves the accuracy of the prediction. Adjusted R², also listed in the regression output, penalizes R² for the number of explanatory variables and so helps to judge the relevance of additional explanatory variables to the response variable. If Adjusted R² decreases when an extra variable is added, that variable (or variables) should be omitted.
Standard Error (Se) is a measure of the typical error we are likely to make when using the multiple regression equation to predict the response variable. The smaller the standard error for a particular regression equation, the more accurate its predictions tend to be. The table above shows a standard error of 133.1, meaning that approximately 2/3 of the predictions of the price should be within one standard error, or $133, of the actual item price.
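The summary measures can be checked against the ANOVA table. The sketch below recomputes them from the sums of squares and degrees of freedom reported in the output, using the standard textbook formulas. The variable names are chosen for illustration, and the sample size of 32 follows from the degrees of freedom.

```python
import math

# Values taken from the ANOVA table in the regression output
ss_explained = 4277159.7188
ss_unexplained = 514034.5000
df_explained = 2       # number of explanatory variables, k
df_unexplained = 29    # n - k - 1
n = df_explained + df_unexplained + 1   # 32 auctioned items

ss_total = ss_explained + ss_unexplained

# R-Square: share of total variation explained by the model
r_square = ss_explained / ss_total
# Multiple R: correlation between fitted and observed values
multiple_r = math.sqrt(r_square)
# Adjusted R-Square: penalizes R-Square for the number of explanatory variables
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_unexplained
# Standard error of estimate: square root of the unexplained mean square
ms_unexplained = ss_unexplained / df_unexplained
st_err_of_est = math.sqrt(ms_unexplained)
# F ratio: explained mean square over unexplained mean square
f_ratio = (ss_explained / df_explained) / ms_unexplained

print(f"R-Square     {r_square:.4f}")
print(f"Multiple R   {multiple_r:.4f}")
print(f"Adj R-Square {adj_r_square:.4f}")
print(f"StErr of Est {st_err_of_est:.4f}")
print(f"F            {f_ratio:.4f}")
```

Adjusted R² is only slightly below R² here, which is consistent with both explanatory variables earning their place in the model.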
Regression Coefficients:
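P-value – The probability of obtaining a t-value at least as extreme as the one observed if there were in fact no relationship between the dependent variable and that independent variable. Concluding that a relationship exists when there is none is a type I error, and the p-value is the smallest significance level at which we would draw that conclusion. If the p-value is higher than the chosen significance level (commonly 0.05), we should not use the variable as a predictor. As can be seen from the regression output above, the p-value for both explanatory variables is very low, thus both independent variables are used in the equation.
t-value – The ratio of the estimate of a regression coefficient to its standard error, used to test whether the coefficient is 0.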
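The t-values and confidence limits in the output can be reproduced from the coefficients and their standard errors. The sketch below assumes the limits are two-sided 95% limits with 29 degrees of freedom, for which the critical t value is approximately 2.04523. That confidence level is an assumption, since the output does not state it. The constant is taken as negative, as its confidence limits imply, and the function names are hypothetical.

```python
# Assumption: 95% two-sided confidence limits, 29 degrees of freedom
T_CRIT_95_DF29 = 2.04523

# (coefficient, standard error) pairs from the regression output
coefficients = {
    "Constant": (-1336.7220, 173.3561),
    "Age_of_Item": (12.7362, 0.9024),
    "Number_Bidders": (85.8151, 8.7058),
}

def t_value(coef, std_err):
    """Ratio of a coefficient estimate to its standard error."""
    return coef / std_err

def confidence_limits(coef, std_err, t_crit=T_CRIT_95_DF29):
    """Lower and upper limits: estimate plus or minus t_crit standard errors."""
    margin = t_crit * std_err
    return coef - margin, coef + margin

for name, (coef, std_err) in coefficients.items():
    lower, upper = confidence_limits(coef, std_err)
    print(f"{name}: t = {t_value(coef, std_err):.4f}, "
          f"limits = ({lower:.4f}, {upper:.4f})")
```

Small differences in the last decimal place from the printed output are expected, because the inputs here are already rounded to four decimals. Neither interval for Age_of_Item or Number_Bidders contains 0, which matches their very low p-values.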