Analysis…Calculations:
Multiple linear regression analysis (using several software packages) was then performed on the data set, within the boundaries of the general symbolic model above, to determine the value of each variable coefficient (bi). We began with a standard multiple linear regression calculation and then moved to a backward-elimination stepwise regression calculation; the resulting coefficient estimates are detailed in Table 1 (Complete results) in the Figures and Tables section of this report.
Based on the regression analysis and the preliminary symbolic model equation above, the estimated regression model took the form:
Preliminary First-Step Regression Model Equation
yhat = bo + b1x1 + b2x2 + b3x3 + b4x4 + b5x5 + b6x6 + b7x7
= 13.4 - 0.0122 x1 - 2.62 x2 + 0.55 x3 + 0.93 x4 - 2.597 x5 + 19.8 x6 + 32.6 x7
= 13.4 - 0.0122 #FORMS - 2.62 #ENG + 0.55 #DES + 0.93 ERFORM - 2.597 RFQFORM + 19.8 MODCAT + 32.6 CMPLXCAT
In this model estimation, the signs of the coefficients were as expected except for that of the Number of Designers, which was positive. This indicates that as the number of designers increased while the number of engineers was held constant, the resulting duration/lead-time would actually increase. This could be explained by the daily interactions between the Engineers and Designers within the department: on review, it became apparent that if the number of designers was increased without increasing the number of engineers, the department engineers would spend more of their time supervising and directing the work of the designers and less time on their “value-added” work. The point proved moot, however, as described in the discussion of individual coefficient significance later in this report.
Analysis…Overall and Individual Model Significance:
As can be seen from Appendix B: Significance Test Calculations and Table 1: Distilled Regression Analysis Results in the Figures and Tables section of this report, although the model showed overall significance, the coefficients b1 (#FORMS), b3 (#DES) and b4 (ERFORM) were dropped from the estimated regression equation for lack of individual significance, so that the resulting regression model became:
Final Regression Model Equation
yhat = bo + b2x2 + b5x5 + b6x6 + b7x7
= 14.8 - 2.75 x2 - 3.2 x5 + 19.8 x6 + 32.7 x7
= 14.8 - 2.75 #ENG - 3.2 RFQFORM* + 19.8 MODCAT* + 32.7 CMPLXCAT*
NOTE: * signifies a dummy variable (xi = 0 or 1)
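As an illustration of how the final equation is applied, the short Python sketch below evaluates it for a single project. The coefficients are those reported above; the function name, argument names, and example inputs are hypothetical and are not taken from the data set. The result is assumed to be in days, in line with the standard error quoted elsewhere in this report.

```python
# Coefficients of the final regression model, as reported in this section.
FINAL_COEFFS = {
    "intercept": 14.8,
    "ENG": -2.75,      # number of engineers (x2)
    "RFQFORM": -3.2,   # dummy variable (x5), 0 or 1
    "MODCAT": 19.8,    # dummy variable (x6), 0 or 1
    "CMPLXCAT": 32.7,  # dummy variable (x7), 0 or 1
}


def predict_lead_time(n_eng, rfqform, modcat, cmplxcat):
    """Evaluate the final model equation for one project (hypothetical helper)."""
    for name, value in (("rfqform", rfqform), ("modcat", modcat), ("cmplxcat", cmplxcat)):
        if value not in (0, 1):
            raise ValueError(f"{name} is a dummy variable and must be 0 or 1, got {value!r}")
    c = FINAL_COEFFS
    return (c["intercept"]
            + c["ENG"] * n_eng
            + c["RFQFORM"] * rfqform
            + c["MODCAT"] * modcat
            + c["CMPLXCAT"] * cmplxcat)


# Hypothetical inputs: 2 engineers, RFQFORM = 1, MODCAT = 0, CMPLXCAT = 1.
example = predict_lead_time(n_eng=2, rfqform=1, modcat=0, cmplxcat=1)
print(round(example, 2))
```

Because three of the four retained variables are dummies, the prediction moves in fixed steps as each one switches between 0 and 1; only #ENG shifts it gradually.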
NOTE:
Other preliminary regression analysis models/evaluations were investigated as follows:
- A 9-variable model, with dummy variables for all the qualitative variables, was run to try to determine the initial influence of each variable on the model. The model failed because the statistical software package automatically dropped the Basic Category from the calculations, and DR Form Type caused a “near singularity error” that prevented the calculations from being completed.
- Additional intermediate dummy variables were introduced into the model to determine whether there was a threshold level for the Number of Projects / Forms at which the dummy variable would become significant. No threshold level was found that gave the new dummy variable significance.
- Regression calculations were made at 95%, 85% and finally 70% confidence levels to determine whether the overall outcome of the regression models varied. Only at the 70% confidence level did the model gain an additional independent variable, and this increased the overall fit (R2) by only .03%. Final calculations were kept at the 95% confidence level (alpha = .05).
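The behaviour described in the first item above (a category dropped automatically, and a near-singularity error) is consistent with what is commonly called the dummy variable trap, although the report does not confirm that this was the cause. The Python sketch below illustrates the idea under that assumption: when every level of a categorical variable gets its own dummy column, the columns sum to the intercept column in every row, so the design matrix is perfectly collinear. The level names "Modified" and "Complex" and the sample rows are hypothetical.

```python
def dummy_columns(categories, levels, drop=None):
    """Build one 0/1 column per level, optionally omitting a baseline level."""
    kept = [level for level in levels if level != drop]
    return {level: [1 if c == level else 0 for c in categories] for level in kept}


def row_sums(columns, n_rows):
    """Sum the dummy columns row by row."""
    return [sum(col[i] for col in columns.values()) for i in range(n_rows)]


# Hypothetical project categories (not taken from the data set).
projects = ["Basic", "Modified", "Complex", "Basic", "Modified"]
levels = ["Basic", "Modified", "Complex"]
intercept = [1] * len(projects)

# A dummy for every level: the columns add up to the intercept column,
# so the regression cannot separate their effects from the constant term.
full = dummy_columns(projects, levels)
full_is_collinear = row_sums(full, len(projects)) == intercept

# Dropping one level as the baseline removes that exact dependence.
reduced = dummy_columns(projects, levels, drop="Basic")
reduced_is_collinear = row_sums(reduced, len(projects)) == intercept

print(full_is_collinear, reduced_is_collinear)
```

Under this reading, a package that drops the Basic Category is simply choosing it as the baseline, and the remaining category coefficients are then measured relative to it.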
Analysis…Residual Plots and R2 Review:
The final step in the regression model analysis was to review the R2 value for the complete regression model and to analyze its residuals when plotted against the predicted values for the duration/lead-time.
With an R2 of .306 and a standard error of 14.545 days, it became apparent that, although this model showed overall significance and had four independent variables with individual significance, it explained only 30.6% of the variability in the data. The model therefore represented only a moderately good fit, indicating that a number of variables related to project duration may have been overlooked.
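For reference, the two fit statistics quoted above can be computed from the actual and predicted lead-times as sketched below in Python. This is a minimal illustration using the standard definitions R2 = 1 - SSE/SST and standard error = sqrt(SSE / (n - k - 1)), where k is the number of independent variables; the function name and the toy numbers are hypothetical and are not the project data.

```python
import math


def fit_summary(actual, predicted, n_predictors):
    """Return (R2, standard error of the estimate) for a fitted model."""
    n = len(actual)
    if n != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if n - n_predictors - 1 <= 0:
        raise ValueError("not enough observations for this many predictors")
    mean_y = sum(actual) / n
    # Unexplained variation: squared residuals.
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    # Total variation around the mean of the observed values.
    sst = sum((y - mean_y) ** 2 for y in actual)
    r_squared = 1 - sse / sst
    std_error = math.sqrt(sse / (n - n_predictors - 1))
    return r_squared, std_error


# Toy values for illustration only.
toy_actual = [10, 20, 30, 40]
toy_predicted = [12, 18, 33, 37]
r2, se = fit_summary(toy_actual, toy_predicted, n_predictors=1)
print(round(r2, 3), round(se, 3))
```

Read this way, an R2 of .306 means the residual sum of squares is still about 69.4% of the total variation in lead-time.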
In the residual plots, detailed in Figure 3 in the Figures and Tables section of this report, we could also see, specifically from the shape of the Residuals vs. Fitted Values (Predicted Lead-time) plot, that our model possessed non-constant variance, as indicated by the increasing spread. This increasing variation in the residuals across the predicted values can most likely be attributed to a non-linear relationship within the data set.
NOTE: The odd groupings of data are a result of the dummy variables in the model, since each coefficient contributes only when the variable is present.
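The increasing spread read off the residual plot can also be checked numerically. The Python sketch below is a crude, informal check and not a formal test or the method used in this report: it sorts the observations by fitted value and compares the mean squared residual in the upper half with that in the lower half. A ratio well above 1 suggests the spread grows with the predicted lead-time. The function name and the sample values are hypothetical.

```python
def spread_ratio(fitted, residuals):
    """Mean squared residual in the upper half of fitted values over the lower half."""
    if len(fitted) != len(residuals):
        raise ValueError("fitted and residuals must have the same length")
    if len(fitted) < 2:
        raise ValueError("need at least two observations")
    # Order the residuals by their fitted (predicted) value.
    pairs = sorted(zip(fitted, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]

    def mean_square(values):
        return sum(v * v for v in values) / len(values)

    return mean_square(high) / mean_square(low)


# Illustrative values only: residuals widen as the fitted value increases.
toy_fitted = [10, 20, 30, 40, 50, 60]
toy_residuals = [1, -1, 2, -4, 5, -6]
ratio = spread_ratio(toy_fitted, toy_residuals)
print(round(ratio, 2))
```

With dummy-driven groupings such as those noted above, the fitted values cluster at a few levels, so the split between the two halves should be interpreted with some care.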