Example 55.1: Aerobic Fitness Prediction, Taken from SAS
data fitness;
input Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse @@;
datalines;
44 89.47 44.609 11.37 62 178 182 40 75.07 45.313 10.07 62 185 185
44 85.84 54.297 8.65 45 156 168 42 68.15 59.571 8.17 40 166 172
38 89.02 49.874 9.22 55 178 180 47 77.45 44.811 11.63 58 176 176
40 75.98 45.681 11.95 70 176 180 43 81.19 49.091 10.85 64 162 170
..............
..............
49 76.32 48.673 9.40 56 186 188 48 61.24 47.920 11.50 52 170 176
52 82.78 47.467 10.50 53 170 172
;
Response variable is "Oxygen"
title 'selection process';
proc reg data=fitness;
model Oxygen=Age Weight RunTime RestPulse RunPulse MaxPulse/
p r tol vif collin influence details selection=adjrsq;
run;
Options for "selection" -> cp, adjrsq, rsquare, forward, backward, stepwise (also maxr, minr)
title '';
quit;
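Influential Observation
Cook's Distance
the larger the Cook's distance, the more the fitted coefficients change when the observation is deleted, ie the more influence it has.
DFFITS
scaled change in the fitted value of an observation when that observation is deleted.
Hat matrix (hii)
a large value of hii means high leverage, ie a potentially influential observation; a common cutoff is hii > 2p/n (p = number of parameters, n = number of observations).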
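For reference, the standard textbook definitions behind these measures are given below, with $e_i$ the ordinary residual, $s$ the root mean squared error, $s_{(i)}$ the root MSE of the fit without observation $i$, $p$ the number of parameters (including the intercept) and $n$ the number of observations. The cutoffs quoted after the block are common rules of thumb, not something taken from the SAS example.

```latex
\begin{aligned}
h_{ii} &= x_i'(X'X)^{-1}x_i, && i\text{-th diagonal element of } H = X(X'X)^{-1}X' \\
r_i &= \frac{e_i}{s\sqrt{1-h_{ii}}}, && \text{studentized residual (student= in the output statement)} \\
D_i &= \frac{r_i^2}{p}\,\frac{h_{ii}}{1-h_{ii}}, && \text{Cook's distance} \\
\mathrm{DFFITS}_i &= t_i\sqrt{\frac{h_{ii}}{1-h_{ii}}}, && t_i = \frac{e_i}{s_{(i)}\sqrt{1-h_{ii}}}
\end{aligned}
```

Frequently used flags are $D_i > 4/n$ and $|\mathrm{DFFITS}_i| > 2\sqrt{p/n}$. Because $D_i$ grows with both $r_i^2$ and $h_{ii}$, a high-leverage point with a small residual need not be influential.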
Collinearity
VIF (Variance Inflation Factor)
vif = 1 => regressor uncorrelated with the others (no collinearity)
vif > 1 => some collinearity present; vif > 10 is commonly read as a serious problem
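The VIF of regressor $x_j$ comes from regressing $x_j$ on the other regressors and taking the coefficient of determination $R_j^2$ of that fit. The tol option in the model statement prints the tolerance $1 - R_j^2$, which is simply $1/\mathrm{VIF}_j$. Writing it out shows where the thresholds come from: VIF = 1 exactly when $R_j^2 = 0$, and VIF > 10 means more than 90% of the variation in $x_j$ is explained by the other regressors.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2} = \frac{1}{\text{tolerance}_j},
\qquad
\mathrm{VIF}_j > 10 \iff R_j^2 > 0.9
```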
Condition Index (k)
$k = \sqrt{\lambda_{\max}(X'X) / \lambda_{\min}(X'X)}$ (eigenvalues of the scaled X'X, as reported by the collin option)
k > 30 is a sign of a serious collinearity problem.
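The collin option reports condition indices as square roots of eigenvalue ratios of the scaled $X'X$, so the cutoff of 30 applies on the square-root scale. A small worked example with hypothetical eigenvalues (not taken from the fitness data):

```latex
\lambda_{\max} = 4.5,\quad \lambda_{\min} = 0.004
\;\Rightarrow\;
\frac{\lambda_{\max}}{\lambda_{\min}} = 1125,
\qquad
k = \sqrt{1125} \approx 33.5 > 30
```

Here the raw eigenvalue ratio is 1125, so comparing the unrooted ratio against 30 would flag collinearity far too easily.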
Correlation between regressors
pairwise correlations give some idea of the collinearity between variables,
but they can miss a dependency that involves three or more regressors.
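With exactly two regressors the link between correlation and collinearity is exact: regressing one on the other gives $R^2 = r_{12}^2$, so both VIFs equal $1/(1 - r_{12}^2)$. For an illustrative (hypothetical) correlation $r_{12} = 0.9$:

```latex
\mathrm{VIF} = \frac{1}{1 - r_{12}^2} = \frac{1}{1 - 0.81} = \frac{1}{0.19} \approx 5.26
```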
Some Plots.
data temp;   /* observation index 1, 2, ..., 31 (one per row of fitness) */
do i = 1 to 31;
id=i;
output;
end;
keep id;
run;
proc reg data=fitness noprint;
model Oxygen=Age Weight RunTime RestPulse RunPulse MaxPulse/
selection=backward sls=0.05;
output out=resout student=standres r=resid p=predict cookd=cooks h=leverage;
run;
title '';
data resout1;
merge temp resout;
run;
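Check for influential observations
symbol v=dot h=1 i=join;
proc gplot data=resout1;
plot cooks*id leverage*id /vref=0;   /* Cook's D and leverage (hii) by observation */
run;
Check for homogeneity and independence
proc plot data=resout1;
plot standres*id standres*predict="*"/vref=0;   /* vs id: independence; vs predicted: constant variance */
run;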
Check for normality
proc univariate data=resout1 noprint;
probplot standres/normal(mu=est sigma=est);
inset normal;
run;
Piecewise Regression
Suppose you have only one regressor variable x and one response variable y; then the general linear model is y = a + bx + e. But if the data (plot y vs x) show a piecewise pattern, you can fit
y = a + bx + cz + dzx + e
where z = 0 for group 1 and z = 1 for group 2.
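Substituting the two values of $z$ shows what the extra terms do: $c$ shifts the intercept and $d$ shifts the slope for group 2, so each group gets its own straight line.

```latex
\begin{aligned}
z = 0:&\quad y = a + bx + e \\
z = 1:&\quad y = (a + c) + (b + d)\,x + e
\end{aligned}
```

Testing $H_0: c = d = 0$ therefore checks whether a single line is enough. Note that this form does not force the two lines to meet at the break point. In SAS it amounts to computing z and the product zx in a data step and listing x, z and zx on the model statement of proc reg.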