Example 55.1: Aerobic Fitness Prediction, Taken from SAS




    
    data fitness;
    input Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse @@;
    datalines;

   44 89.47 44.609 11.37 62 178 182   40 75.07 45.313 10.07 62 185 185
   44 85.84 54.297  8.65 45 156 168   42 68.15 59.571  8.17 40 166 172
   38 89.02 49.874  9.22 55 178 180   47 77.45 44.811 11.63 58 176 176
   40 75.98 45.681 11.95 70 176 180   43 81.19 49.091 10.85 64 162 170
   ..............
   ..............
   49 76.32 48.673  9.40 56 186 188   48 61.24 47.920 11.50 52 170 176
   52 82.78 47.467 10.50 53 170 172
   ;

   /* Response variable is Oxygen */

   title 'selection process';
   proc reg data=fitness;
    model Oxygen=Age Weight RunTime RestPulse RunPulse MaxPulse/
	            p r tol vif collin influence details selection=adjrsq;
   run;					
   /* Options for selection=: cp, adjrsq, rsquare, forward, backward, stepwise */

   title '';
   quit;
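
   As a sketch of one of the other selection options listed above, the
   same model can be run with selection=stepwise. The slentry= and
   slstay= values of 0.15 are illustrative assumptions, not part of the
   original example.

```sas
/* Stepwise selection on the same model.                      */
/* The 0.15 entry/stay significance levels are assumed values. */
proc reg data=fitness;
  model Oxygen=Age Weight RunTime RestPulse RunPulse MaxPulse /
        selection=stepwise slentry=0.15 slstay=0.15 details;
run;
quit;
```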

 

 
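  Influential Observation

  Cook's Distance
  Measures how much the fitted values change when observation i
  is deleted. The larger the Cook's distance, the more influence
  the observation has on the fit.

  DFFITS
  Scaled change in the fitted value for observation i when that
  observation is deleted.

  Hat matrix (hii)
  hii is the leverage of observation i. A large value of hii
  means the observation is far from the other regressor values
  and is potentially influential.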

 

  Collinearity

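   VIF (Variance Inflation Factor)
    vif_j = 1/(1 - Rj^2), where Rj^2 is the R-square from
    regressing the j-th regressor on the other regressors.
    vif = 1 => collinearity absent
    vif > 1 => some collinearity present; values above about 10
               are commonly treated as a serious problem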

   Condition Index (k)

   k = sqrt( max eig(X'X) / min eig(X'X) )

   where X'X is scaled to have ones on the diagonal, as the
   collin option does.
   k > 30 signals a collinearity problem.

   Correlation between regressors
    Pairwise correlations give a rough idea of the
    collinearity between variables.
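
   The two checks above can be sketched as follows. The first step
   prints the pairwise correlations among the regressors. The second
   is an auxiliary regression of one regressor on the others, whose
   R-square gives that regressor's vif as 1/(1 - R-square). Choosing
   RunPulse for the auxiliary regression is only an illustration.

```sas
/* Pairwise Pearson correlations among the six regressors */
proc corr data=fitness;
  var Age Weight RunTime RestPulse RunPulse MaxPulse;
run;

/* Auxiliary regression for RunPulse (illustrative choice):   */
/* its R-square gives vif for RunPulse as 1/(1 - R-square).   */
proc reg data=fitness;
  model RunPulse=Age Weight RunTime RestPulse MaxPulse;
run;
quit;
```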

 

 Some Plots. 

 data temp;
  do i = 1 to 31;   /* id = 1..31, used as the observation number */
   id=i;
   output;
  end;
  keep id;
 run;


   proc reg data=fitness noprint;
    model Oxygen=Age Weight RunTime RestPulse RunPulse MaxPulse/
	            selection=backward sls=0.05;
    output out=resout student=standres r=resid p=predict cookd=cooks h=leverage;
   run;

 title '';


 data resout1;
  merge temp resout;
 run;

 /* Check for homogeneity and independence */
 symbol v=dot h=1 i=join;
 proc gplot data=resout1;
  plot cooks*id leverage*id  /vref=0; 
 run; 
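
 The Cook's distance values plotted above can also be screened
 numerically. This is a minimal sketch that assumes the common 4/n
 rule of thumb (n = 31 here) as the cutoff; the cutoff and the data
 set name flagged are assumptions, not part of the original notes.

```sas
/* Keep only observations whose Cook's distance exceeds 4/n. */
/* The 4/n cutoff (n = 31) is an assumed rule of thumb.      */
data flagged;
  set resout1;
  if cooks > 4/31;
run;

/* List the flagged observations with their diagnostics */
proc print data=flagged;
  var id cooks leverage standres;
run;
```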

 proc plot data=resout1;
  plot standres*id standres*predict="*"/vref=0;
 run;

 /* Check for normality */
 proc univariate data=resout1 noprint;
  probplot standres/normal(mu=est sigma=est);
  inset normal;
 run;


 Piecewise Regression

 Suppose you have one regressor variable and one response variable. The
 usual linear model is

 	y = a + bx + e

 If a plot of y against x shows a piecewise pattern, fit the following
 model instead:
  
 	y = a + bx + cz + dzx + e

	 where z = 0 for group-1 
		 = 1 for group-2.

 Group-1 then has intercept a and slope b, and group-2 has intercept
 a + c and slope b + d.
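
 A minimal sketch of fitting this model in SAS. The input data set
 mydata (with variables x and y), the output data set piece, and the
 breakpoint x = 10 that separates the two groups are all hypothetical;
 substitute the grouping suggested by your own plot.

```sas
/* Hypothetical input: data set mydata with variables x and y. */
/* The breakpoint 10 is an assumed value.                      */
data piece;
  set mydata;
  z  = (x > 10);   /* 0 for group-1, 1 for group-2 */
  zx = z*x;        /* lets the slope differ in group-2 */
run;

/* Fit y = a + bx + cz + dzx + e */
proc reg data=piece;
  model y = x z zx;
run;
quit;
```

 The estimates for z and zx correspond to c and d, so testing them
 against zero checks whether the two groups really need separate lines.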
