Generalized Linear Model (regression kind of problem)

 1) If you have a continuous response variable and the regressor variables
    (or, indep variables) are categorical or continuous then :

  (a) If the distribution of reponse(y) is normal then there is no problem,
  you can use PROC GLM without any fear. Because in PROC GLM they assume 
  that the error distribution is normal and you cannot specify your own 
  distribution.

  (b) If the distribution of response(y) is not normal then you will have to 
  use PROC GENMOD with LINK = IDENTITY.

  Example :
  
  first make your data set
  If you have the data set in some file then extract it using the following command :
  
  
  data dat1;                                "dat1" -> SAS datafile name 
  infile 'C:\usr\dir\namedat.csv' dlm=',';  Proper address of the datafile
  input lt wt cf loglt logwt logcf x1-x78;  Variable names
     
  If you want to make your own data right over here, then do the following:
  
  
  data dat1;
  input y1 x1 x2 x3 x4 @@;
  datalines;
  45 5 65 65 52.05
  ......
  ......
  25 2 63 45 25.09
  ;
  
  If you can get the data set in the Analyst then, you will have to use "PRC GLM"
  as PROC GLM data=sasa.dat1 outstat=ssout;

  Once you are done with the dataset, you sill start doing the analysis.

  Suppose the model is : y = x1 x2 x3 x4 , you might want to look at the interactions.
  You should notice that which are categorical variables and which are continuous.

  

  ods output Estimates=estim;   You can save anything and everything which can be 
  ods listing close;            seen in the "RESULT" on your left window.
                                make sure to do "ods listing; proc print data=estim"

  PROC GLM data=dat1 outstat=ssout;      "outstat=ssout" contains  "SS F P-value"
  class x1 x2 x3;                        Only x4 is continous variable and rest are categorical
  model y = x1 x2 x3 x4/ss1;             "ss1" gives the Type-1 Sum of Squares.
  output out=resout r=resid p=predict;
  contrast 'A-linear' x1 1 -1;           gives the contrast SS and F-test for this
  estimate 'A-linear' x1 1 -1;           gives the contrast estimate and its Std error.


  ods listing;

   title 'Normal Probability plot';      Gives the normal prob plot for Estimates.
   symbol v=square h=1;
   proc univariate data=estim noprint;
    probplot Estimate/normal(mu=est sigma=est) ;
    inset normal;
   run;

  proc plot data=resout;                      Some residual plots for Adequacy of model
   plot resid*(predict x1 x2 x3 x4)="*"/vref=0;
  run;

  proc univariate data=resout noprint;
   probplot resid/normal(mu=est sigma=est);
  inset normal;
  run;

  


  Some typical questions:
  (a) R-squared for the model. 
  (b) General Anova for the model.
  (c) Parameter estimates with std err.
  (d) Residual Analysis for Model Adequacy.

  We will discuss LINEAR REGRESSION in a while.






 2) Suppose you have a categorical response variable and the regressor variables
    are categorical or continuous then you should use PROC GENOMD


  Consider Example 7.3.1 from Dobson's book. Here we have number of patients killed
  and the number of patients under treatment, we have only one independent variable
  "dose", so we have to find a perfect model for regressing 

   numkill/totnum ~ dose..

  (c) Parameter estimates with std err.
  (d) Residual Analysis for Model Adequacy.
  (e) Some Goodness of Fit Statistics.

  
  data beetle;
  input dose totnum numkill;
  datalines;
 
  1.6907 59 6
  .... 
  ....
  1.6907 59 6
  ;

  data new1;
   set beetle;
   prop=numkill/totnum;
   output;
  run;


  proc plot data=new1;
   plot prop*dose;
  run; 

  
 
  Suppose you have the data set as :
	Sn	deaths	Population
	1	  y1	   n1
	2	  y2	   n2
        -	  - 	   -
	-	  -	   -

   and if ni >> yi then we consider poisson regression otherwise 
   we consider binomial regression model.
  
proc genmod data=new1; model numkill/totnum=dose /type3 dist=binomial link=logit lrci wald waldci; there are some options for the DIST and LINK functions dist=bin link=logit/probit dist=poisson link=log dist=mult link=clogit/cprobit contrast ; estimate ; lsmeans catvar/ diff cl; You can use this to get the estimates of indep var confidence intervals for that, only for categorical vars. output out=resout reschi=chiresid resdev=devresid stdreschi=stdchiresid p=predict; run;
Note : If you specify the option PSCALE / DSCALE in the model then you will have to adjust the standard errors & CI's on your own , but if you do not specify anything in the model statement for scale adjustment then the program automatically adjusts for dispersion parameter. Usually the estimate of dispersion parameter is : phi= Dev/df(dev) If we have response as a categorical variable with more than two categories then we go for Logistic regression in two cases, one is Nominal Logistic Regression and the other is Ordinal Logistic Regression... In case of Ordinal Logistic Regression.. just use the above code and dist=mult, link=clogit, this will suffice. But if you have the Nominal respose variable with more than two categories :Example-8.3.1(Dobson) you can also use PROC PROBIT instead of PROC LOGISTIC proc logistic data = new1 order = internal; class sex(ref='f') age(ref='1')/param=ref; freq count; "count" is the number of obsn at different categories of response variable "respcat". model respcat (ref='1')= sex age/link = glogit alpha=0.01 expb clodds=wald; output out=outdat reschi=Resid pred=Predicted; run; "expb" gives the Odds Ratio proc print data=outdat; run; How to get ODD's Ratio..? proc logistic data = school order = internal; freq count; model style = /link = glogit ; ods output parameterestimates= odds; run; data odds1; set odds; odds = exp(estimate); run; proc print data = odds1; var response estimate odds; run; if I would like to fit an ordinal logistic regression , then use the following code ... proc logistic data = new1; class sex(ref='f') age(ref='3')/param=ref; freq count; model respcat = sex age/link = clogit alpha=0.05 expb clodds=wald; run;
Hosted by www.Geocities.ws

1