Generalized Linear Model (regression kind of problem)
1) If you have a continuous response variable and the regressor variables
(or, indep variables) are categorical or continuous then :
(a) If the distribution of reponse(y) is normal then there is no problem,
you can use PROC GLM without any fear. Because in PROC GLM they assume
that the error distribution is normal and you cannot specify your own
distribution.
(b) If the distribution of response(y) is not normal then you will have to
use PROC GENMOD with LINK = IDENTITY.
Example :
first make your data set
If you have the data set in some file then extract it using the following command :
data dat1; "dat1" -> SAS datafile name
infile 'C:\usr\dir\namedat.csv' dlm=','; Proper address of the datafile
input lt wt cf loglt logwt logcf x1-x78; Variable names
If you want to make your own data right over here, then do the following:
data dat1;
input y1 x1 x2 x3 x4 @@;
datalines;
45 5 65 65 52.05
......
......
25 2 63 45 25.09
;
If you can get the data set in the Analyst then, you will have to use "PRC GLM"
as PROC GLM data=sasa.dat1 outstat=ssout;
Once you are done with the dataset, you sill start doing the analysis.
Suppose the model is : y = x1 x2 x3 x4 , you might want to look at the interactions.
You should notice that which are categorical variables and which are continuous.
ods output Estimates=estim; You can save anything and everything which can be
ods listing close; seen in the "RESULT" on your left window.
make sure to do "ods listing; proc print data=estim"
PROC GLM data=dat1 outstat=ssout; "outstat=ssout" contains "SS F P-value"
class x1 x2 x3; Only x4 is continous variable and rest are categorical
model y = x1 x2 x3 x4/ss1; "ss1" gives the Type-1 Sum of Squares.
output out=resout r=resid p=predict;
contrast 'A-linear' x1 1 -1; gives the contrast SS and F-test for this
estimate 'A-linear' x1 1 -1; gives the contrast estimate and its Std error.
ods listing;
title 'Normal Probability plot'; Gives the normal prob plot for Estimates.
symbol v=square h=1;
proc univariate data=estim noprint;
probplot Estimate/normal(mu=est sigma=est) ;
inset normal;
run;
proc plot data=resout; Some residual plots for Adequacy of model
plot resid*(predict x1 x2 x3 x4)="*"/vref=0;
run;
proc univariate data=resout noprint;
probplot resid/normal(mu=est sigma=est);
inset normal;
run;
Some typical questions:
(a) R-squared for the model.
(b) General Anova for the model.
(c) Parameter estimates with std err.
(d) Residual Analysis for Model Adequacy.
We will discuss LINEAR REGRESSION in a while.
2) Suppose you have a categorical response variable and the regressor variables
are categorical or continuous then you should use PROC GENOMD
Consider Example 7.3.1 from Dobson's book. Here we have number of patients killed
and the number of patients under treatment, we have only one independent variable
"dose", so we have to find a perfect model for regressing
numkill/totnum ~ dose..
(c) Parameter estimates with std err.
(d) Residual Analysis for Model Adequacy.
(e) Some Goodness of Fit Statistics.
data beetle;
input dose totnum numkill;
datalines;
1.6907 59 6
....
....
1.6907 59 6
;
data new1;
set beetle;
prop=numkill/totnum;
output;
run;
proc plot data=new1;
plot prop*dose;
run;
Suppose you have the data set as :
Sn deaths Population
1 y1 n1
2 y2 n2
- - -
- - -
and if ni >> yi then we consider poisson regression otherwise
we consider binomial regression model.
|
proc genmod data=new1;
model numkill/totnum=dose /type3 dist=binomial link=logit lrci wald waldci;
there are some options for the DIST and LINK functions
dist=bin link=logit/probit
dist=poisson link=log
dist=mult link=clogit/cprobit
contrast ;
estimate ;
lsmeans catvar/ diff cl; You can use this to get the estimates of indep var
confidence intervals for that, only for categorical vars.
output out=resout reschi=chiresid resdev=devresid stdreschi=stdchiresid p=predict;
run;
Note : If you specify the option PSCALE / DSCALE in the model
then you will have to adjust the standard errors & CI's on your
own , but if you do not specify anything in the model statement for
scale adjustment then the program automatically adjusts for dispersion
parameter. Usually the estimate of dispersion parameter is :
phi= Dev/df(dev)
If we have response as a categorical variable with more than two categories then we go for
Logistic regression in two cases, one is Nominal Logistic Regression and the other is
Ordinal Logistic Regression... In case of Ordinal Logistic Regression.. just use the above
code and dist=mult, link=clogit, this will suffice.
But if you have the Nominal respose variable with more than two categories :Example-8.3.1(Dobson)
you can also use PROC PROBIT instead of PROC LOGISTIC
proc logistic data = new1 order = internal;
class sex(ref='f') age(ref='1')/param=ref;
freq count; "count" is the number of obsn at different categories
of response variable "respcat".
model respcat (ref='1')= sex age/link = glogit alpha=0.01 expb clodds=wald;
output out=outdat reschi=Resid pred=Predicted;
run;
"expb" gives the Odds Ratio
proc print data=outdat;
run;
How to get ODD's Ratio..?
proc logistic data = school order = internal;
freq count;
model style = /link = glogit ;
ods output parameterestimates= odds;
run;
data odds1;
set odds;
odds = exp(estimate);
run;
proc print data = odds1;
var response estimate odds;
run;
if I would like to fit an ordinal logistic regression , then use the following
code ...
proc logistic data = new1;
class sex(ref='f') age(ref='3')/param=ref;
freq count;
model respcat = sex age/link = clogit alpha=0.05 expb clodds=wald;
run;