GeneLogit: a software library for R (www.r-project.org) for building a logistic classification model from microarray data

Jason Liao

Drexel University School of Public Health

[email protected]

This is the software for Liao and Chin (2007), “Logistic regression for disease classification using microarray data: model selection in a large p and small n case”. It can be used to build a logistic prediction model from microarray gene expression data. See the paper for details.

The GeneLogit library is contained in the file http://www.geocities.com/jg_liao/software/GeneLogit.txt. A sample R program that analyzes Golub's leukemia dataset using the functions in the GeneLogit library is in the file http://www.geocities.com/jg_liao/software/Analyze.txt, which you can modify to suit your own dataset and analysis. Golub's training dataset is in golub_train.dta and the test dataset is in golub_test.dta (both standardized so that the variance of each gene across subjects is 1). This free site, however, does not allow me to host the files because of space limitations. Please send me an email and I will get them to you, usually within a few hours.
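For readers preparing their own data, the standardization mentioned above (variance of each gene across subjects equal to 1) could be sketched in R roughly as follows. This is only an illustration and not code from GeneLogit or Analyze.txt: the matrix `expr`, its genes-in-rows layout, and the simulated values are all assumptions, and the sketch only rescales each gene without centering it.

```r
# Hypothetical example data (not the Golub data): 5 genes (rows) by 8 subjects (columns).
set.seed(1)
expr <- matrix(rnorm(5 * 8, mean = 6, sd = 2), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("subj", 1:8)))

# Sample standard deviation of each gene across subjects.
gene_sd <- apply(expr, 1, sd)

# Divide each row (gene) by its standard deviation so that its variance becomes 1.
expr_std <- sweep(expr, 1, gene_sd, "/")

# Check: the variance of each standardized gene across subjects.
gene_var <- apply(expr_std, 1, var)
print(round(gene_var, 6))
```

If your data have subjects in rows and genes in columns instead, the same idea applies with the margin set to 2.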

You need a top-of-the-line computer (dual core with at least one gigabyte of RAM) to run the program. It is important to use a separate directory for each dataset, because the R program writes intermediate files with fixed file names.
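One way to keep each dataset in its own directory is to create the directory from R and make it the working directory before running the analysis. The sketch below is only an illustration: the directory name `golub_analysis` and its location under `tempdir()` are hypothetical, and it assumes the intermediate files are written relative to the current working directory.

```r
# Hypothetical directory for one dataset; choose a different one for each dataset.
work_dir <- file.path(tempdir(), "golub_analysis")

# Create the directory if it does not exist yet.
if (!dir.exists(work_dir)) {
  dir.create(work_dir, recursive = TRUE)
}

# Switch to it so that files written with relative names land here.
# setwd() returns the previous working directory, which we keep for later.
old_dir <- setwd(work_dir)

# ... run the analysis for this dataset here ...

# Return to the previous working directory when done.
setwd(old_dir)
```

In practice you would replace `tempdir()` with a permanent location, since R removes its temporary directory when the session ends.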
