
We introduce a Gaussian graphical model for gene regulatory networks. The model consists of two layers of nodes. The upper layer contains the known or putative transcription factors (TFs) together with a set of hidden variables, which stand in for unknown TFs and confounding factors. The lower layer contains the remaining genes, those not included in the upper layer. Arcs run from upper-layer nodes to lower-layer nodes. The expression level of each node (gene) in the lower layer is modeled as a linear function of the expression levels of the upper-layer nodes, that is, of the known TFs and hidden factors.
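
As a rough illustration of the linear relationship described above, the sketch below computes the predicted (mean) expression of each lower-layer gene from the upper-layer values. It is not taken from the package: the names predictLowerLayer, W, V, tf and hidden are hypothetical, the weights are assumed to be stored as dense row-major matrices, and the Gaussian noise term of the model is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types and names for illustration; not the package's own API.
using Vector = std::vector<double>;
using Matrix = std::vector<Vector>;  // row-major: one row per lower-layer gene

// Predicted expression of each lower-layer gene g as a linear function of
// the upper layer:
//   y[g] = sum_j W[g][j] * tf[j] + sum_k V[g][k] * hidden[k]
// W holds TF-to-gene weights, V holds hidden-variable-to-gene weights.
Vector predictLowerLayer(const Matrix& W, const Matrix& V,
                         const Vector& tf, const Vector& hidden) {
    assert(W.size() == V.size());  // same number of lower-layer genes
    Vector y(W.size(), 0.0);
    for (std::size_t g = 0; g < W.size(); ++g) {
        assert(W[g].size() == tf.size());
        assert(V[g].size() == hidden.size());
        // Contribution of the known/putative TFs.
        for (std::size_t j = 0; j < tf.size(); ++j) {
            y[g] += W[g][j] * tf[j];
        }
        // Contribution of the hidden variables.
        for (std::size_t k = 0; k < hidden.size(); ++k) {
            y[g] += V[g][k] * hidden[k];
        }
    }
    return y;
}
```

Under these assumptions, a zero entry W[g][j] corresponds to the absence of an arc from TF j to gene g, so the pattern of nonzero weights is the network structure.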

We impose an L1 regularization penalty on the likelihood function for automatic feature selection. In gene regulatory networks, the number of TFs is much smaller than the number of transcribed genes, and most genes are regulated by a small number of TFs. The matrix that describes the connections between TFs and the genes they regulate is therefore expected to be sparse, which makes L1 regularization a natural choice for this setting.
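
To show why an L1 penalty yields sparse connection weights, the sketch below implements the penalty term and the soft-thresholding operator, a standard building block of L1-regularized solvers. This page does not say which optimization method the package uses, so this is a generic illustration and not the package's algorithm; the names l1Penalty, softThreshold, shrinkWeights and countNonzero are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper names for illustration; not the package's own API.

// L1 penalty term: lambda * sum_i |w[i]|.
double l1Penalty(const std::vector<double>& w, double lambda) {
    double sum = 0.0;
    for (double wi : w) sum += std::fabs(wi);
    return lambda * sum;
}

// Soft-thresholding: shrinks w toward zero by lambda and returns exactly
// zero when |w| <= lambda. This is the proximal operator of lambda * |w|.
double softThreshold(double w, double lambda) {
    if (w > lambda) return w - lambda;
    if (w < -lambda) return w + lambda;
    return 0.0;
}

// Apply soft-thresholding to every weight (e.g. one gene's TF weights).
std::vector<double> shrinkWeights(const std::vector<double>& w, double lambda) {
    std::vector<double> out(w.size());
    for (std::size_t i = 0; i < w.size(); ++i) {
        out[i] = softThreshold(w[i], lambda);
    }
    return out;
}

// Number of weights that are not exactly zero, i.e. retained connections.
std::size_t countNonzero(const std::vector<double>& w) {
    std::size_t n = 0;
    for (double wi : w) {
        if (wi != 0.0) ++n;
    }
    return n;
}
```

For example, shrinking the weights {2.0, -0.25, 0.1, -3.0} with lambda = 0.5 sets the two small entries to exactly zero and keeps the two large ones, reduced in magnitude. A larger L1 penalty removes more connections.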

The package contains C++ source code that learns the sparse model from training data and applies it to test data. It takes two user-supplied parameters: the number of hidden variables and the L1 penalty. Please download the source code and refer to the readme file for instructions on how to use the program.

Last edited Jun 17, 2010 at 5:16 PM by xiang, version 4