Shashank Srivastava and Snigdha Chaturvedi
Advisor: Dr. Arnab Bhattacharya, IIT Kanpur
Recently, there has been much interest in the study and analysis of microarrays for mining valuable biological information, building predictive models for pathological conditions, and unraveling latent correlations signifying biological pathways. Several techniques have focused on identifying differentially expressed genes, and have proposed representations of microarrays through dimensionality reduction to overcome the `curse of dimensionality'. Statistical tests such as ANOVA and Fisher's test, and methods such as clustering and singular value decomposition (SVD), have been useful in determining differentially regulated genes and in building statistical models of medical conditions from gene-expression values while ignoring noise and normal variations. More recently, the gene-set approach has been proposed to evaluate expression patterns of gene groups instead of individual genes. These methods, however, do not allow direct inference of relations between gene-expression values and genetic concepts. This study provides an approach that extends the statistical method by infusing additional knowledge from the structure of the Gene Ontology database. We propose a hierarchical approach to data representation, which bypasses several limitations of existing methods and can directly yield biological understanding and interpretability. The proposed method is tested on two standard datasets. Prognostic predictions from our method are seen to be precisely validated by existing biological literature and highly specific control studies. The proposed representation also shows predictive potential: classification accuracies obtained from it using decision trees compare favorably with those of statistical methods.