Gene Tree Explorer Introduction

You are about to start exploring a gene tree generated by a supervised learning algorithm. The tree organizes genes relevant to separating colon cancer tissues from normal colon tissues. The data used to generate it come from DNA microarray experiments kindly provided by Alon et al. (Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Alon et al., PNAS vol. 96, pp. 6745-6750, June 1999, Cell Biology).
The algorithm is based on several calls to SVM RFE (Support Vector Machine Recursive Feature Elimination). One call to SVM RFE produces a gene ranking. The top-ranked gene is the best single gene for separating cancer from normal samples, the top two genes form the best pair, and, in general, the top n genes form the best n-tuple. A single call to SVM RFE therefore provides nested subsets of complementary genes.
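To make the ranking step concrete, here is a minimal sketch of one SVM RFE call. It assumes an expression matrix X (samples by genes) with labels +1 for cancer and -1 for normal. The linear SVM is fitted by plain subgradient descent on the hinge loss instead of a quadratic-programming solver, and genes are removed one at a time; real implementations often remove several genes per iteration. The function names, the parameters `lr`, `lam` and `epochs`, and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     active: List[int], lr: float = 0.1, lam: float = 0.01,
                     epochs: int = 100) -> Dict[int, float]:
    """Fit a linear soft-margin SVM (hinge loss + L2 penalty) on the active
    genes only, by deterministic subgradient descent. Returns gene -> weight."""
    w = {j: 0.0 for j in active}
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (sum(w[j] * xi[j] for j in active) + b) < 1
            for j in active:
                w[j] -= lr * lam * w[j]          # L2 shrinkage
                if violated:
                    w[j] += lr * yi * xi[j]      # hinge-loss subgradient step
            if violated:
                b += lr * yi                     # unregularized bias
    return w


def svm_rfe(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[int]:
    """Rank genes best-first: retrain, drop the gene with the smallest w_j^2,
    repeat until no gene is left; the last survivor ranks first."""
    active = list(range(len(X[0])))
    eliminated: List[int] = []
    while active:
        w = train_linear_svm(X, y, active)
        worst = min(active, key=lambda j: w[j] ** 2)
        active.remove(worst)
        eliminated.append(worst)
    return eliminated[::-1]


# Toy data (made up): 4 samples x 3 genes, +1 = cancer, -1 = normal.
X = [[2.0, 1.0, 0.0], [2.0, 1.0, 0.0], [-2.0, -1.0, 0.0], [-2.0, -1.0, 0.0]]
y = [1, 1, -1, -1]
ranking = svm_rfe(X, y)
nested = [ranking[:n] for n in range(1, len(ranking) + 1)]
print(ranking)   # → [0, 1, 2]
print(nested)    # best singleton, best pair, best triplet
```

The nested subsets fall out of the ranking for free: the best n-tuple is simply `ranking[:n]`.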
Genes are redundant, so it is possible to find many subsets of genes with similar performance. SVM RFE focuses on eliminating redundancy to provide the smallest possible subsets of complementary genes. For various reasons, it may also be interesting to explore gene redundancy and obtain a more complete picture of all the genes relevant to the problem at hand. By calling SVM RFE several times on subsets of genes that exclude the top-ranked genes of previous calls, we obtain alternative subsets, which we organize in a tree. The tool in the left frame lets the user explore that tree. At the start, we see all the single genes that separate cancer from normal samples well, ranked from top to bottom in order of decreasing predictive power.
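The tree construction can be sketched as follows, under stated assumptions. `rank(pool, path)` is a hypothetical hook that orders candidate genes best-first given the genes already on the path (in the tool this role would be played by an SVM RFE call). Alternatives for one slot are gathered by repeated calls, each excluding the winners of the earlier calls; `Node`, `alternatives`, `build_tree`, `width` and `depth` are illustrative names, not the tool's actual code, and the toy scores are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical hook: rank(pool, path) -> pool ordered best-first
# when each candidate is combined with the genes already on the path.
RankFn = Callable[[List[str], List[str]], List[str]]


@dataclass
class Node:
    gene: Optional[str]                      # None at the root
    children: List["Node"] = field(default_factory=list)


def alternatives(rank: RankFn, genes: List[str], path: List[str],
                 width: int) -> List[str]:
    """Repeated calls: each call's top gene is excluded from the next call,
    so the results are alternative (redundant) choices for the same slot."""
    pool = [g for g in genes if g not in path]
    chosen: List[str] = []
    while pool and len(chosen) < width:
        top = rank(pool, path)[0]
        chosen.append(top)
        pool.remove(top)
    return chosen


def build_tree(rank: RankFn, genes: List[str], width: int, depth: int,
               path: Optional[List[str]] = None) -> Node:
    """Siblings are alternatives for one slot; a child extends its parent's subset."""
    path = path or []
    node = Node(path[-1] if path else None)
    if len(path) < depth:
        for g in alternatives(rank, genes, path, width):
            node.children.append(build_tree(rank, genes, width, depth, path + [g]))
    return node


# Toy ranking with fixed, made-up scores (ignores complementarity with the path).
scores = {"g1": 3.0, "g2": 2.0, "g3": 1.0}


def toy_rank(pool: List[str], path: List[str]) -> List[str]:
    return sorted(pool, key=lambda g: -scores[g])


tree = build_tree(toy_rank, ["g1", "g2", "g3"], width=2, depth=2)
print([c.gene for c in tree.children])              # first-level alternatives
print([c.gene for c in tree.children[0].children])  # second genes under g1
```

Passing the ranking function in as a parameter keeps the tree logic independent of how the classifier is trained.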
Clicking on an arrow expands a branch of the tree and reveals alternative choices for a second gene. Together with its parent, each new gene forms a gene pair with high predictive power; the top choice is always the best in terms of predictive power. Going deeper in the tree increases the size of the subset of complementary genes. Therefore, siblings are redundant (they are alternatives to one another) and descendants are complementary (they add to the genes above them).
Clicking on a Gene Accession Number displays, in the top right frame, information about the gene, its possible function, and its connection to the disease and the affected organ. Our Gene Search Assistant lets us refine the information already present in our knowledge base, if necessary. If the knowledge base suggests that a gene is not promising, in spite of its optimal predictive power, we may decide to explore an alternative path in the tree.
Simultaneously, the bottom right frame shows information about the corresponding subset of genes, obtained by walking from the root of the tree to the selected node.
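The subset shown for a selected node can be recovered by following parent links back to the root, and the node's redundant alternatives are simply its siblings. The sketch below assumes each node stores a reference to its parent; `TreeNode`, `gene_subset`, `siblings` and the gene labels are hypothetical.

```python
from typing import List, Optional


class TreeNode:
    """A node of the gene tree; the root carries no gene."""

    def __init__(self, gene: Optional[str] = None,
                 parent: Optional["TreeNode"] = None) -> None:
        self.gene = gene
        self.parent = parent
        self.children: List["TreeNode"] = []

    def add_child(self, gene: str) -> "TreeNode":
        child = TreeNode(gene, parent=self)
        self.children.append(child)
        return child


def gene_subset(node: TreeNode) -> List[str]:
    """Walk parent links up to the root, then reverse: root-side gene first."""
    genes: List[str] = []
    current: Optional[TreeNode] = node
    while current is not None and current.gene is not None:
        genes.append(current.gene)
        current = current.parent
    return genes[::-1]


def siblings(node: TreeNode) -> List[str]:
    """Redundant alternatives to `node`: the other children of its parent."""
    if node.parent is None:
        return []
    return [c.gene for c in node.parent.children
            if c is not node and c.gene is not None]


root = TreeNode()
g1 = root.add_child("g1")
root.add_child("g2")
g1_g3 = g1.add_child("g3")
g1.add_child("g4")
print(gene_subset(g1_g3))   # ['g1', 'g3']
print(siblings(g1_g3))      # ['g4']
```

Storing parent links makes the path lookup proportional to the depth of the selected node rather than to the size of the tree.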