Gene Tree Explorer Introduction
You are about to start exploring a gene tree generated by a supervised learning algorithm.
This tree organizes genes relevant to the separation between colon cancer tissues
and normal tissues. The data used to generate the tree was obtained from
DNA microarray experiments kindly provided by Alon et al. (Broad patterns of gene expression
revealed by clustering analysis of tumor and normal colon tissues probed by
oligonucleotide arrays. Alon et al., PNAS vol. 96, pp. 6745-6750, June 1999, Cell Biology).
The algorithm is based on several calls to SVM RFE (Support Vector Machine
Recursive Feature Elimination). One call to SVM RFE
generates a gene ranking. The top-ranking gene is the best single gene
for separating cancer from normal samples. The top two
ranking genes form the best pair. The top n ranking genes form the best n-tuple.
One call to SVM RFE therefore provides nested subsets of complementary genes.
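As a rough illustration of how one ranking yields nested subsets, the following Python
sketch runs a generic recursive feature elimination loop. It is not the tool's
implementation: the weight function toy_weights, its fixed values, and the gene names
are made-up stand-ins for a linear SVM that would be retrained on the surviving genes
at each step.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: given the genes still in play, return one weight per gene.
# In SVM RFE this role would be played by a linear SVM retrained on those genes.
WeightFn = Callable[[Sequence[str]], Dict[str, float]]


def rfe_ranking(genes: Sequence[str], weights_for: WeightFn) -> List[str]:
    """Return genes ordered best-first by recursive feature elimination."""
    remaining = list(genes)
    eliminated: List[str] = []
    while remaining:
        weights = weights_for(remaining)      # "retrain" on the surviving genes
        worst = min(remaining, key=lambda g: weights[g])
        remaining.remove(worst)
        eliminated.append(worst)              # first eliminated = lowest rank
    return eliminated[::-1]                   # last survivor is ranked first


def nested_subsets(ranking: Sequence[str]) -> List[List[str]]:
    """The top-n prefixes of a ranking: singleton, pair, ..., full set."""
    return [list(ranking[:n]) for n in range(1, len(ranking) + 1)]


# Toy stand-in for an SVM: fixed weights, invented for illustration only.
TOY_WEIGHTS = {"geneA": 0.9, "geneB": 0.2, "geneC": 0.6, "geneD": 0.4}


def toy_weights(remaining: Sequence[str]) -> Dict[str, float]:
    return {g: TOY_WEIGHTS[g] for g in remaining}


ranking = rfe_ranking(list(TOY_WEIGHTS), toy_weights)
subsets = nested_subsets(ranking)
```

Each subset is a prefix of the ranking, so the best pair always contains the best
singleton, and so on. With a real SVM the weights would change after every
elimination, which is what lets the surviving genes be complementary rather than
merely individually strong.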
Genes are redundant, so it is possible to find many subsets of genes with similar performance.
SVM RFE focuses on eliminating redundancy to provide the smallest possible subsets of
complementary genes. For various reasons, it may also be interesting to explore
gene redundancy and obtain a more complete picture of all the genes that are relevant
to the problem at hand:
- Methodological reasons: verifying the plausibility of the genes selected, in relation
to the problem at hand, in order to catch errors in sample preparation, data handling and algorithms;
- Scientific reasons: understanding relationships between co-expressed genes;
- Practical reasons: selecting genes corresponding to proteins that are stable
or easy to measure in serum;
- Business reasons: selecting genes that are not yet patented.
By calling SVM RFE several times on subsets of genes that do not include the
top-ranked genes of previous calls, we obtain alternative subsets that we
organize in a tree. The tool provided in the left frame allows the user
to explore that tree. At the start, we see all the single genes that provide a
good separation of cancer vs. normal samples, ranked from top to bottom in order of
decreasing predictive power.
By clicking on an arrow, we expand a branch of the tree. We obtain alternative choices
for a second gene. Together with its parent, the new gene forms a gene pair with high
predictive power. The top choice
is always best as far as predictive power is concerned. By going deeper in the tree,
we increase the size of the subset of complementary genes. Therefore, siblings are
redundant and descendants are complementary.
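One plausible reading of this construction can be sketched in Python as follows. This
is an assumption-laden sketch, not the tool's actual code: the Ranker signature, the
width and depth limits, and the toy ranker with its scores are all hypothetical, and
a call to SVM RFE would take the place of the ranker.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

# Hypothetical ranker: given the genes already chosen on the path and a pool of
# candidates, return the candidates ordered best-first.
Ranker = Callable[[Sequence[str], Sequence[str]], List[str]]


@dataclass
class Node:
    gene: str
    children: List["Node"] = field(default_factory=list)


def alternatives(path: List[str], pool: Sequence[str], rank: Ranker, width: int) -> List[str]:
    """Call the ranker repeatedly, each time withholding the earlier winners."""
    pool = list(pool)
    picks: List[str] = []
    while pool and len(picks) < width:
        best = rank(path, pool)[0]
        picks.append(best)
        pool.remove(best)          # the next call cannot pick this gene again
    return picks


def build(path: List[str], genes: Sequence[str], rank: Ranker, width: int, depth: int) -> List[Node]:
    """Siblings are alternative picks; children extend the parent's subset."""
    if depth == 0:
        return []
    pool = [g for g in genes if g not in path]
    nodes: List[Node] = []
    for gene in alternatives(path, pool, rank, width):
        node = Node(gene)
        node.children = build(path + [gene], genes, rank, width, depth - 1)
        nodes.append(node)
    return nodes


# Toy ranker with invented scores; it ignores the path, unlike a real ranker.
SCORES = {"g1": 0.9, "g2": 0.8, "g3": 0.5, "g4": 0.3}


def toy_rank(path: Sequence[str], pool: Sequence[str]) -> List[str]:
    return sorted(pool, key=lambda g: -SCORES[g])


roots = build([], list(SCORES), toy_rank, width=2, depth=2)
```

In this sketch the siblings at each level are interchangeable alternatives listed
best-first, while each child is chosen with its ancestors already fixed, which mirrors
the redundant-siblings, complementary-descendants structure described above.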
By clicking on a Gene Accession Number, we obtain in the top right frame
information about the gene,
its possible function, and its connection to the disease and the organ affected. Our Gene
Search Assistant allows us to refine the information already present in our knowledge
base, if necessary. We may decide to explore an alternative path in the tree if the gene
found, in spite of its optimality with respect to predictive power, does not look promising
in light of the knowledge base information consulted.
Simultaneously, in the bottom right frame, we obtain information about the corresponding
subset of genes obtained by walking from the root of the tree to the node selected.
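The walk from the root to a selected node can be illustrated with a minimal Python
sketch. The Node class with a parent link and the gene names are hypothetical and
chosen only for illustration; the tool's internal representation is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    gene: str
    parent: Optional["Node"] = None   # None for a top-level gene


def subset_for(node: Node) -> List[str]:
    """Collect the genes on the path from the top level down to the given node."""
    genes: List[str] = []
    current: Optional[Node] = node
    while current is not None:
        genes.append(current.gene)    # gathered bottom-up
        current = current.parent
    return genes[::-1]                # reported top-down


# Made-up example: a top-level gene, a second gene, then a third.
top = Node("geneA")
second = Node("geneC", parent=top)
third = Node("geneD", parent=second)

selected_subset = subset_for(third)
```

Selecting a node at depth n therefore always yields a subset of exactly n genes.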