A Computational Genomics Approach to the
Identification of Gene Networks
Andreas Wagner
The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501, USA
Nucleic Acids Research
25(18): 3594-3604 (Sept 15, 1997)
Abstract
To delineate the astronomical number of possible interactions of all
genes in a genome is a task for which conventional experimental
techniques are ill-suited. Sorely needed are rapid and inexpensive
methods that identify candidates for interacting genes, candidates
that can be further investigated by experiment. Such a method is
introduced here for an important class of gene interactions, i.e.,
transcriptional regulation via transcription factors (TFs) that bind to
specific enhancer or silencer sites. The method addresses the
question: which of the genes in a genome are likely to be regulated
by one or more TFs with known DNA binding specificity? It takes
advantage of the fact that many TFs show cooperativity in
transcriptional activation, which manifests itself in closely spaced TF
binding sites. Such 'clusters' of binding sites are very unlikely to
occur by chance alone, as opposed to individual sites, which are
often abundant in the genome. Here, statistical information about
binding site clusters in the genome is complemented by information
about (i) known biochemical functions of the TF, (ii) the structure
of its binding site, and (iii) the function of the genes near the cluster, to
identify genes likely to be regulated by a given transcription factor.
Several applications are illustrated with the genome of
Saccharomyces cerevisiae and four different DNA-binding activities:
SBF, MBF, a subclass of bHLH proteins, and NBF. The technique
may aid in the discovery of interactions between genes of known
function, and the assignment of biological functions to putative open
reading frames.
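
The core idea of scanning for closely spaced binding sites can be illustrated with a minimal sketch. It is not the procedure used in the paper: the motif string, the function names find_sites and find_clusters, and the thresholds min_sites and max_span are all hypothetical choices made for illustration. The sketch matches an exact motif on one strand only and applies no statistical test.

```python
import re


def find_sites(seq, motif):
    """Return 0-based start positions of all exact matches of motif in seq.

    A lookahead pattern is used so that overlapping matches are also found.
    Only the given strand is scanned (an assumption of this sketch).
    """
    pattern = "(?=" + re.escape(motif) + ")"
    return [m.start() for m in re.finditer(pattern, seq)]


def find_clusters(positions, min_sites=3, max_span=100):
    """Return (start, end) pairs covering runs of closely spaced sites.

    A window qualifies when min_sites consecutive site starts lie within
    max_span base pairs. Qualifying windows that share sites are merged,
    so a merged cluster may be longer than max_span.
    """
    positions = sorted(positions)
    clusters = []
    for i in range(len(positions) - min_sites + 1):
        start = positions[i]
        end = positions[i + min_sites - 1]
        if end - start <= max_span:
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous cluster: extend it.
                clusters[-1] = (clusters[-1][0], end)
            else:
                clusters.append((start, end))
    return clusters


# Toy sequence: three copies of a placeholder motif close together,
# then a fourth, isolated copy far downstream.
motif = "ACGCGT"  # placeholder motif, chosen only for illustration
seq = motif + "A" * 10 + motif + "A" * 10 + motif + "A" * 500 + motif

sites = find_sites(seq, motif)
clusters = find_clusters(sites, min_sites=3, max_span=100)
```

In this toy sequence the three closely spaced copies form one cluster, while the isolated fourth copy does not contribute to any cluster. A realistic analysis would also need to handle degenerate binding sites, both strands, and an estimate of how often such clusters arise by chance.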