Statistical features of human exons and their flanking regions
M. Q. Zhang
Cold Spring Harbor Laboratory, PO Box 100, Cold Spring Harbor,
NY 11724, USA
Human Molecular Genetics, 7(5), 919-932 (May 1998)
Abstract
To facilitate gene finding and the investigation of human molecular
genetics on a genome scale, we present a comprehensive survey of
statistical features of human exons. We first show
that human exons with flanking genomic DNA sequences can be
classified into 12 mutually exclusive categories. This classification
could serve as a standard for future studies so that direct
comparisons of results can be made. A database for eight categories
(related to human genes in which coding regions are split by introns)
was built from GenBank release 87.0 and analyzed by a number of
methods to characterize statistical features of these sequences that
may serve as controls or regulatory signals for gene expression. The
statistical information compiled includes profiles of signals for
transcription, splicing and translation, various compositional
statistics and size distributions. Further analyses reveal novel
correlations and constraints among different splicing features across
an internal exon that are consistent with the Exon Definition model.
This information is fundamental for a quantitative view of human
gene organization, and should be invaluable to individual scientists
designing human molecular genetics experiments.
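
As a rough illustration of what a signal "profile" and a compositional
statistic can look like in practice, the following minimal Python sketch
tabulates per-position base frequencies over a set of aligned,
equal-length sequence windows and computes the G+C fraction of a
sequence. It is not the method used in the paper. The function names
(position_profile, gc_content) and the toy sequences are hypothetical
and chosen only for illustration; real input would be windows extracted
around annotated splice sites.

```python
from collections import Counter


def position_profile(sites):
    """Return per-position base frequencies for aligned sequences.

    `sites` is a list of equal-length, uppercase DNA strings.
    The result is a list with one dict per position, mapping each of
    A, C, G, T to its relative frequency at that position.
    """
    if not sites:
        raise ValueError("need at least one sequence")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sequences must have the same length")

    profile = []
    for column in zip(*sites):  # iterate over alignment columns
        counts = Counter(column)
        profile.append(
            {base: counts.get(base, 0) / len(sites) for base in "ACGT"}
        )
    return profile


def gc_content(seq):
    """Return the fraction of G and C bases in an uppercase DNA string."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


# Hypothetical toy windows (NOT data from the paper): each string is a
# made-up 8-base window around a donor-like site with GT at positions 2-3.
toy_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGA", "AGGTAAGT"]
profile = position_profile(toy_sites)

for i, freqs in enumerate(profile):
    print(i, freqs)
```

In this toy input the invariant positions (the G and T at indices 2 and
3) come out with frequency 1.0, while variable positions show mixed
frequencies; size distributions could be tabulated along similar lines
by counting exon lengths.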