Predicting Internal Exons by Oligonucleotide Composition and
Discriminant Analysis of Spliceable Open Reading Frames.
V.V. Solovyev, A.A. Salamov, C.B. Lawrence
Nucleic Acids Research, 22(24):5156-63 (Dec 1994)
Abstract
We have developed a new method for predicting internal exon sequences
in human DNA. The method is based on a splice site prediction
algorithm that uses a linear discriminant function to combine
information about significant triplet frequencies in various
functional parts of splice site regions with oligonucleotide
preferences in protein-coding and intron regions. The accuracy
of our splice site recognition function is 97% for donor splice
sites and 96% for acceptor splice sites. For exon prediction, we
combine in a discriminant function the characteristics describing
the 5'-intron region, donor splice site, coding region, acceptor
splice site and 3'-intron region for each open reading frame that is
preceded by an AG dinucleotide and followed by a GT dinucleotide. The
accuracy of precise internal exon
recognition on a test set of 451 exon and 246,693 pseudoexon
sequences is 77% with a specificity of 79%. At the level of
individual nucleotides, the recognition quality is 89% for exon
sequences and 98% for intron sequences, corresponding to a
correlation coefficient of 0.87 for exon prediction. The precision of
this approach is higher than that of other methods, and it has been
tested on a larger data set than those used for the other methods.
We have also developed a means for predicting
exon-exon junctions in cDNA sequences, which can be useful for
selecting optimal PCR primers.
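The abstract says splice sites are recognized by a linear discriminant function that combines several sequence-derived characteristics. The sketch below shows a generic two-class Fisher linear discriminant in plain Python, assuming each candidate site has already been reduced to a numeric feature vector (for example, triplet-frequency and oligonucleotide-preference scores). The feature values, the equal-prior threshold and the small ridge term are assumptions for illustration, not the paper's exact procedure.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def mean(rows: Sequence[Sequence[float]]) -> Vector:
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows: Sequence[Sequence[float]], m: Vector) -> Matrix:
    """Sum of outer products (x - m)(x - m)^T over one class."""
    d = len(m)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [r[i] - m[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_lda(pos, neg, ridge: float = 1e-6) -> Tuple[Vector, float]:
    """Return weights w and threshold c; w.x - c > 0 is classified as a site."""
    m1, m0 = mean(pos), mean(neg)
    s1, s0 = scatter(pos, m1), scatter(neg, m0)
    d = len(m1)
    # Pooled within-class scatter plus a tiny ridge so it stays invertible.
    sw = [[s1[i][j] + s0[i][j] + (ridge if i == j else 0.0) for j in range(d)]
          for i in range(d)]
    w = solve(sw, [a - b for a, b in zip(m1, m0)])
    # Equal priors assumed: threshold halfway between the projected class means.
    c = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m1, m0))
    return w, c


def discriminant(w: Vector, c: float, x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) - c


# Hypothetical two-feature vectors for true sites (pos) and decoys (neg).
pos = [[2.0, 1.0], [3.0, 2.0], [2.5, 1.5], [3.0, 1.0]]
neg = [[0.0, 0.0], [0.5, 1.0], [-0.5, 0.5], [0.0, 1.0]]
w, c = fit_lda(pos, neg)
print([discriminant(w, c, x) > 0 for x in pos + neg])
# [True, True, True, True, False, False, False, False]
```

In practice the decision threshold would be tuned on training data to trade sensitivity against specificity rather than fixed at the midpoint.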
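The abstract also relies on oligonucleotide preferences in protein-coding versus intron regions. One common way to express such a preference, assumed here rather than taken from the paper, is a log-likelihood ratio of k-mer frequencies estimated from coding and intron training sequences. The oligomer length k, the add-one pseudocount and the frame-independent counting are illustrative choices, and the toy training sequences are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Iterable

ACGT = set("ACGT")


def kmer_counts(seqs: Iterable[str], k: int) -> Counter:
    """Count overlapping k-mers, skipping any that contain non-ACGT symbols."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= ACGT:
                counts[kmer] += 1
    return counts


def make_coding_scorer(coding: Iterable[str], intron: Iterable[str],
                       k: int = 3, pseudo: float = 1.0) -> Callable[[str], float]:
    """Mean per-k-mer log(P_coding / P_intron); positive values favour coding."""
    c, n = kmer_counts(coding, k), kmer_counts(intron, k)
    vocab = 4 ** k
    c_total = sum(c.values()) + pseudo * vocab
    n_total = sum(n.values()) + pseudo * vocab

    def llr(kmer: str) -> float:
        # Pseudocounts keep unseen k-mers from producing log(0).
        return (math.log((c[kmer] + pseudo) / c_total)
                - math.log((n[kmer] + pseudo) / n_total))

    def score(seq: str) -> float:
        seq = seq.upper()
        vals = [llr(seq[i:i + k]) for i in range(len(seq) - k + 1)
                if set(seq[i:i + k]) <= ACGT]
        return sum(vals) / len(vals) if vals else 0.0

    return score


# Hypothetical toy training data, far too small for real use.
score = make_coding_scorer(["GCCGCCGCCGCC"], ["TTTTATTTTATT"])
print(score("GCCGCCGCC") > 0, score("TTTTATTT") < 0)  # True True
```

A score of this kind could serve as one input feature to the discriminant function sketched for splice sites; a frame-aware variant would count codon-aligned k-mers separately for each reading frame.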
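The candidate exons scored in the paper are open reading frames bounded by splice-site dinucleotides. Reading that as an AG immediately upstream (the end of the preceding intron) and a GT immediately downstream (the start of the following intron), a brute-force enumerator might look like the sketch below. The length limits are hypothetical, and the sketch only keeps candidates with at least one stop-free reading frame; it does not apply the discriminant scoring that the method uses to choose among them.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def open_frames(exon: str) -> List[int]:
    """Frames (0, 1, 2) in which `exon` contains no complete stop codon."""
    frames = []
    for f in range(3):
        codons = (exon[i:i + 3] for i in range(f, len(exon) - 2, 3))
        if not any(c in STOPS for c in codons):
            frames.append(f)
    return frames


def candidate_exons(seq: str, min_len: int = 3,
                    max_len: int = 300) -> List[Tuple[int, int, List[int]]]:
    """Return (start, end, open_frames) for each AG...GT-bounded candidate.

    Coordinates are 0-based and half-open: the exon is seq[start:end],
    seq[start-2:start] == "AG" and seq[end:end+2] == "GT".
    min_len and max_len are hypothetical limits, not values from the paper.
    """
    seq = seq.upper()
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    ends = [j for j in range(len(seq) - 1) if seq[j:j + 2] == "GT"]
    out = []
    for a in starts:
        for d in ends:
            if min_len <= d - a <= max_len:
                frames = open_frames(seq[a:d])
                if frames:
                    out.append((a, d, frames))
    return out


print(candidate_exons("CCAGATGGCCGTAA"))  # [(4, 10, [0, 1, 2])]
```

The number of such candidates grows roughly quadratically with sequence length, which is consistent with the very large pseudoexon count in the test set; a practical implementation would prune candidates early using splice-site scores.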
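The abstract reports exon-level accuracy and specificity together with nucleotide-level recognition rates and a correlation coefficient, but it does not spell out the formulas. The sketch below assumes common definitions from the gene-prediction literature: an exon counts as correct only if both of its boundaries match exactly, and the nucleotide-level correlation coefficient is the Matthews correlation coefficient over exon versus non-exon labels. Coordinates are 0-based and half-open, and the example exons are hypothetical.

```python
import math
from typing import Iterable, List, Set, Tuple

Exon = Tuple[int, int]  # 0-based, half-open (start, end)


def exon_level(actual: Set[Exon], predicted: Set[Exon]) -> Tuple[float, float]:
    """Sensitivity and specificity, counting only exact two-boundary matches."""
    tp = len(actual & predicted)
    sn = tp / len(actual) if actual else 0.0
    sp = tp / len(predicted) if predicted else 0.0
    return sn, sp


def coding_mask(exons: Iterable[Exon], length: int) -> List[bool]:
    """Mark every nucleotide that lies inside one of the exons."""
    m = [False] * length
    for start, end in exons:
        for i in range(start, end):
            m[i] = True
    return m


def nucleotide_level(actual: Iterable[Exon], predicted: Iterable[Exon],
                     length: int) -> Tuple[float, float, float]:
    """Return (exon-nucleotide recall, non-exon-nucleotide recall, Matthews CC)."""
    a, p = coding_mask(actual, length), coding_mask(predicted, length)
    tp = sum(x and y for x, y in zip(a, p))
    tn = sum((not x) and (not y) for x, y in zip(a, p))
    fp = sum((not x) and y for x, y in zip(a, p))
    fn = sum(x and (not y) for x, y in zip(a, p))
    exon_rate = tp / (tp + fn) if tp + fn else 0.0
    intron_rate = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tn + fn) * (tp + fn) * (tn + fp))
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return exon_rate, intron_rate, cc


actual = {(10, 20), (30, 40)}
predicted = {(10, 20), (32, 40)}  # second exon's 5' (acceptor) boundary is off by 2
print(exon_level(actual, predicted))  # (0.5, 0.5)
print(nucleotide_level(actual, predicted, 50))
```

The example illustrates why exon-level figures are usually lower than nucleotide-level ones: a two-base boundary error makes the second exon count as entirely wrong at the exon level, while only 2 of its 10 nucleotides are missed at the nucleotide level.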