Las Vegas Algorithms for Gene Recognition:
Suboptimal and Error-Tolerant Spliced Alignment
S.-H.Sze and P.A. Pevzner
Journal of Computational Biology, 4(3):297-310 (Fall 1997)
Abstract
Recently, Gelfand, Mironov and Pevzner (1996) proposed a spliced
alignment approach to gene recognition that provides 99% accurate
recognition of human genes if a related mammalian protein is available.
However, even 99% accurate gene predictions are insufficient for
automated sequence annotation in large-scale sequencing projects and
therefore have to be complemented by experimental gene verification.
One hundred percent accurate predictions would lead to a
substantial reduction of experimental work on gene identification.
Our goal is to develop an algorithm that either predicts an exon assembly
with accuracy sufficient for sequence annotation or warns a biologist
that the accuracy of a prediction is insufficient and further experimental
work is required. We study suboptimal and error-tolerant spliced
alignment problems as the first steps towards such an algorithm, and report
an algorithm which provides 100% accurate recognition of human
genes in 37% of cases (if a related mammalian protein is available).
In 52% of genes, the algorithm predicts at least one exon with 100% accuracy.