Identification of protein coding regions in the human genome by quadratic discriminant analysis

M. Q. Zhang

Cold Spring Harbor Laboratory, P.O. Box 100, Cold Spring Harbor, NY 11724

Proceedings of National Academy of Science (USA), 94: 559-564 (1997).

Abstract

A new method for predicting internal coding exons in genomic DNA sequences has been developed. This method is based on a prediction algorithm that uses the quadratic discriminant function for multivariate statistical pattern recognition. Substantial improvements have been made (with only 9 discriminant variables) when compared with existing methods: HEXON [Solovyev, V. V., Salamov, A. A. & Lawrence, C. B. (1994) Nucleic Acids Res. 22, 5156-5163] (based on linear discriminant analysis) and GRAIL2 [Uberbacher, E. C. & Mural, R. J. (1991) Proc. Natl. Acad. Sci. USA 88, 11261-11265] (based on neural networks). A computer program called MZEF is freely available to the genome community and allows users to adjust prior probability and to output alternative overlapping exons.