Detecting Frame Shifts by Amino Acid Sequence Comparison
J.M. Claverie
Journal of Molecular Biology, 234(4):1140-1157 (Dec 20, 1993)
Abstract
Various amino acid substitution scoring matrices are used in
conjunction with local alignment programs to detect regions of
similarity and infer potential common ancestry between proteins. The
usual scoring schemes derive from the implicit hypothesis that
related proteins evolve from a common ancestor by the accumulation
of point mutations and that amino acids tend to be progressively
substituted by others with similar properties. However, other
frequent single-mutation events, such as nucleotide insertion or
deletion and gene inversion, change the translation reading frame
and cause previously encoded amino acid sequences to become
unrecognizable at once. Here, I derive five new types of scoring
matrix, each capable of detecting a specific frame shift (deletion,
insertion and inversion in three frames) and use them with a regular
local alignment program to detect amino acid sequences that may
have derived from alternative reading frames of the same nucleotide
sequence. Frame shifts are inferred solely from comparison of the
protein sequences. The five scoring matrices were used with the
BLASTP program to compare all the protein sequences in the Swissprot
database. Surprisingly, the searches revealed hundreds of highly
significant frame shift matches, of which many are likely to
represent sequencing errors. Others provide some evidence that frame
shift mutations might be used in protein evolution as a way to
create new amino acid sequences from pre-existing coding regions.
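
The idea that a frame shift constrains which residues can face each other
can be illustrated with a toy Python sketch. It is not the matrix derivation
used in the paper, which the abstract does not describe. It assumes the
standard genetic code and a forward shift of one nucleotide, and the helper
names translate and plus_one_partners are hypothetical. For a given amino
acid, it lists the residues that can be read when the frame starts one base
later inside one of its codons.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1 or 2)."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


def plus_one_partners(aa):
    """Residues that can appear when the frame moves one base forward
    inside a codon for `aa`: the shifted codon starts with the last two
    bases of the original codon, followed by any next base."""
    return {
        CODON_TABLE[codon[1:] + nxt]
        for codon, residue in CODON_TABLE.items()
        if residue == aa
        for nxt in BASES
    }


if __name__ == "__main__":
    dna = "ATGGCCAAGCTGTGA"           # illustrative sequence, not from the paper
    print(translate(dna))             # reading frame 0
    print(translate(dna, frame=1))    # same nucleotides, shifted by one base
    print(sorted(plus_one_partners("M")))
```

The two translations share no residues even though they come from the same
nucleotides, which is why a conventional substitution matrix cannot relate
them. A frame-specific matrix could, in principle, reward only the pairs
that such a compatibility table allows.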