Pedersen C.N.S. — Algorithms in Computational Biology :: Электронная библиотека попечительского совета мехмата МГУ

In this thesis we are concerned with constructing algorithms that address problems
of biological relevance. This activity is part of a broader interdisciplinary
area called computational biology, or bioinformatics, that focuses on utilizing
the capacities of computers to gain knowledge from biological data. The
majority of problems in computational biology relate to molecular or evolutionary
biology, and focus on analyzing and comparing the genetic material of
organisms. One deciding factor in shaping the area of computational biology
is that DNA, RNA and proteins that are responsible for storing and utilizing
the genetic material in an organism, can be described as strings over nite alphabets.
The string representation of biomolecules allows for a wide range of
algorithmic techniques concerned with strings to be applied for analyzing and
comparing biological data. We contribute to the eld of computational biology
by constructing and analyzing algorithms that address problems of relevance to
biological sequence analysis and structure prediction.
The genetic material of organisms evolves by discrete mutations, most prominently
substitutions, insertions and deletions of nucleotides. Since the genetic
material is stored in DNA sequences and reflected in RNA and protein sequences,
it makes sense to compare two or more biological sequences to look
for similarities and dierences that can be used to infer the relatedness of the
sequences. In the thesis we consider the problem of comparing two sequences
of coding DNA when the relationship between DNA and proteins is taken into
account. We do this by using a model that penalizes an event on the DNA by
the change it induces on the encoded protein. We analyze the model in detail,
and construct an alignment algorithm that improves on the existing best
alignment algorithm in the model by reducing its running time by a quadratic
factor. This makes the running time of our alignment algorithm equal to the
running time of alignment algorithms based on much simpler models.