Consider a pairwise alignment between Sequences A and B. If a residue from Sequence A is aligned to a gap character, it is sometimes useful to determine whether the resulting alignment position is part of an "internal" gap or part of a "terminal" gap. In this case, the alignment position would be part of an internal gap if there is at least one residue in Sequence B that precedes it in the alignment and there is at least one residue in Sequence B that follows it in the alignment. Otherwise, the alignment position is said to be part of a terminal gap. For example, the following alignment ...
AGT-CTTG--
---ACTAGGA
has 5 positions that are part of terminal gaps (3 positions on the 5' end and 2 on the 3' end) and 1 position that is an internal gap.
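
The internal/terminal distinction above can be checked mechanically. The following is a minimal Python sketch, not part of the assignment: the function name `classify_gap_columns` and the use of "-" as the gap character are assumptions made for illustration. For each row it finds the first and last residue, then labels every gap before the first residue as 5' terminal, every gap after the last residue as 3' terminal, and every other gap as internal.

```python
def classify_gap_columns(row_a, row_b, gap="-"):
    """Count gap columns by type in a pairwise alignment (illustrative helper)."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    counts = {"five_prime": 0, "three_prime": 0, "internal": 0}
    for row in (row_a, row_b):
        residue_idx = [i for i, ch in enumerate(row) if ch != gap]
        # Assumption: a row with no residues has only terminal gaps,
        # and they are all counted on the 5' end.
        first = residue_idx[0] if residue_idx else len(row)
        last = residue_idx[-1] if residue_idx else len(row)
        for i, ch in enumerate(row):
            if ch != gap:
                continue
            if i < first:
                counts["five_prime"] += 1   # no residue of this row precedes it
            elif i > last:
                counts["three_prime"] += 1  # no residue of this row follows it
            else:
                counts["internal"] += 1     # residues on both sides
    return counts


print(classify_gap_columns("AGT-CTTG--", "---ACTAGGA"))
```

For the example alignment this reports 3 positions on the 5' end, 2 on the 3' end, and 1 internal position, matching the counts given above.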
Write a program that implements a dynamic programming algorithm for
global alignment of DNA sequence pairs.
The program should find the alignment with the optimal score (here the lowest score, since every weight below acts as a penalty and matches cost nothing), where the score of an alignment is:
3*(# of alignment positions that are "internal" gaps) +
2*(# of alignment positions that are in "terminal" gaps on the 5' end) +
2*(# of alignment positions that are in "terminal" gaps on the 3' end) +
2*(# of alignment positions that are mismatches) +
0*(# of alignment positions that are matches).
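
As a sanity check on the scoring rule, here is a short Python sketch that scores an alignment that has already been built. The function and constant names are hypothetical, "-" is assumed to be the gap character, and sequences are assumed to be in a single letter case. It only evaluates a given alignment; it does not search for the optimal one.

```python
INTERNAL_GAP = 3   # weight per internal gap position
TERMINAL_GAP = 2   # weight per terminal gap position (5' or 3' end)
MISMATCH = 2       # weight per mismatch position
MATCH = 0          # weight per match position


def alignment_score(row_a, row_b, gap="-"):
    """Score a finished pairwise alignment under the weights above."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")

    def residue_span(row):
        # Indices of the first and last residue in a row; a row with no
        # residues gets an empty span so that all of its gaps are terminal.
        idx = [i for i, ch in enumerate(row) if ch != gap]
        return (idx[0], idx[-1]) if idx else (len(row), len(row))

    span_a, span_b = residue_span(row_a), residue_span(row_b)
    total = 0
    for i, (a, b) in enumerate(zip(row_a, row_b)):
        if a == gap or b == gap:
            # Look at the row that holds the gap character.
            first, last = span_a if a == gap else span_b
            total += INTERNAL_GAP if first < i < last else TERMINAL_GAP
        elif a == b:
            total += MATCH
        else:
            total += MISMATCH
    return total


print(alignment_score("AGT-CTTG--", "---ACTAGGA"))
```

For the example alignment shown earlier, the total is 15: five terminal gap positions (10), one internal gap position (3), and one mismatch (2).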
For a given pair of sequences, there may be multiple alignments
that achieve the optimal score. The
program should print one of these optimal alignments. The program
should also report the optimal score.
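
One possible way to organize the dynamic programming is sketched below in Python. It is an illustration under stated assumptions, not a reference solution: the name `global_align` is hypothetical, the optimal score is taken to be the minimum, and "-" is the gap character. The sketch relies on one observation: when a residue of A is placed against a gap after j residues of B have been used, that gap is terminal exactly when j is 0 or j equals the length of B (and symmetrically for gaps in A). The gap weight therefore depends only on whether the move runs along a border of the matrix, so a single table is enough.

```python
INTERNAL_GAP, TERMINAL_GAP, MISMATCH, MATCH = 3, 2, 2, 0


def global_align(seq_a, seq_b):
    """Return (score, aligned_a, aligned_b) for one minimum-score alignment."""
    m, n = len(seq_a), len(seq_b)

    def gap_cost(used, total):
        # 'used' = residues of the gapped sequence already aligned.
        # None used or all used means the gap is terminal.
        return TERMINAL_GAP if used in (0, total) else INTERNAL_GAP

    score = [[0] * (n + 1) for _ in range(m + 1)]
    move = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                sub = MATCH if seq_a[i - 1] == seq_b[j - 1] else MISMATCH
                candidates.append((score[i - 1][j - 1] + sub, "diag"))
            if i > 0:   # residue of A against a gap in B
                candidates.append((score[i - 1][j] + gap_cost(j, n), "up"))
            if j > 0:   # residue of B against a gap in A
                candidates.append((score[i][j - 1] + gap_cost(i, m), "left"))
            score[i][j], move[i][j] = min(candidates)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            row_a.append(seq_a[i - 1]); row_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif step == "up":
            row_a.append(seq_a[i - 1]); row_b.append("-")
            i -= 1
        else:
            row_a.append("-"); row_b.append(seq_b[j - 1])
            j -= 1
    return score[m][n], "".join(reversed(row_a)), "".join(reversed(row_b))


best, top, bottom = global_align("ACGT", "CG")
print(top)
print(bottom)
print("score:", best)
```

With these weights, aligning ACGT against CG gives score 4, with CG placed under the middle of ACGT and one terminal gap on each end. Ties between equally good moves are broken arbitrarily in this sketch, which is acceptable because only one optimal alignment needs to be printed.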
The program should read sequence files that are in "FASTA"
format. With this format, the name of a sequence is on a line that
begins with the character ">".
On subsequent lines, the sequence
is listed. The sequence is assumed to end when the file ends or the
next ">" is encountered.
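
The FASTA rules described above translate into a small reader. This Python sketch is illustrative only: the name `parse_fasta` is hypothetical, blank lines are skipped, and residues are converted to upper case, which is an assumption rather than a requirement of the format.

```python
def parse_fasta(lines):
    """Return a list of (name, sequence) pairs from an iterable of text lines."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            if name is not None:          # a new ">" ends the previous sequence
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.upper())   # assumption: normalize to upper case
    if name is not None:                  # end of file ends the last sequence
        records.append((name, "".join(chunks)))
    return records


example = [">seqA", "ACGT", "acg", ">seqB", "TTGA"]
for seq_name, seq in parse_fasta(example):
    print(seq_name, seq)
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.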
The sequence file that you should use as input when you hand in the output of your program is here.
The homework should be emailed to me (thorne@ncsu.edu) before class on Wednesday, February 1. In the email, you should include: the computer code that you have written, the command that I can use to compile the program on a unix-type system (e.g. Mac OS X, Linux, Unix), the command that I can use to run the program on a unix-type system, and the output that resulted.
(this page's address is http://statgen.ncsu.edu/thorne/bioinf2hwk1.html)