Introduction
The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences. It was one of the first applications of dynamic programming to compare biological sequences.
In this article, we try to use Needleman-Wunsch algorithm to find the minimum mismatch between two different DNA sequences.
Definition of Mismatches
Here we have two different DNA sequences, i.e., S1
and S2
, where
S1 = "GCATGCU"
,
S2 = "GCTTAGCU"
.
One solution to match these two DNA sequences is listed as below:
S1* = "GCAT-GCU"
,
S2* = "GCTTAGCU"
,
in which -
is called as “gap”, i.e., in this position the DNA is supposed to be unknown.
Match: The two letters are the same, except both two letters are gapped.
Mismatch: The two letters are differential or both two letters are gapped.
By above definition, the solution we find for matching S1
and S2
has two mismatches — T
& A
and -
& A
.
Codes for finding minimum mismatches
Here I list the python code for finding minimum number of mismatches between two DNA sequences. The insights of codes below can be viewed by supplemental documents here.
1 | ''' |