Abstract:
A consensus string is the most frequent common
pattern in a set of strings. It is an important
feature of DNA sequences. Many algorithms have been introduced
to discover consensus strings. Among them, the median string
algorithm is the most popular. It is essentially a brute-force
algorithm. A DNA sequence is a string over the four-letter
alphabet Σ = {a, c, g, t}. If the length of the consensus string is l,
the algorithm generates all 4^l strings of length l,
called motifs or l-mers, and then tries to fit the motifs one by one against
the sequences. In this paper, we present a way to reduce
the search space using the Chapman-Kolmogorov relation. We found
that the proposed system finds the same consensus string
in less time than the median
string algorithm, and that its advantage grows as the l-mer size
increases. For an l-mer
size of 7, the proposed system was 47 times faster than the
median string algorithm.
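
For illustration, the brute-force baseline described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes the usual formulation in which a motif is "fitted" to a sequence by taking its minimum Hamming distance over all windows of length l, and the best motif minimizes the sum of these distances over all sequences. The names `min_hamming` and `median_string` are hypothetical, and the Chapman-Kolmogorov search-space reduction is not shown.

```python
from itertools import product

ALPHABET = "acgt"  # the four-letter DNA alphabet


def min_hamming(motif, seq):
    """Smallest Hamming distance between motif and any window of seq.

    Assumes len(seq) >= len(motif).
    """
    l = len(motif)
    return min(
        sum(a != b for a, b in zip(motif, seq[i:i + l]))
        for i in range(len(seq) - l + 1)
    )


def median_string(sequences, l):
    """Try all 4**l candidate l-mers and keep the best-fitting one.

    Assumed scoring: total of the per-sequence minimum Hamming distances.
    Ties are resolved in favor of the lexicographically first motif.
    """
    best_motif, best_dist = None, float("inf")
    for letters in product(ALPHABET, repeat=l):
        motif = "".join(letters)
        dist = sum(min_hamming(motif, s) for s in sequences)
        if dist < best_dist:
            best_motif, best_dist = motif, dist
    return best_motif, best_dist


if __name__ == "__main__":
    # Toy input (made up for this sketch): "acgt" occurs in every sequence.
    seqs = ["ttacgtaa", "ccacgtgg", "gacgtcca"]
    print(median_string(seqs, 4))
```

The outer loop visits every one of the 4^l candidates (16384 for l = 7), which is the search space the proposed system aims to reduce.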