Chapman-Kolmogorov Relation Based Median String Algorithm for DNA Consensus Classification

Kaysar, Mohammad Shibli; Khan, Mohammad Ibrahim

URI: http://103.99.128.19:8080/xmlui/handle/123456789/331

Date: 2019-05-03

Abstract:

A consensus string is the most frequent common pattern in a set of strings, and it is an important feature of DNA sequences. Many algorithms have been introduced to discover consensus strings; among them, the median string algorithm is the most popular. It is essentially a brute-force algorithm. A DNA sequence is composed of letters from the four-letter alphabet Σ = {a, c, g, t}. If the length of the consensus string is l, the algorithm generates all 4^l strings of length l, called motifs or l-mers, and then tries to fit the motifs one by one against the sequences. In this paper we present a way to reduce the search space using the Chapman-Kolmogorov relation. We found that the proposed system finds the same consensus string in less time than the median string algorithm, and that its advantage grows as the l-mer size increases. For l-mer size 7, the proposed system was 47 times faster than the median string algorithm.
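For orientation, the brute-force baseline described in the abstract can be sketched roughly as follows. This is a minimal illustration of the standard median string search, not the authors' code and not the proposed Chapman-Kolmogorov reduction, which the abstract does not detail. The function names (`hamming`, `min_distance`, `total_distance`, `median_string`) and the use of Hamming distance as the fitting score are assumptions made for this sketch.

```python
from itertools import product

ALPHABET = "acgt"  # the four-letter DNA alphabet from the abstract


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(motif, sequence):
    """Smallest Hamming distance between the motif and any window of the
    same length in the sequence (assumes len(sequence) >= len(motif))."""
    l = len(motif)
    return min(
        hamming(motif, sequence[i:i + l])
        for i in range(len(sequence) - l + 1)
    )


def total_distance(motif, sequences):
    """Sum of the best-fit distances of the motif over all sequences."""
    return sum(min_distance(motif, s) for s in sequences)


def median_string(sequences, l):
    """Try every one of the 4**l candidate l-mers and keep the one with
    the lowest total distance (ties go to the first candidate found)."""
    best_motif, best_score = None, float("inf")
    for letters in product(ALPHABET, repeat=l):
        motif = "".join(letters)
        score = total_distance(motif, sequences)
        if score < best_score:
            best_motif, best_score = motif, score
    return best_motif, best_score


if __name__ == "__main__":
    # Hypothetical toy input: "acgt" occurs exactly in all three sequences.
    toy = ["ttacgtt", "gacgtca", "ccacgta"]
    print(median_string(toy, 4))
```

Because the outer loop visits all 4^l candidates, the running time grows exponentially with l, which is the search space the paper aims to reduce.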
