Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. The method transforms each DNA sequence into a feature vector that captures the occurrence, location, and order relation of the k-tuples in the sequence. We compare the method with BlastClust, CD-HIT-EST, and other clustering tools. The experimental results show that our method is effective in classifying DNA sequences with similar biological characteristics and in discovering the underlying relationships among the sequences.
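To make the idea of a k-tuple feature vector concrete, the following is a minimal sketch and not the DMk definition itself, which is not given here. It records, for each k-tuple, only its number of occurrences and its mean start position; the order relation is left out. The function name `ktuple_features`, the 1-based positions, and the use of 0.0 for absent k-tuples are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def ktuple_features(seq, k=2, alphabet="ACGT"):
    """Return {k-tuple: (count, mean_start_position)} over all k-tuples.

    Illustrative only: this is an assumed, simplified feature set,
    not the feature vector used by mBKM/DMk.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    # Slide a window of width k and record 1-based start positions.
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i + 1)

    features = {}
    # Enumerate every k-tuple over the alphabet so all vectors share one layout.
    # Windows containing other symbols (e.g. N) are ignored by this loop.
    for tup in product(alphabet, repeat=k):
        word = "".join(tup)
        pos = positions.get(word, [])
        count = len(pos)
        mean_pos = sum(pos) / count if count else 0.0  # 0.0 marks "absent" (assumption)
        features[word] = (count, mean_pos)
    return features


if __name__ == "__main__":
    feats = ktuple_features("ACGTAC", k=2)
    print(feats["AC"], feats["CG"], feats["AA"])
```

Because every sequence is mapped onto the same fixed set of 4^k k-tuples, sequences of different lengths yield vectors of equal length, which is what allows a distance to be computed without alignment.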
Copyright © 2017 High Performance Computing Center, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences.
Designed by Chunxia Zeng. Dec 15 2017.