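I have a list of sequences that I want to look up in sequencing data. I run a for loop to find the matching sequences in the dataset and use Counter() to get the most frequent one. However, Counter() seems to include the counts from previous loop iterations instead of counting each iteration separately.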
ls = ['AGC','GCT','TAC','CGT']
dataset.txt contains a series of sequences, such as "AGTAGCTTT", "AGTTAGC", ...
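from collections import Counter

def xfind(seq):
    # Collect every line of the dataset that contains the query sequence.
    ls2 = []
    with open('dataset.txt', 'r') as f:
        for line in f:
            if seq in line:
                ls2.append(line)
    # Count how often each matching line occurs and print the most frequent one.
    cnt = Counter()
    for l in ls2:
        cnt[l] += 1
    print(cnt.most_common()[0])

for l2 in ls:
    xfind(l2)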
The result is: ("AGTAGCTTT", 2) ("AGTAGCTTT", 5)
It should be: ("AGTAGCTTT", 2) ("GCT ...", 3)
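For comparison, here is a minimal sketch of a version where counts cannot carry over between query sequences, because the Counter is built from scratch on every call. The function name most_common_match and the in-memory sample_lines list are hypothetical stand-ins, not part of the original code or data. If the totals grow from one iteration to the next as in the result above, one likely cause (an assumption, since the original indentation is not visible) is that the list or the Counter is created once outside the function and reused.

```python
from collections import Counter

def most_common_match(seq, lines):
    """Return (line, count) for the most frequent line containing seq, or None."""
    # A new Counter is built on every call, so nothing carries over
    # from earlier query sequences.
    cnt = Counter(line.strip() for line in lines if seq in line)
    top = cnt.most_common(1)
    return top[0] if top else None

# Hypothetical stand-in for the lines of dataset.txt (made-up example data).
sample_lines = ["AGTAGCTTT\n", "AGTAGCTTT\n", "AGTTAGC\n",
                "TTGCTA\n", "TTGCTA\n", "TTGCTA\n"]

results = {seq: most_common_match(seq, sample_lines)
           for seq in ['AGC', 'GCT', 'TAC', 'CGT']}
for seq, top in results.items():
    print(seq, top)
```

Returning None when nothing matches avoids the IndexError that most_common()[0] raises on an empty Counter. To run this against the real file, the lines argument could be the open file object itself.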