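What is the simplest way to trim or pad a set of Biopython FASTA files until they all have a given length, so that I can add them to a multiple sequence alignment? This is similar to the answer in "BioPython AlignIO ValueError says strings must be same length?", except that I have multiple sequences, there is no text file, and in the end everything should be merged into a single multiple sequence alignment. The end goal is for every sequence to be 570 characters long. I intend to build a phylogenetic tree from all of this.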
hhhhttty answered: How to trim or pad sequences to a certain length in Biopython
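I'm not familiar with Biopython, but I know you can do this easily with pysam: read the FASTA, loop over each sequence, trim it to a specific size, and write it to a new FASTA. See the example below: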
from pysam import FastxFile

fasta_q_file = "INPUT.fasta"
out_filename = "OUTPUT_NAME.fasta"
size_size_trim = 50  # target length in bases

with FastxFile(fasta_q_file) as fh, open(out_filename, mode="w") as fout:
    for entry in fh:
        sequence_id = entry.name
        sequence = entry.sequence
        if len(sequence) > size_size_trim:
            # Longer than the target: trim and record the trimming in the ID.
            fout.write(">{}_trimmed_to_{}_bp\n{}\n".format(sequence_id, size_size_trim, sequence[:size_size_trim]))
        elif len(sequence) == size_size_trim:
            # Already the target length: write unchanged.
            fout.write(">{}\n{}\n".format(sequence_id, sequence))
        else:
            # sequences shorter than `size_size_trim` are not written.
            continue
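
The pysam example drops sequences shorter than the target, while the question asks for them to be padded so that every sequence ends up 570 characters long. A minimal sketch of that trim-or-pad step in plain Python follows. The helper names `fit_to_length` and `fit_records`, the `-` pad character, and the in-memory example records are assumptions made for illustration; they do not come from the question or from either library.

```python
TARGET_LENGTH = 570  # length requested in the question


def fit_to_length(sequence, length=TARGET_LENGTH, pad_char="-"):
    """Trim `sequence` to `length`, or right-pad it with `pad_char`."""
    if len(pad_char) != 1:
        raise ValueError("pad_char must be a single character")
    # Slicing trims long sequences; ljust pads short ones on the right.
    return sequence[:length].ljust(length, pad_char)


def fit_records(records, length=TARGET_LENGTH, pad_char="-"):
    """Apply fit_to_length to an iterable of (name, sequence) pairs."""
    return [(name, fit_to_length(seq, length, pad_char)) for name, seq in records]


# Hypothetical in-memory records standing in for parsed FASTA entries.
example = [("seq_long", "ACGT" * 200), ("seq_short", "ACGTAC")]
fitted = fit_records(example)
```

With Biopython, each fitted string could then be wrapped as `SeqRecord(Seq(fitted_sequence), id=name)` and the resulting list passed to `Bio.Align.MultipleSeqAlignment`, which requires all records to have the same length. Padding only equalizes lengths and does not align homologous positions, so a real aligner is probably still needed before building a tree.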