Batch-extracting FASTA sequences with Python - filtering by ID - general

To batch-extract sequences from 'B.fasta' based on the IDs provided in 'A.fasta', you can use Python's Biopython library.

Here is an example:

from Bio import SeqIO

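# Read the IDs from 'A.fasta' into a set for fast membership tests
ids = set()
with open('A.fasta') as file:
    for line in file:
        if line.startswith('>'):
            fields = line[1:].split()
            if fields:
                # Keep only the first word of the header, which is what record.id holds
                ids.add(fields[0])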

# Extract the matching sequences from 'B.fasta'
sequences = []
with open('B.fasta') as file:
    for record in SeqIO.parse(file, 'fasta'):
        if record.id in ids:
            sequences.append(record)

# Write the extracted sequences to a new file
SeqIO.write(sequences, 'output.fasta', 'fasta')

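The code above first reads every ID from 'A.fasta' and keeps it in memory. It then uses SeqIO.parse to read the records of 'B.fasta' one at a time; any record whose ID is among the provided IDs is appended to a new list. Finally, SeqIO.write writes the extracted sequences to a new FASTA file ('output.fasta'). Note that Biopython's record.id holds only the first whitespace-delimited word of a header, so that is the part that must match.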
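
When 'B.fasta' is large, collecting every matching record in a list before writing can use a lot of memory, and it is often useful to know which requested IDs were never found. The following is a minimal sketch of a streaming variant, not a replacement for the Biopython version. It is written with the standard library only so that it is self-contained; the function names read_ids and filter_fasta are hypothetical, and it assumes the ID is the first whitespace-delimited word of each header line.

```python
import io


def read_ids(handle):
    """Collect the first whitespace-delimited word of every header line."""
    wanted = set()
    for line in handle:
        if line.startswith('>'):
            fields = line[1:].split()
            if fields:
                wanted.add(fields[0])
    return wanted


def filter_fasta(source, dest, wanted):
    """Copy records whose ID is in `wanted` from source to dest, line by line.

    Returns the set of IDs that were actually found in source.
    """
    found = set()
    keep = False
    for line in source:
        if line.startswith('>'):
            fields = line[1:].split()
            record_id = fields[0] if fields else ''
            keep = record_id in wanted
            if keep:
                found.add(record_id)
        if keep:
            dest.write(line)
    return found


# Small in-memory demonstration; real use would pass open file handles instead.
id_file = io.StringIO(">seq2 some description\nACGT\n")
source = io.StringIO(">seq1\nAAAA\n>seq2 other text\nCCCC\nGGGG\n>seq3\nTTTT\n")
dest = io.StringIO()

wanted = read_ids(id_file)
found = filter_fasta(source, dest, wanted)
missing = wanted - found  # IDs requested but absent from the source
```

Because each line is written as soon as it is read, memory use stays roughly constant regardless of how many records match. Checking `missing` afterwards is a quick way to catch IDs that were misspelled or absent.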

Adjust the file names and paths to suit your setup.
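
Instead of editing the script each time, the file names could be taken from the command line. This is a minimal sketch using the standard argparse module; the argument names (id_fasta, source_fasta, --output) are hypothetical choices for illustration, and the parsed values would then be used wherever the script hard-codes a file name.

```python
import argparse


def parse_args(argv=None):
    """Parse the three file paths used by the extraction script."""
    parser = argparse.ArgumentParser(description="Extract FASTA records by ID.")
    parser.add_argument("id_fasta", help="FASTA file whose headers supply the IDs")
    parser.add_argument("source_fasta", help="FASTA file to extract sequences from")
    parser.add_argument(
        "-o", "--output",
        default="output.fasta",
        help="path of the FASTA file to write (default: output.fasta)",
    )
    # argv=None makes argparse read sys.argv; a list can be passed for testing.
    return parser.parse_args(argv)


# Example with an explicit argument list instead of the real command line.
args = parse_args(["A.fasta", "B.fasta", "-o", "subset.fasta"])
```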