Python Batch Extraction of FASTA Sequences - Filtering by ID
This guide shows how to use Python to extract sequences in batch from a FASTA file based on their IDs.
Steps:
- Read the FASTA file:
def read_fasta_file(file_path):
    # Read the entire FASTA file into a single string
    with open(file_path, 'r') as file:
        content = file.read()
    return content
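For illustration, the sketch below writes a small made-up FASTA file (records seq1, seq2 and seq3; the sequences are arbitrary) to a temporary path and reads it back with read_fasta_file. The sequence of seq1 is wrapped over two lines, which is common in FASTA files and is why the later steps join the sequence lines back together.

```python
import os
import tempfile

# Made-up sample data: three records, seq1's sequence wrapped over two lines
SAMPLE_FASTA = """>seq1
ATGCGTACGT
TTAGC
>seq2
GGGCCCAAAT
>seq3
CCGATAG
"""

def read_fasta_file(file_path):
    # Read the entire FASTA file into a single string
    with open(file_path, 'r') as file:
        content = file.read()
    return content

# Write the sample to a temporary file, read it back, then clean up
with tempfile.NamedTemporaryFile('w', suffix='.fasta', delete=False) as tmp:
    tmp.write(SAMPLE_FASTA)
    tmp_path = tmp.name

content = read_fasta_file(tmp_path)
os.remove(tmp_path)
print(content.count('>'))  # 3 (one '>' per header line)
```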
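- Split the FASTA file content into records:
def split_fasta_content(content):
    # Split on '>' so that each chunk is one record (header line + sequence lines)
    sequences = content.split('>')
    # Drop empty chunks, such as the empty string before the first '>'
    sequences = [seq.strip() for seq in sequences if seq.strip() != '']
    return sequences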
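Because split_fasta_content splits on every '>', a '>' that appears inside a header description would produce a spurious record. The sketch below compares it with a hypothetical variant, split_fasta_content_safe (not part of the original guide), which only splits where '>' begins a line. The sample header text is made up.

```python
def split_fasta_content(content):
    # Original approach: split on every '>' and drop empty chunks
    sequences = content.split('>')
    return [seq.strip() for seq in sequences if seq.strip() != '']

def split_fasta_content_safe(content):
    # Hypothetical variant: only a '>' at the start of a line starts a record.
    # Prepending '\n' lets the very first header match the '\n>' separator too.
    records = ('\n' + content).split('\n>')
    return [rec.strip() for rec in records if rec.strip() != '']

sample = ">seq1 desc a>b\nATGC\n>seq2\nGGCC\n"
naive = split_fasta_content(sample)
safe = split_fasta_content_safe(sample)
print(len(naive))  # 3 (the header was wrongly cut at 'a>b')
print(len(safe))   # 2
```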
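- Extract the ID and sequence of each record:
def extract_sequences(sequences):
    extracted_sequences = {}
    for seq in sequences:
        # splitlines() splits on '\n' and also handles Windows '\r\n' line endings
        seq_lines = seq.splitlines()
        # The ID is the first word of the header line; any description after it is ignored
        seq_id = seq_lines[0].split()[0]
        # Join the remaining lines into one continuous sequence
        seq_content = ''.join(line.strip() for line in seq_lines[1:])
        extracted_sequences[seq_id] = seq_content
    return extracted_sequences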
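The sketch below runs the parsing step on a header that carries a description after the ID. It assumes, as most FASTA tools do, that the ID is the first whitespace-delimited word of the header line; the record text is made up for illustration.

```python
def extract_sequences(sequences):
    # Map each record's ID to its sequence with line breaks removed
    extracted_sequences = {}
    for seq in sequences:
        seq_lines = seq.splitlines()
        # Assumption: the ID is the first whitespace-delimited word of the header
        seq_id = seq_lines[0].split()[0]
        extracted_sequences[seq_id] = ''.join(line.strip() for line in seq_lines[1:])
    return extracted_sequences

# Chunks as produced by the splitting step (made-up records)
chunks = ['seq1 example description\nATGCGT\nACGT', 'seq2\nGGCC']
parsed = extract_sequences(chunks)
print(parsed)  # {'seq1': 'ATGCGTACGT', 'seq2': 'GGCC'}
```

If the same ID appears twice in a file, the later record overwrites the earlier one in the dictionary.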
- Extract sequences by a list of IDs:
def get_sequences_by_ids(extracted_sequences, id_list):
    sequences = {}
    for seq_id in id_list:
        # IDs that do not appear in the FASTA file are skipped
        if seq_id in extracted_sequences:
            sequences[seq_id] = extracted_sequences[seq_id]
    return sequences
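Since unknown IDs are skipped silently, a typo in the ID list is easy to miss. A minimal sketch that pairs the lookup with a hypothetical find_missing_ids helper (not part of the original guide) reports which requested IDs were not found; the parsed data is made up.

```python
def get_sequences_by_ids(extracted_sequences, id_list):
    # Keep only requested IDs present in the parsed file, in id_list order
    return {seq_id: extracted_sequences[seq_id]
            for seq_id in id_list if seq_id in extracted_sequences}

def find_missing_ids(extracted_sequences, id_list):
    # Hypothetical helper: IDs that were requested but are absent from the file
    return [seq_id for seq_id in id_list if seq_id not in extracted_sequences]

parsed = {'seq1': 'ATGC', 'seq2': 'GGCC', 'seq3': 'TTAA'}
wanted = ['seq1', 'seq3', 'seq9']
selected = get_sequences_by_ids(parsed, wanted)
missing = find_missing_ids(parsed, wanted)
print(selected)  # {'seq1': 'ATGC', 'seq3': 'TTAA'}
print(missing)   # ['seq9']
```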
Complete code example:
def read_fasta_file(file_path):
    # Read the entire FASTA file into a single string
    with open(file_path, 'r') as file:
        content = file.read()
    return content

def split_fasta_content(content):
    # One chunk per record; empty chunks are dropped
    sequences = content.split('>')
    sequences = [seq.strip() for seq in sequences if seq.strip() != '']
    return sequences

def extract_sequences(sequences):
    extracted_sequences = {}
    for seq in sequences:
        seq_lines = seq.splitlines()
        # Use the first word of the header line as the ID
        seq_id = seq_lines[0].split()[0]
        seq_content = ''.join(line.strip() for line in seq_lines[1:])
        extracted_sequences[seq_id] = seq_content
    return extracted_sequences

def get_sequences_by_ids(extracted_sequences, id_list):
    sequences = {}
    for seq_id in id_list:
        if seq_id in extracted_sequences:
            sequences[seq_id] = extracted_sequences[seq_id]
    return sequences
file_path = 'example.fasta'  # Replace with the path to your FASTA file
id_list = ['seq1', 'seq3']  # Replace with the list of sequence IDs to extract

content = read_fasta_file(file_path)
sequences = split_fasta_content(content)
extracted_sequences = extract_sequences(sequences)
selected_sequences = get_sequences_by_ids(extracted_sequences, id_list)

for seq_id, seq_content in selected_sequences.items():
    print('ID:', seq_id)
    print('Sequence:', seq_content)
    print('---')
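For batch extraction with many IDs, typing id_list by hand becomes impractical. A minimal sketch, assuming the IDs are stored in a plain-text file with one ID per line (the read_id_list helper and this file layout are assumptions, not part of the original guide), loads the list from a file instead:

```python
import os
import tempfile

def read_id_list(id_file_path):
    # One ID per line: surrounding whitespace is stripped, blank lines skipped,
    # and duplicates dropped while keeping the first-seen order
    ids = []
    seen = set()
    with open(id_file_path, 'r') as fh:
        for line in fh:
            seq_id = line.strip()
            if seq_id and seq_id not in seen:
                seen.add(seq_id)
                ids.append(seq_id)
    return ids

# Demo with a temporary ID file containing a blank line, padding and a duplicate
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('seq1\n\n  seq3  \nseq1\n')
    ids_path = tmp.name

id_list = read_id_list(ids_path)
os.remove(ids_path)
print(id_list)  # ['seq1', 'seq3']
```

In the complete example, id_list = read_id_list('ids.txt') could then replace the hard-coded list, where ids.txt is a hypothetical file name.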
Notes:
- Replace example.fasta in the code with the path to your FASTA file.
- Replace id_list with the list of sequence IDs you want to extract.
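After the code runs, the ID and sequence of each extracted record are printed. IDs that are not found in the file are skipped without a message.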
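Printing is fine for a quick check, but batch results are often saved as a new FASTA file. A sketch of a hypothetical write_fasta helper (not part of the original guide) writes the selected records back out, wrapping sequences at 60 characters by default; that width is an arbitrary but common choice, not a requirement of the format. The demo uses a width of 4 so the wrapping is visible.

```python
import os
import tempfile

def write_fasta(selected_sequences, out_path, width=60):
    # Write each record as a '>ID' header followed by the sequence
    # wrapped at `width` characters per line
    with open(out_path, 'w') as out:
        for seq_id, seq_content in selected_sequences.items():
            out.write('>' + seq_id + '\n')
            for start in range(0, len(seq_content), width):
                out.write(seq_content[start:start + width] + '\n')

selected = {'seq1': 'ATGCGTACGT', 'seq3': 'CCGATAG'}
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'selected.fasta')
    write_fasta(selected, out_path, width=4)
    with open(out_path) as fh:
        written = fh.read()
print(written, end='')
```

In the complete example, write_fasta(selected_sequences, 'selected.fasta') would replace the print loop; the output file name is only an example.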
Original article: https://www.cveoy.top/t/topic/pcWA. Copyright belongs to the author. Do not repost or scrape.