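Using Python, process all FASTA files in a folder: extract the IDs and sequences of a subset of the FASTA records, merge them into a single FASTA file, and write the result out as a file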
The following Python code reads every FASTA file in a folder, extracts the ID and sequence of each record, merges them into a single FASTA file, and writes that file to disk:
import os

input_folder = "/path/to/folder/with/fasta/files"
output_file = "/path/to/output/file.fasta"

# Open output file for writing
with open(output_file, "w") as out_fasta:
    # Loop through all files in input folder (sorted for a stable order)
    for filename in sorted(os.listdir(input_folder)):
        if filename.endswith(".fasta"):
            # Open fasta file for reading
            with open(os.path.join(input_folder, filename), "r") as fasta:
                seq_id = ""
                sequence = ""
                # Loop through each line in fasta file
                for line in fasta:
                    # If line starts with ">" (indicating a new sequence ID)
                    if line.startswith(">"):
                        # Write previous sequence to output file (if not first sequence)
                        if seq_id != "":
                            out_fasta.write(">" + seq_id + "\n" + sequence + "\n")
                        # Extract new sequence ID and reset sequence variable
                        seq_id = line.strip().lstrip(">")
                        sequence = ""
                    else:
                        # Append sequence line to sequence variable
                        sequence += line.strip()
                # Write last sequence to output file (skip files with no records)
                if seq_id != "":
                    out_fasta.write(">" + seq_id + "\n" + sequence + "\n")
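In the code above, the input_folder variable should be the path to the folder that contains the FASTA files, and the output_file variable is the path and file name of the output FASTA file to create. Inside the loop, the code reads each FASTA file and extracts the ID and sequence of every record, then writes them to the output file. The result is a new FASTA file containing all sequences from all of the input files. Note that, as written, the code copies every record; it does not select a subset of IDs or a sub-range of each sequence. The output file should also be placed outside input_folder, or given an extension other than .fasta, so that it is not picked up as an input.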
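Since the question asks for only part of the records, the following is a minimal sketch of one way to add that selection step. It assumes the records of interest are identified by the first word of the header line, and that an optional 0-based slice (start, end) is enough to take a sub-range of each sequence. The names read_fasta, merge_selected, wanted_ids, start, and end are hypothetical and do not appear in the original answer.

```python
import os


def read_fasta(path):
    """Yield (header, sequence) pairs from one FASTA file."""
    seq_id, chunks = None, []
    with open(path, "r") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if there was one
                if seq_id is not None:
                    yield seq_id, "".join(chunks)
                seq_id, chunks = line[1:], []
            elif line and seq_id is not None:
                chunks.append(line)
    # Emit the final record (nothing is emitted for a file with no headers)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def merge_selected(input_folder, output_file, wanted_ids, start=None, end=None):
    """Write only records whose ID (first word of the header) is in wanted_ids.

    start and end are optional 0-based Python slice bounds applied to each
    selected sequence (an assumption about what "subsequence" means here).
    Returns the number of records written.
    """
    wanted = set(wanted_ids)
    out_path = os.path.abspath(output_file)
    written = 0
    with open(output_file, "w") as out_fasta:
        for filename in sorted(os.listdir(input_folder)):
            path = os.path.join(input_folder, filename)
            # Skip non-FASTA files and the output file itself
            if not filename.endswith(".fasta") or os.path.abspath(path) == out_path:
                continue
            for seq_id, sequence in read_fasta(path):
                parts = seq_id.split()
                if not parts or parts[0] not in wanted:
                    continue
                out_fasta.write(">" + seq_id + "\n" + sequence[start:end] + "\n")
                written += 1
    return written
```

A call such as merge_selected(input_folder, output_file, ["seq1", "seq7"], start=0, end=100) would keep only the records named seq1 and seq7 and write the first 100 residues of each; the IDs here are placeholders. Comparing the returned count with the number of requested IDs is a quick way to notice IDs that were not found.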
Original source: https://www.cveoy.top/t/topic/yAj Copyright belongs to the author. Please do not repost or scrape.