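In the given data, each motif begins with a MOTIF line whose second field is an identifier such as AT1G01260. The letter-probability matrix line that follows is a header, and every line after it is one row of the probability matrix, holding four values. The identifier and its matrix can therefore be extracted by splitting the data into lines, taking the second field of the MOTIF line as the identifier, and reading the four values from each following row.

Here is one way to do this in Python: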

data = '''MOTIF AT1G01260 MP00100
letter-probability matrix: alength= 4 w= 8 nsites= 1000 E= 0
0.295000 0.116000 0.569000 0.020000
0.053000 0.945000 0.001000 0.001000
0.991009 0.001998 0.002997 0.003996
0.001000 0.952000 0.001000 0.046000
0.015015 0.001001 0.982983 0.001001
0.004000 0.004000 0.001000 0.991000
0.001000 0.002000 0.987000 0.010000
0.038000 0.696000 0.065000 0.201000'''

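lines = data.split('\n')
motif = None  # identifier of the motif currently being read
for line in lines:
    if not line.strip():
        continue  # skip blank lines
    if line.startswith('MOTIF'):
        # e.g. "MOTIF AT1G01260 MP00100": the second field is the identifier
        motif = line.split()[1]
    elif line.startswith('letter-probability matrix'):
        continue  # header line; it carries no probabilities
    else:
        # one matrix row: four probabilities, kept as strings
        probabilities = line.split()
        print(motif, probabilities)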

The output is:

AT1G01260 ['0.295000', '0.116000', '0.569000', '0.020000']
AT1G01260 ['0.053000', '0.945000', '0.001000', '0.001000']
AT1G01260 ['0.991009', '0.001998', '0.002997', '0.003996']
AT1G01260 ['0.001000', '0.952000', '0.001000', '0.046000']
AT1G01260 ['0.015015', '0.001001', '0.982983', '0.001001']
AT1G01260 ['0.004000', '0.004000', '0.001000', '0.991000']
AT1G01260 ['0.001000', '0.002000', '0.987000', '0.010000']
AT1G01260 ['0.038000', '0.696000', '0.065000', '0.201000']

This way, each identifier such as AT1G01260 is extracted together with the rows of its probability matrix.
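If the matrix is needed for further computation rather than printing, the same loop can collect the rows as floats, grouped by identifier. The following is only a minimal sketch: the function name parse_motifs is hypothetical, and it assumes the input contains nothing but the three kinds of lines shown above (MOTIF lines, letter-probability matrix headers, and numeric rows). Files with other lines after a motif would need extra handling.

```python
from collections import defaultdict


def parse_motifs(text):
    """Map each motif identifier to its matrix rows, converted to floats.

    Hypothetical helper; assumes only MOTIF lines, matrix header lines,
    and rows of numbers appear in the input.
    """
    motifs = defaultdict(list)
    current = None  # identifier of the motif being read
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('MOTIF'):
            current = line.split()[1]
        elif line.startswith('letter-probability matrix'):
            continue  # header line, no probabilities
        elif current is not None:
            motifs[current].append([float(x) for x in line.split()])
    return dict(motifs)


sample = '''MOTIF AT1G01260 MP00100
letter-probability matrix: alength= 4 w= 8 nsites= 1000 E= 0
0.295000 0.116000 0.569000 0.020000
0.053000 0.945000 0.001000 0.001000'''

matrices = parse_motifs(sample)
print(matrices['AT1G01260'][0])  # [0.295, 0.116, 0.569, 0.02]
```

Keying the result by identifier means several motifs in one input end up as separate matrices instead of being interleaved in the printed output.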

Original source: https://www.cveoy.top/t/topic/hDbJ Copyright belongs to the author. Please do not repost or scrape.
