The following Python code preprocesses drug-protein interaction network data and produces a dataset that can be used to train machine learning models.

import pickle

import numpy as np
import torch

from data.utils import load_data_torch, process_prot_edge
from src.utils import *  # process_edges and sparse_id are assumed to come from here

et_list = [1]

out_file = './data/data_dict.pkl'

data = load_data_torch('./data/', et_list, mono=True)

# graph sizes: number of drugs, proteins, and drug-drug edge types
data['n_drug'] = data['d_feat'].shape[0]
data['n_prot'] = data['p_feat'].shape[0]
data['n_dd_et'] = len(et_list)

(data['dd_train_idx'], data['dd_train_et'], data['dd_train_range'],
 data['dd_test_idx'], data['dd_test_et'], data['dd_test_range']) = process_edges(data['dd_edge_index'])
data['pp_train_indices'], data['pp_test_indices'] = process_prot_edge(data['pp_adj'])

# TODO: add drug feature; until then, d_feat and p_feat are placeholders built by sparse_id
data['d_feat'] = sparse_id(data['n_drug'])
data['p_feat'] = sparse_id(data['n_prot'])
data['n_drug_feat'] = data['d_feat'].shape[1]
data['d_norm'] = torch.ones(data['n_drug_feat'])

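# ###################################
# dp_edge_index and range index
# ###################################
# row 0: protein indices (from .col), row 1: drug indices (from .row);
# the -1 shifts the stored indices down by one
data['dp_edge_index'] = np.array([data['dp_adj'].col-1, data['dp_adj'].row-1])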

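# count how many protein edges each drug has
count_drug = np.zeros(data['n_drug'], dtype=np.int32)
for i in data['dp_edge_index'][1, :]:
    count_drug[i] += 1

# turn the counts into consecutive (start, end) ranges, one per drug;
# the ranges match the edge order only if the edges are grouped by drug index
range_list = []
start = 0
end = 0
for i in count_drug:
    end += i
    range_list.append((start, end))
    start = end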

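# offset the drug row by n_prot so that proteins and drugs share one index space
data['dp_edge_index'] = torch.from_numpy(data['dp_edge_index'] + np.array([[0], [data['n_prot']]]))
# note: torch.Tensor(...) stores the ranges as float32
data['dp_range_list'] = torch.Tensor(range_list)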

with open(out_file, 'wb') as f:
    pickle.dump(data, f)

print(f'Data has been prepared and is ready to use --> {out_file}')
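
The range-index step in the code above can be illustrated on toy data. The sketch below is not the original implementation: it uses `np.bincount` and `np.cumsum` instead of the explicit loops, the helper name `build_ranges` is hypothetical, and the edge list is made up. It assumes, as the original loops do, that the edges are already grouped by drug index.

```python
import numpy as np

def build_ranges(drug_idx, n_drug):
    """Return one (start, end) slice per drug into an edge list grouped by drug."""
    counts = np.bincount(drug_idx, minlength=n_drug)  # edges per drug
    ends = np.cumsum(counts)                          # exclusive end of each slice
    starts = ends - counts                            # start of each slice
    return np.stack([starts, ends], axis=1)

# Made-up toy graph: 3 proteins, 4 drugs, 5 drug-protein edges sorted by drug.
n_prot, n_drug = 3, 4
prot_idx = np.array([0, 2, 1, 0, 2])
drug_idx = np.array([0, 0, 1, 3, 3])

ranges = build_ranges(drug_idx, n_drug)
# Drug 2 has no edges, so its slice is empty (start == end).

# Shift only the drug row by n_prot (broadcast of a 2x1 column over the 2x5 array),
# so proteins occupy indices 0..n_prot-1 and drugs start at n_prot.
edge_index = np.array([prot_idx, drug_idx]) + np.array([[0], [n_prot]])

print(ranges)
print(edge_index)
```

With this layout, the edges of drug `k` are the columns `edge_index[:, start:end]` for `(start, end) = ranges[k]`.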

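Before running the code:

  1. Make sure the dependencies are installed: torch, numpy, scipy, and networkx. pickle is part of the Python standard library and needs no installation.
  2. Download the dataset and place the data folder (the code reads from ./data/) in the same directory as the code.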
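
A quick way to check the first step is to ask Python which of the listed packages it can find. This is only a convenience sketch; the helper name `missing_packages` is hypothetical and is not part of the original code.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

required = ["torch", "numpy", "scipy", "networkx"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```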

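Running the code:

  1. Open a Python editor or terminal in the directory that contains the data folder.
  2. Copy the code into the editor or terminal and run it; the imports at the top load the required libraries and functions.
  3. The program prints a message saying that the data has been prepared and saved to the specified path.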
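
After the run, the saved file can be inspected to confirm it contains the expected keys. The following is a minimal sketch, assuming the default output path from the code; the helper name `summarize_data_dict` is hypothetical. Unpickling needs the same libraries (for example torch) to be installed, and pickle files should only be loaded from trusted sources.

```python
import os
import pickle

def summarize_data_dict(path):
    """Load a pickled dict and map each key to its shape, or to its type name."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    summary = {}
    for key, value in data.items():
        shape = getattr(value, "shape", None)  # tensors and arrays expose .shape
        summary[key] = tuple(shape) if shape is not None else type(value).__name__
    return summary

path = "./data/data_dict.pkl"  # default out_file in the preprocessing code
if os.path.exists(path):
    for key, desc in summarize_data_dict(path).items():
        print(key, desc)
else:
    print("No output file found at", path)
```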

Note:

Depending on your machine and the size of the dataset, this code may take a long time to finish.
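
Because the result is written to a pickle file, one way to avoid paying the preprocessing cost twice is to reuse that file when it already exists. The sketch below is an optional pattern, not part of the original code; `load_or_build` and `build_fn` are hypothetical names, and `build_fn` stands in for whatever function wraps the preprocessing steps.

```python
import os
import pickle

def load_or_build(path, build_fn):
    """Return the pickled object at `path`, building and saving it first if absent."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    data = build_fn()  # the slow step runs only when no cached file exists
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data
```

Delete the cached file whenever the raw data or the preprocessing code changes, otherwise a stale result is returned.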


