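Python code: preprocessing drug-protein interaction network data
The following Python code preprocesses drug-protein interaction network data and writes a dataset that can be used to train machine learning models.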
import pickle

import numpy as np
import torch

from data.utils import load_data_torch, process_prot_edge
from src.utils import *  # expected to provide process_edges and sparse_id
et_list = [1]
out_file = './data/data_dict.pkl'
data = load_data_torch('./data/', et_list, mono=True)
# graph features
data['n_drug'] = data['d_feat'].shape[0]
data['n_prot'] = data['p_feat'].shape[0]
data['n_dd_et'] = len(et_list)
(data['dd_train_idx'], data['dd_train_et'], data['dd_train_range'],
 data['dd_test_idx'], data['dd_test_et'], data['dd_test_range']) = process_edges(data['dd_edge_index'])
data['pp_train_indices'], data['pp_test_indices'] = process_prot_edge(data['pp_adj'])
# TODO: add drug feature
data['d_feat'] = sparse_id(data['n_drug'])
data['p_feat'] = sparse_id(data['n_prot'])
data['n_drug_feat'] = data['d_feat'].shape[1]
data['d_norm'] = torch.ones(data['n_drug_feat'])
# ###################################
# dp_edge_index and range index
# ###################################
data['dp_edge_index'] = np.array([data['dp_adj'].col-1, data['dp_adj'].row-1])
count_drug = np.zeros(data['n_drug'], dtype=np.int32)
for i in data['dp_edge_index'][1, :]:
    count_drug[i] += 1
range_list = []
start = 0
end = 0
for i in count_drug:
    end += i
    range_list.append((start, end))
    start = end
data['dp_edge_index'] = torch.from_numpy(data['dp_edge_index'] + np.array([[0], [data['n_prot']]]))
data['dp_range_list'] = torch.Tensor(range_list)
with open(out_file, 'wb') as f:
    pickle.dump(data, f)
print(f'Data has been prepared and is ready to use --> {out_file}')
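The range-building step in the script (count edges per node, then turn the counts into consecutive start/end offsets) can be illustrated on toy data. The following is only a minimal sketch: `build_ranges` is a hypothetical helper name that does not appear in the original code, and it assumes the edges are already grouped by the node being counted, since the ranges are only meaningful in that case.

```python
from itertools import accumulate


def build_ranges(node_ids, n_nodes):
    """Return one (start, end) pair per node, with end exclusive.

    node_ids: the node index of each edge (hypothetical toy input).
    n_nodes:  total number of nodes, including nodes without edges.
    """
    # Count how many edges belong to each node.
    counts = [0] * n_nodes
    for i in node_ids:
        counts[i] += 1
    # The running total of the counts gives each node's end offset.
    ends = list(accumulate(counts))
    # Each node starts where the previous node ended.
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


# Toy example: 4 nodes and 5 edges; node 2 has no edges.
toy_ids = [0, 0, 1, 3, 3]
ranges = build_ranges(toy_ids, n_nodes=4)
print(ranges)
```

A node without edges gets an empty range whose start equals its end, which matches what the loop in the script produces for a zero count.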
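Before running the code:
- Make sure the dependencies are installed: torch, numpy, scipy, and networkx. pickle is part of the Python standard library and needs no installation.
- Download the dataset and place the data folder in the same directory as the code.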
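The two preparation steps can be checked before starting the script. This is a rough sketch, not part of the original code: `missing_packages` is a hypothetical helper, and the `./data` path is taken from the script above.

```python
import importlib.util
import os


def missing_packages(names):
    """Return the top-level package names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Third-party packages listed in the preparation steps.
required = ['torch', 'numpy', 'scipy', 'networkx']
missing = missing_packages(required)
print('Missing packages:', missing if missing else 'none')

# The script reads its input from ./data/ relative to the working directory.
print('Data folder found:', os.path.isdir('./data'))
```

`importlib.util.find_spec` only looks the package up and does not import it, so the check stays fast even for large libraries such as torch.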
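Running the code:
- Open a Python editor or a terminal.
- Copy the code into the editor or terminal and run it.
- When it finishes, the program prints a message saying the data is ready and has been saved to the specified path.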
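Once the script has finished, the saved dictionary can be read back with pickle. The sketch below shows the pattern with a small stand-in dictionary written to a temporary file. The key names come from the script, but the values are made up for illustration, and `load_data_dict` is a hypothetical helper name.

```python
import os
import pickle
import tempfile


def load_data_dict(path):
    """Load a pickled dictionary from the given path."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Stand-in for the real output; the values here are illustrative only.
toy = {'n_drug': 3, 'n_prot': 5, 'n_dd_et': 1}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, 'data_dict.pkl')
    with open(toy_path, 'wb') as f:
        pickle.dump(toy, f)
    loaded = load_data_dict(toy_path)

print(sorted(loaded))
```

For the real output, the same call would be `load_data_dict('./data/data_dict.pkl')`. Because that file stores torch tensors, torch must be installed in the environment that loads it.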
Note:
This code may take a long time to finish. The exact time depends on your computer's performance and the size of the dataset.
Original source: https://www.cveoy.top/t/topic/m1BL. Copyright belongs to the author. Do not repost or scrape.