Phi-OTDR Event Classification Dataset and Baseline Models
This dataset contains six types of Phi-OTDR events: background noise (3094 samples, Fig. (a)), digging (2512 samples, Fig. (b)), knocking (2530 samples, Fig. (c)), shaking (2298 samples, Fig. (d)), watering (2728 samples, Fig. (e)), and walking (2450 samples, Fig. (f)), for a total of 15,612 samples. Typical samples of the different events (size: 12 (space) * 9999 (time)) are shown in the figure.
Fig. Time-space figure of typical samples of different events
To ensure the robustness of the dataset, two segments of fiber (5.1 km and 10.1 km) were used, and the above-mentioned events were collected at their tail sections (from 5.0 to 5.05 km and from 10.0 to 10.05 km) by ten members of our research team at different times. To facilitate subsequent data processing, the collected data are clipped into 12*9999 samples. The data are divided into a training set and a test set at a ratio of 8:2; the detailed number of each event is listed in the readme file of the dataset. The dataset also contains label files.
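Each clipped sample can be read with scipy, as the loader code below does. A minimal sketch of the round trip (the file name here is synthetic, since real sample names are listed in the label files; only the `.mat` variable name `data` is taken from the repository's loader):

```python
import numpy as np
import scipy.io as scio

# Stand-in for one clipped sample: 12 spatial channels x 9999 time points.
fake = (np.random.rand(12, 9999) * 65535).astype(np.uint16)
scio.savemat('example_sample.mat', {'data': fake})

# Reading it back mirrors the scio.loadmat(...)['data'] call in mydataset.py.
sample = scio.loadmat('example_sample.mat')['data']
print(sample.shape)  # (12, 9999)
```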
Since GitHub limits the size of uploaded data, we have uploaded the data to Google Drive and Baidu Netdisk (links in the rawdata file).
We also release code for two common baseline models: an SVM (support vector machine, a 1D method) and a CNN (convolutional neural network, a 2D method). The files das_data_svm.py, get_das_data.py, and feature_extraction.py belong to the SVM model, while das_data_cnn.py, models.py, and mydataset.py belong to the CNN. An additional feature_visualization.py script is used to directly observe the distinguishability of the event features.
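The exact 1D features used by the SVM baseline are defined in feature_extraction.py. Purely as an illustration of the 1D approach, a 12*9999 sample can be reduced to a short statistical feature vector; the function and feature choice below are assumptions for illustration, not the repository's implementation:

```python
import numpy as np

def simple_features(sample):
    # Illustrative statistics only; the real features live in feature_extraction.py.
    trace = sample.mean(axis=0)  # average the 12 spatial channels into one time trace
    return np.array([trace.mean(), trace.std(), trace.max() - trace.min()])

features = simple_features(np.random.rand(12, 9999))
print(features.shape)  # (3,)
```

A vector like this would then be fed to the SVM classifier in place of the raw 2D sample.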
You are welcome to use our code and dataset for non-commercial scientific research purposes, but please do mention their origin (our paper and GitHub). For commercial applications, please contact us.
See [1] for more details.
[1]. Cao, X., Su, Y., Jin, Z., & Yu, K. (2023). An open dataset of φ-OTDR events with two classification models as baselines. Results in Optics, 100372.
First Online Date: 22:00 Beijing Time, Jun. 2nd, 2022
----------update: Sept-13-2022--------------
import numpy as np
import os
import scipy.io as scio
from torch.utils.data import Dataset

def normalize(data):
    # Min-max scale the sample to the range 0-255.
    rawdata_max = max(map(max, data))
    rawdata_min = min(map(min, data))
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            data[i][j] = round(255 * (data[i][j] - rawdata_min) / (rawdata_max - rawdata_min))
    return data
class MyDataset(Dataset):
    def __init__(self, root_dir, names_file, transform=None):
        self.root_dir = root_dir
        self.names_file = names_file
        self.transform = transform
        self.size = 0
        self.names_list = []
        if not os.path.isfile(self.names_file):
            print(self.names_file + ' does not exist!')
        with open(self.names_file) as file:
            for f in file:
                self.names_list.append(f)
                self.size += 1

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        data_path = self.root_dir + self.names_list[idx].split(' ')[0]
        if not os.path.isfile(data_path):
            print(data_path + ' does not exist!')
            return None
        rawdata = scio.loadmat(data_path)['data']  # shape (10000, 12), uint16
        rawdata = rawdata.astype(int)  # int32
        data = normalize(rawdata)
        label = int(self.names_list[idx].split(' ')[1])
        sample = {'data': data, 'label': label}
        if self.transform:
            sample = self.transform(sample)
        return sample
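For reference, the element-wise loop in normalize() above can be replaced by an equivalent vectorized NumPy expression. A sketch (normalize_vectorized is our name for illustration, not part of the repository):

```python
import numpy as np

def normalize_vectorized(data):
    # Same 0-255 min-max scaling as normalize() above, without Python loops.
    lo, hi = data.min(), data.max()
    return np.round(255 * (data - lo) / (hi - lo)).astype(data.dtype)

scaled = normalize_vectorized(np.arange(12).reshape(3, 4))
print(scaled.min(), scaled.max())  # 0 255
```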