Phi-OTDR Event Classification Dataset and Baseline Models
This dataset contains six types of Phi-OTDR events: background noise (3094 samples, Fig. (a)), digging (2512 samples, Fig. (b)), knocking (2530 samples, Fig. (c)), shaking (2298 samples, Fig. (d)), watering (2728 samples, Fig. (e)), and walking (2450 samples, Fig. (f)), for a total of 15,612 samples. Typical samples of the different events (size: 12 (space) * 9999 (time)) are shown in the figure.
Fig. Time-space figure of typical samples of different events
To ensure the robustness of the dataset, two segments of fiber (5.1 km and 10.1 km) were used, and the above-mentioned events were collected at their tail sections (from 5.0 to 5.05 km and from 10.0 to 10.05 km) by ten members of our research team at different times. To facilitate subsequent data processing, the collected data are clipped into 12*9999 samples. The data are divided into a training set and a test set at a ratio of 8:2; the detailed number of each event is listed in the readme file of the dataset. The dataset also contains label files.
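Each clipped sample can be read with scipy, as the loader code below does. A minimal sketch of the round trip (the file name here is synthetic, since real sample names are listed in the label files; only the `.mat` variable name `data` is taken from the repository's loader):

```python
import numpy as np
import scipy.io as scio

# Stand-in for one clipped sample: 12 spatial channels x 9999 time points.
fake = (np.random.rand(12, 9999) * 65535).astype(np.uint16)
scio.savemat('example_sample.mat', {'data': fake})

# Reading it back mirrors the scio.loadmat(...)['data'] call in mydataset.py.
sample = scio.loadmat('example_sample.mat')['data']
print(sample.shape)  # (12, 9999)
```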
Since GitHub limits the size of uploaded data, we have uploaded the data to Google Drive and Baidu Netdisk (links in the rawdata file).
We also release code for two common baseline models: an SVM (support vector machine, a 1D method) and a CNN (convolutional neural network, a 2D method). The files das_data_svm.py, get_das_data.py, and feature_extraction.py belong to the SVM model, while das_data_cnn.py, models.py, and mydataset.py belong to the CNN. An additional feature_visualization.py script is used to directly observe the distinguishability of the event features.
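The exact 1D features used by the SVM baseline are defined in feature_extraction.py. Purely as an illustration of the 1D approach, a 12*9999 sample can be reduced to a short statistical feature vector; the function and feature choice below are assumptions for illustration, not the repository's implementation:

```python
import numpy as np

def simple_features(sample):
    # Illustrative statistics only; the real features live in feature_extraction.py.
    trace = sample.mean(axis=0)  # average the 12 spatial channels into one time trace
    return np.array([trace.mean(), trace.std(), trace.max() - trace.min()])

features = simple_features(np.random.rand(12, 9999))
print(features.shape)  # (3,)
```

A vector like this would then be fed to the SVM classifier in place of the raw 2D sample.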
You are welcome to use our code and dataset for non-commercial scientific research purposes, but please do mention their origin (our paper and GitHub). For commercial applications, please contact us.
See [1] for more details.
[1]. Cao, X., Su, Y., Jin, Z., & Yu, K. (2023). An open dataset of φ-OTDR events with two classification models as baselines. Results in Optics, 100372.
First Online Date: 22:00 Beijing Time, Jun. 2nd, 2022
----------update: Sept-13-2022--------------
import numpy as np
import os
import scipy.io as scio
from torch.utils.data import Dataset

def normalize(data):
    # Min-max scale the sample to the range 0-255.
    rawdata_max = max(map(max, data))
    rawdata_min = min(map(min, data))
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            data[i][j] = round(255 * (data[i][j] - rawdata_min) / (rawdata_max - rawdata_min))
    return data
class MyDataset(Dataset):
    def __init__(self, root_dir, names_file, transform=None):
        self.root_dir = root_dir
        self.names_file = names_file
        self.transform = transform
        self.size = 0
        self.names_list = []
        if not os.path.isfile(self.names_file):
            print(self.names_file + ' does not exist!')
        with open(self.names_file) as file:
            for f in file:
                self.names_list.append(f)
                self.size += 1

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        data_path = self.root_dir + self.names_list[idx].split(' ')[0]
        if not os.path.isfile(data_path):
            print(data_path + ' does not exist!')
            return None
        rawdata = scio.loadmat(data_path)['data']  # shape (10000, 12), uint16
        rawdata = rawdata.astype(int)  # int32
        data = normalize(rawdata)
        label = int(self.names_list[idx].split(' ')[1])
        sample = {'data': data, 'label': label}
        if self.transform:
            sample = self.transform(sample)
        return sample
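For reference, the element-wise loop in normalize() above can be replaced by an equivalent vectorized NumPy expression. A sketch (normalize_vectorized is our name for illustration, not part of the repository):

```python
import numpy as np

def normalize_vectorized(data):
    # Same 0-255 min-max scaling as normalize() above, without Python loops.
    lo, hi = data.min(), data.max()
    return np.round(255 * (data - lo) / (hi - lo)).astype(data.dtype)

scaled = normalize_vectorized(np.arange(12).reshape(3, 4))
print(scaled.min(), scaled.max())  # 0 255
```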