TrainDatasetGenerator Class: Python Code for Image Data Loading and Preprocessing

TrainDatasetGenerator is a Python class that serves training samples for a machine learning model from a directory of images. It lists the files in the directory and reads each image on demand with OpenCV. It then reverses the array's axes, converts the pixel values to 32-bit floats, and scales them to the range [0, 1] by dividing by 255. The label is taken from the first character of the filename, read as a digit and reduced by one, and each image is returned together with its label as a tuple. Here is the class:

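import os

import cv2
import numpy as np


class TrainDatasetGenerator:
    def __init__(self, file_path):
        self.file_path = file_path
        # Every entry in the directory is treated as an image file.
        self.img_names = os.listdir(file_path)

    def __getitem__(self, index):
        name = self.img_names[index]
        # cv2.imread returns an (H, W, C) uint8 array in BGR order, or None on failure.
        data = cv2.imread(os.path.join(self.file_path, name))
        if data is None:
            raise ValueError(f"could not read image: {name}")
        # Label: first filename character as a digit, minus 1 (single-digit prefixes only).
        label = int(name[0]) - 1
        #label = np.array([label])
        # transpose() reverses the axes to (C, W, H); the result is then scaled to [0, 1].
        data = data.transpose().astype(np.float32) / 255.
        #data = np.expand_dims(data, axis=0)
        #data = Tensor(data)
        #label = Tensor(label)
        return data, label

    def __len__(self):
        return len(self.img_names)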
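
The preprocessing line in __getitem__ can be checked on a synthetic array without any image files. The sketch below assumes a small made-up image of 4 x 6 pixels with 3 channels. It shows that transpose() without arguments reverses all axes, so the result is (channels, width, height), whereas transpose(2, 0, 1) would give the (channels, height, width) layout that many frameworks expect.

```python
import numpy as np

# Synthetic stand-in for what cv2.imread returns: (height, width, channels), uint8.
height, width, channels = 4, 6, 3
image = np.full((height, width, channels), 255, dtype=np.uint8)

# Same preprocessing as in __getitem__: reverse the axes, cast, scale to [0, 1].
data = image.transpose().astype(np.float32) / 255.0

# Axes reversed: (channels, width, height).
reversed_shape = data.shape

# Explicit channels-first layout for comparison: (channels, height, width).
chw_shape = image.transpose(2, 0, 1).shape

print(reversed_shape, chw_shape, data.dtype)
```

For square images the two shapes coincide, but the image content is still mirrored along the diagonal, so the choice matters if the orientation of the images is significant.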

Key features:

__init__: Stores the path to the image directory and lists the filenames in it.
__getitem__: Reads the image at the given index and returns it with its label.
Data Preprocessing: Reverses the array axes, converts the pixels to 32-bit floats, and divides by 255 so the values lie in [0, 1].
Label Extraction: Reads the first character of the filename as a digit and subtracts 1, so a file starting with "1" gets label 0.
__len__: Returns the total number of images in the dataset.
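
The label rule can be sketched on its own, which makes its main limitation visible: only the first character is read, so a filename with a two-digit class prefix is mislabeled. The filenames and the label_from_prefix variant below are hypothetical illustrations, not part of the original class.

```python
import re


def label_from_first_char(filename):
    """Mirror the class: first character as a digit, shifted to start at 0."""
    return int(filename[0]) - 1


def label_from_prefix(filename):
    """Hypothetical variant: read all leading digits before shifting."""
    match = re.match(r"\d+", filename)
    if match is None:
        raise ValueError(f"no numeric class prefix in {filename!r}")
    return int(match.group()) - 1


print(label_from_first_char("3_cat.png"))   # first character '3'
print(label_from_first_char("12_dog.png"))  # only the '1' is read
print(label_from_prefix("12_dog.png"))      # the full prefix '12' is read
```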

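This code snippet is intended for use with deep learning frameworks such as PyTorch or TensorFlow. It exposes __getitem__ and __len__, the random-access protocol that map-style dataset loaders build on; PyTorch's DataLoader, for example, accepts any object that implements these two methods. The commented-out lines in __getitem__ mark where the label could be wrapped in an array, a leading axis added to the image, and both converted to the framework's Tensor type.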
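
To illustrate how a loader consumes such a class, here is a minimal plain-Python sketch that batches samples using only len() and indexing. ToyDataset and iterate_batches are hypothetical names introduced for this example; real frameworks ship their own loaders, which also handle shuffling and tensor conversion.

```python
def iterate_batches(dataset, batch_size):
    """Yield lists of (data, label) pairs using only len() and indexing."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


class ToyDataset:
    """Hypothetical stand-in with the same two methods as TrainDatasetGenerator."""

    def __init__(self, size):
        self.size = size

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        # Fake "image" data and a fake binary label derived from the index.
        return [float(index)], index % 2

    def __len__(self):
        return self.size


batches = list(iterate_batches(ToyDataset(5), batch_size=2))
print([len(batch) for batch in batches])
```

The last batch is shorter when the dataset size is not a multiple of the batch size; whether to keep or drop it is a design choice that framework loaders usually expose as an option.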