Train-Test Split in Python: sklearn.model_selection.train_test_split

The 'train_test_split' function in the sklearn.model_selection module splits a dataset into training and testing sets. By default it shuffles the data and then divides it into two subsets whose sizes are set by the 'test_size' or 'train_size' parameter; if neither is given, 25% of the data goes to the testing set.

Here is an example of the 'train_test_split' function in use:

from sklearn.model_selection import train_test_split

# 'dataset' is assumed to be a pandas DataFrame whose last column holds the labels
# Splitting the dataset into features (X) and labels (y)
X = dataset.iloc[:, :-1]  # Features
y = dataset.iloc[:, -1]   # Labels

# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Print the shapes of the training and testing sets
print('Training set shape:', X_train.shape, y_train.shape)
print('Testing set shape:', X_test.shape, y_test.shape)

In this example, 'X' holds the features of the dataset and 'y' holds the labels. The 'test_size' parameter specifies the proportion of the dataset allocated to the testing set; here 0.2 reserves 20% of the rows for testing and leaves 80% for training. The 'random_state' parameter seeds the shuffle, so the same value produces the same split on every run.
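To make the effect of 'test_size' and 'random_state' concrete, here is a minimal sketch on made-up data. The toy arrays are an assumption for illustration only and are not part of the original example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each; row i is [2i, 2i + 1]
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# test_size=0.2 sends 20% of the 10 samples (2 rows) to the testing set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)

# Repeating the call with the same random_state reproduces the same split
_, X_test_again, _, y_test_again = train_test_split(
    X, y, test_size=0.2, random_state=42
)
same_split = np.array_equal(X_test, X_test_again)
print("Same split on second call:", same_split)
```

Changing or omitting 'random_state' would generally give a different selection of rows, although the set sizes stay the same.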

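The 'train_test_split' function returns a list of four objects, unpacked here into 'X_train', 'X_test', 'y_train', and 'y_test', which are the training and testing portions of the features and labels. Each has the same type as the corresponding input, so pandas inputs produce a DataFrame or Series rather than plain arrays.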
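The returned objects can be inspected directly. The sketch below uses a small hypothetical DataFrame (the column names 'f1', 'f2', and 'label' are invented for illustration) and prints the type of each returned piece.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical DataFrame: two feature columns and a label in the last column
df = pd.DataFrame({"f1": range(10), "f2": range(10, 20), "label": [0, 1] * 5})
X_df = df.iloc[:, :-1]  # features (DataFrame)
y_sr = df.iloc[:, -1]   # labels (Series)

# The function returns a list; order is X_train, X_test, y_train, y_test
parts = train_test_split(X_df, y_sr, test_size=0.2, random_state=42)
type_names = [type(p).__name__ for p in parts]
print(type_names)

# Features and labels stay aligned: they share the same row index
X_train, X_test, y_train, y_test = parts
print(X_train.index.equals(y_train.index))
```

Because the original row index is preserved, each split row can still be traced back to its position in the source DataFrame.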

You can then use these split sets to train your machine learning model on the training set and evaluate its performance on the testing set.
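As a rough sketch of that workflow, the example below fits a model on the training set and scores it on the testing set. The synthetic data from 'make_classification' and the choice of 'LogisticRegression' are assumptions made for illustration; any estimator and dataset could take their place.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 samples, 5 features, binary labels
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training set only
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out testing set
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

The model never sees the testing rows during fitting, so the reported accuracy is an estimate of how it performs on unseen data.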
