Amex Credit Default Prediction: 5-Fold Cross-Validation Model Training
This code block demonstrates a 5-fold cross-validation approach for training a model to predict Amex credit default. It assumes the following:
- TRAIN_MODEL is a boolean flag indicating whether to train the model or load pre-trained weights.
- PATH_TO_DATA points to the directory containing the training data (data_{k}.npy and targets_{k}.pqt files for k = 1 to 10; each fold holds out two of these files for validation).
- PATH_TO_MODEL indicates the location to save the trained model weights.
- build_model() is a function that defines the model architecture.
- amex_metric_mod() is a function to calculate the Amex metric.
- LR is a learning rate scheduler or other callback.
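The list above leaves LR open. As one illustration, a step-wise learning rate schedule can be written as a plain function of the epoch index. The names `LR_STEPS` and `step_lr` and the rates below are hypothetical, not values taken from the original code; they are only sized to match the 8 training epochs.

```python
# Hypothetical step schedule for 8 epochs; the rates are illustrative assumptions.
LR_STEPS = [1e-3] * 5 + [1e-4] * 2 + [1e-5]

def step_lr(epoch, lr=None):
    """Return the learning rate for a 0-based epoch index.

    The optional `lr` argument (the current rate) is accepted but ignored,
    so the function works whether the caller passes one or two arguments.
    """
    # Clamp so that epochs beyond the table reuse the last rate.
    return LR_STEPS[min(epoch, len(LR_STEPS) - 1)]
```

With TensorFlow installed, a function like this could be wrapped as `tf.keras.callbacks.LearningRateScheduler(step_lr)` and passed to `model.fit` through `callbacks`.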
Workflow:
- Training (if TRAIN_MODEL is True):
  - For each fold (5 total):
    - Read training and validation data from disk.
    - Build and train the model using build_model().
    - Predict on the validation set.
    - Calculate the fold-specific CV score using amex_metric_mod().
    - Concatenate true and oof predictions for the overall CV calculation.
- Overall CV Score:
  - After training all folds, calculate and print the overall CV score using amex_metric_mod() on the concatenated true and oof predictions.
- Clean Up:
  - Clear the session (K.clear_session()) and garbage collect to free memory.
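The fold loop pairs the ten data files into five validation sets of two files each and trains on the remaining eight. A minimal sketch of that bookkeeping, using the same index arithmetic as the code in this section; the helper name `fold_split` is introduced here for illustration only.

```python
N_FOLDS = 5
FILE_IDS = list(range(1, 11))  # ids k of data_{k}.npy / targets_{k}.pqt

def fold_split(fold):
    """Return (train_ids, valid_ids) for a 0-based fold index."""
    valid_ids = [2 * fold + 1, 2 * fold + 2]
    train_ids = [k for k in FILE_IDS if k not in valid_ids]
    return train_ids, valid_ids

splits = [fold_split(fold) for fold in range(N_FOLDS)]

# Every file should be held out exactly once across the five folds.
held_out = sorted(k for _, valid_ids in splits for k in valid_ids)
assert held_out == FILE_IDS
```

Because each file lands in exactly one validation set, the concatenated out-of-fold predictions cover every training row once, which is what makes the overall CV score meaningful.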
Code:
import gc
import os

import numpy as np
import pandas as pd
from tensorflow.keras import backend as K  # assumes TensorFlow's bundled Keras

if TRAIN_MODEL:
    # SAVE TRUE AND OOF
    true = np.array([])
    oof = np.array([])
    VERBOSE = 2 # use 1 for interactive

    for fold in range(5):

        # INDICES OF TRAIN AND VALID FOLDS
        valid_idx = [2*fold+1, 2*fold+2]
        train_idx = [x for x in [1,2,3,4,5,6,7,8,9,10] if x not in valid_idx]

        print('#'*25)
        print(f'### Fold {fold+1} with valid files', valid_idx)

        # READ TRAIN DATA FROM DISK
        X_train = []; y_train = []
        for k in train_idx:
            X_train.append( np.load(f'{PATH_TO_DATA}data_{k}.npy'))
            y_train.append( pd.read_parquet(f'{PATH_TO_DATA}targets_{k}.pqt') )
        X_train = np.concatenate(X_train,axis=0)
        y_train = pd.concat(y_train).target.values
        print('### Training data shapes', X_train.shape, y_train.shape)

        # READ VALID DATA FROM DISK
        X_valid = []; y_valid = []
        for k in valid_idx:
            X_valid.append( np.load(f'{PATH_TO_DATA}data_{k}.npy'))
            y_valid.append( pd.read_parquet(f'{PATH_TO_DATA}targets_{k}.pqt') )
        X_valid = np.concatenate(X_valid,axis=0)
        y_valid = pd.concat(y_valid).target.values
        print('### Validation data shapes', X_valid.shape, y_valid.shape)
        print('#'*25)

        # BUILD AND TRAIN MODEL
        K.clear_session()
        model = build_model()
        h = model.fit(X_train,y_train,
                      validation_data = (X_valid,y_valid),
                      batch_size=512, epochs=8, verbose=VERBOSE,
                      callbacks = [LR])
        # CREATE THE OUTPUT DIRECTORY IF NEEDED, THEN SAVE THIS FOLD'S WEIGHTS
        os.makedirs(PATH_TO_MODEL, exist_ok=True)
        model.save_weights(f'{PATH_TO_MODEL}gru_fold_{fold+1}.h5')

        # INFER VALID DATA
        print('Inferring validation data...')
        p = model.predict(X_valid, batch_size=512, verbose=VERBOSE).flatten()

        print()
        print(f'Fold {fold+1} CV=', amex_metric_mod(y_valid, p) )
        print()
        true = np.concatenate([true, y_valid])
        oof = np.concatenate([oof, p])

        # CLEAN MEMORY
        del model, X_train, y_train, X_valid, y_valid, p
        gc.collect()

    # PRINT OVERALL RESULTS
    print('#'*25)
    print('Overall CV =', amex_metric_mod(true, oof) )
    K.clear_session()
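The loading code builds file names by concatenating PATH_TO_DATA directly with data_{k}.npy, so the paths only resolve if PATH_TO_DATA ends with a path separator. One way to avoid depending on that is sketched below; the helper `fold_files` and the directory name `amex_data` are hypothetical and do not appear in the original code.

```python
import os

def fold_files(path_to_data, file_ids):
    """Return (data_path, targets_path) pairs for the given file ids."""
    return [
        (os.path.join(path_to_data, f"data_{k}.npy"),
         os.path.join(path_to_data, f"targets_{k}.pqt"))
        for k in file_ids
    ]

# The directory may be given with or without a trailing separator.
assert fold_files("amex_data", [1]) == fold_files("amex_data" + os.sep, [1])
```

The same approach would apply to PATH_TO_MODEL when saving the per-fold weights.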
This code gives a reusable template for training an Amex credit default model with 5-fold cross-validation. It frees memory between folds, prints per-fold and overall CV scores, and leaves the architecture, metric, and callbacks open to customization through build_model(), amex_metric_mod(), and LR.