Amex Credit Default Prediction: 5-Fold Cross-Validation Model Training

This code block demonstrates a 5-fold cross-validation approach for training a model to predict Amex credit default. It assumes the following:

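TRAIN_MODEL is a boolean flag indicating whether to train the model or load pre-trained weights. Only the training branch is shown here.
PATH_TO_DATA points to the directory containing the training data, split into 10 files (data_{k}.npy and targets_{k}.pqt for k = 1 to 10). Each fold holds out two of these files for validation. The path must end with a path separator because the code concatenates it directly with the file name.
PATH_TO_MODEL is the directory where the trained weights are saved, one gru_fold_{n}.h5 file per fold. It must also end with a path separator.
build_model() is a function that defines the model architecture and returns a compiled Keras model ready for fit().
amex_metric_mod(y_true, y_pred) is a function that calculates the Amex metric from the true labels and the predicted probabilities.
LR is a Keras callback, such as a learning rate scheduler, passed to model.fit().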
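
The list above treats amex_metric_mod() as given. As a rough sketch of what such a function might compute, assuming it follows the competition's published metric (the mean of a normalized weighted Gini coefficient and the share of defaults captured in the top 4% of predictions, with non-defaults weighted 20), a NumPy version could look like the following. The name amex_metric_sketch and its body are assumptions for illustration, not the author's code:

```python
import numpy as np

def amex_metric_sketch(y_true, y_pred):
    """Hypothetical stand-in for amex_metric_mod(); not the author's code."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Non-defaults (label 0) carry weight 20, defaults (label 1) weight 1.
    weight = np.where(y_true == 0, 20.0, 1.0)

    def weighted_gini(scores):
        # Rank rows from highest to lowest score.
        order = np.argsort(scores)[::-1]
        t, w = y_true[order], weight[order]
        cum_weight_share = np.cumsum(w / w.sum())
        lorentz = np.cumsum(t * w) / np.sum(t * w)
        return np.sum((lorentz - cum_weight_share) * w)

    # Share of all defaults found in the top 4% (by weight) of predictions.
    order = np.argsort(y_pred)[::-1]
    t, w = y_true[order], weight[order]
    in_top = np.cumsum(w) <= int(0.04 * w.sum())
    top_four = t[in_top].sum() / t.sum()

    # Gini of the predictions, normalized by the Gini of a perfect ranking.
    gini_norm = weighted_gini(y_pred) / weighted_gini(y_true)
    return 0.5 * (gini_norm + top_four)

# Toy usage: 2 defaults among 100 rows, scored against themselves.
labels = np.array([1, 1] + [0] * 98)
print(amex_metric_sketch(labels, labels))
```

The sketch needs both classes present in y_true; otherwise the normalizing Gini or the default count is zero and the ratios are undefined.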

Workflow:

Training (if TRAIN_MODEL is True):
- For each fold (5 total):
  - Read training and validation data from disk.
  - Build and train the model using build_model(), then save its weights to PATH_TO_MODEL.
  - Predict on the validation set.
  - Calculate fold-specific CV score using amex_metric_mod().
  - Append the fold's true labels and out-of-fold (OOF) predictions to running arrays for the overall CV calculation.
Overall CV Score:
- After training all folds, calculate and print the overall CV score using amex_metric_mod() on the concatenated true and oof predictions.
Clean Up:
- Delete the per-fold arrays and garbage-collect after each fold, and clear the Keras session (K.clear_session()) before each model build and once more after the last fold.
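
The fold-to-file mapping used in the workflow can be checked on its own before any data is loaded. The sketch below reuses the index arithmetic from the training loop; the helper name fold_files and the constants N_FILES and N_FOLDS are hypothetical and exist only for this illustration:

```python
N_FILES = 10  # data_1.npy ... data_10.npy, as described above
N_FOLDS = 5

def fold_files(fold):
    """Return (train_idx, valid_idx) file numbers for a zero-based fold."""
    # Each fold holds out two consecutive files for validation.
    valid_idx = [2 * fold + 1, 2 * fold + 2]
    # All remaining files are used for training.
    train_idx = [k for k in range(1, N_FILES + 1) if k not in valid_idx]
    return train_idx, valid_idx

for fold in range(N_FOLDS):
    train_idx, valid_idx = fold_files(fold)
    print(f'Fold {fold + 1}: valid files {valid_idx}, train files {train_idx}')
```

Every file appears in exactly one validation set, so each row receives exactly one out-of-fold prediction.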

Code:

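import gc
import os

import numpy as np
import pandas as pd
from tensorflow.keras import backend as K  # assumes TensorFlow's bundled Keras

if TRAIN_MODEL:
    # ACCUMULATE TRUE LABELS AND OUT-OF-FOLD (OOF) PREDICTIONS ACROSS FOLDS
    true = np.array([])
    oof = np.array([])
    VERBOSE = 2  # use 1 for interactive runs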

    for fold in range(5):

        # FILE INDICES (1-10) FOR THIS FOLD: TWO FILES VALIDATE, EIGHT TRAIN
        valid_idx = [2*fold+1, 2*fold+2]
        train_idx = [x for x in range(1, 11) if x not in valid_idx]

        print('#'*25)
        print(f'### Fold {fold+1} with valid files', valid_idx)

        # READ TRAIN DATA FROM DISK
        X_train, y_train = [], []
        for k in train_idx:
            X_train.append(np.load(f'{PATH_TO_DATA}data_{k}.npy'))
            y_train.append(pd.read_parquet(f'{PATH_TO_DATA}targets_{k}.pqt'))
        X_train = np.concatenate(X_train,axis=0)
        y_train = pd.concat(y_train).target.values
        print('### Training data shapes', X_train.shape, y_train.shape)

        # READ VALID DATA FROM DISK
        X_valid, y_valid = [], []
        for k in valid_idx:
            X_valid.append(np.load(f'{PATH_TO_DATA}data_{k}.npy'))
            y_valid.append(pd.read_parquet(f'{PATH_TO_DATA}targets_{k}.pqt'))
        X_valid = np.concatenate(X_valid,axis=0)
        y_valid = pd.concat(y_valid).target.values
        print('### Validation data shapes', X_valid.shape, y_valid.shape)
        print('#'*25)

        # BUILD AND TRAIN MODEL
        K.clear_session()
        model = build_model()
        h = model.fit(X_train,y_train, 
                      validation_data = (X_valid,y_valid),
                      batch_size=512, epochs=8, verbose=VERBOSE,
                      callbacks = [LR])
        os.makedirs(PATH_TO_MODEL, exist_ok=True)
        model.save_weights(f'{PATH_TO_MODEL}gru_fold_{fold+1}.h5')

        # INFER VALID DATA
        print('Inferring validation data...')
        p = model.predict(X_valid, batch_size=512, verbose=VERBOSE).flatten()

        print()
        print(f'Fold {fold+1} CV=', amex_metric_mod(y_valid, p) )
        print()
        true = np.concatenate([true, y_valid])
        oof = np.concatenate([oof, p])
        
        # CLEAN MEMORY
        del model, X_train, y_train, X_valid, y_valid, p
        gc.collect()

    # PRINT OVERALL RESULTS
    print('#'*25)
    print('Overall CV =', amex_metric_mod(true, oof))
    K.clear_session()

This loop trains one model per fold, saves each fold's weights, and reports both the per-fold scores and the overall out-of-fold score. It frees memory between folds, and the batch size, epoch count, and callbacks can be adjusted as needed.
