To build and optimize an XGBoost model in R, follow these steps:

  1. Install and load the necessary packages:
install.packages('xgboost')
library(xgboost)
  1. Load the data you want to use for modeling:
data <- read.csv('your_data_file.csv')
  1. Split the data into training and testing sets:
set.seed(123)
trainIndex <- sample(1:nrow(data), floor(0.7 * nrow(data)), replace = FALSE)
trainData <- data[trainIndex, ]
testData <- data[-trainIndex, ]
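The random split above does not guarantee that the training and testing sets keep the same class ratio, which can matter when the binary target is imbalanced. As a minimal base-R sketch (the toy data frame, the column name `y`, and the 25% positive rate are illustrative assumptions, not part of the original data), a stratified 70/30 split might look like this:

```r
# Toy data: column name `y` and the class counts are illustrative assumptions
set.seed(123)
toy <- data.frame(x = rnorm(200), y = rep(c(0, 1), times = c(150, 50)))

# Sample 70% of the row indices within each class, then combine them
idx_by_class <- split(seq_len(nrow(toy)), toy$y)
train_idx <- unlist(lapply(idx_by_class, function(idx) {
  idx[sample.int(length(idx), round(0.7 * length(idx)))]
}))

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]

# Compare the positive rate in each set with the rate in the full data
mean(toy$y)
mean(toy_train$y)
mean(toy_test$y)
```

Sampling inside each class separately is what keeps the proportions aligned; the same idea could be applied to `data` by splitting on `data[, target]`.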
  1. Define the target variable and the predictors:
target <- 'target_variable_name'
predictors <- setdiff(colnames(data), target)
  1. Prepare the data for XGBoost modeling (all predictors must be numeric, and for 'binary:logistic' the label must be coded as 0/1):
dtrain <- xgb.DMatrix(data = as.matrix(trainData[,predictors]), 
                      label = trainData[,target])
dtest <- xgb.DMatrix(data = as.matrix(testData[,predictors]), 
                     label = testData[,target])
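If some predictors are factors or character columns, `as.matrix()` on the data frame produces a character matrix, which is not suitable for `xgb.DMatrix()`. One common workaround, sketched here with base R's `model.matrix()` on a toy data frame (the column names `age`, `region`, and `y` are hypothetical), is to expand categorical columns into 0/1 indicator columns first:

```r
# Toy frame with one numeric and one categorical predictor (names are hypothetical)
toy <- data.frame(
  age    = c(25, 40, 31, 58),
  region = factor(c("north", "south", "east", "south")),
  y      = c(0, 1, 0, 1)
)

# as.matrix() coerces mixed columns to character
is.character(as.matrix(toy[, c("age", "region")]))

# model.matrix() expands factors into numeric indicator columns;
# `- 1` drops the intercept column
X <- model.matrix(y ~ . - 1, data = toy)
colnames(X)
is.numeric(X)
```

The resulting numeric matrix `X` could then be passed as the `data` argument of `xgb.DMatrix()` in place of `as.matrix(...)`.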
  1. Define the XGBoost parameters:
params <- list(
  objective = 'binary:logistic',
  eval_metric = 'auc',
  eta = 0.1,
  max_depth = 6,
  subsample = 0.7,
  colsample_bytree = 0.7
)
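  1. Train the XGBoost model, stopping early once the AUC on the last watchlist entry (here the test set) has not improved for 10 rounds: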
model <- xgb.train(params = params, 
                   data = dtrain, 
                   nrounds = 100, 
                   watchlist = list(train = dtrain, test = dtest), 
                   early_stopping_rounds = 10, 
                   verbose = 1)
  1. Use the trained XGBoost model to make predictions:
predictions <- predict(model, dtest)
  1. Evaluate the model performance:
library(pROC)  # run install.packages('pROC') first if it is not installed
auc <- roc(testData[,target], predictions)$auc
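AUC summarizes ranking quality across all thresholds. To see how the model behaves at one particular cutoff, the predicted probabilities can be turned into class labels and tabulated. The sketch below uses made-up labels and probabilities standing in for `testData[, target]` and `predictions`; the 0.5 cutoff is an arbitrary choice for illustration:

```r
# Hypothetical labels and predicted probabilities (stand-ins for the real ones)
actual <- c(0, 0, 1, 1, 1, 0, 1, 0)
prob   <- c(0.10, 0.65, 0.80, 0.45, 0.90, 0.30, 0.70, 0.20)

# Convert probabilities to 0/1 predictions at an illustrative cutoff of 0.5
predicted <- as.integer(prob >= 0.5)

# Rows are the true classes, columns the predicted classes
conf_mat <- table(actual = actual, predicted = predicted)

accuracy  <- sum(diag(conf_mat)) / sum(conf_mat)
precision <- conf_mat["1", "1"] / sum(conf_mat[, "1"])
recall    <- conf_mat["1", "1"] / sum(conf_mat["1", ])

conf_mat
c(accuracy = accuracy, precision = precision, recall = recall)
```

Moving the cutoff trades precision against recall, so it is worth choosing it based on the cost of each kind of error rather than defaulting to 0.5.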
  1. Optimize the XGBoost model parameters using cross-validation:
# xgb.cv() evaluates one parameter set per call (it has no grid argument
# and returns no best_params), so loop over the grid and keep the
# combination with the highest cross-validated AUC.
xgb_grid <- expand.grid(
  eta = c(0.01, 0.1, 0.3),
  max_depth = c(3, 6, 9),
  subsample = c(0.5, 0.7, 1),
  colsample_bytree = c(0.5, 0.7, 1)
)
best_auc <- -Inf
best_params <- NULL
for (i in seq_len(nrow(xgb_grid))) {
  # Fixed settings plus the i-th row of the grid
  candidate <- c(list(objective = 'binary:logistic', eval_metric = 'auc'),
                 as.list(xgb_grid[i, ]))
  xgb_cv <- xgb.cv(params = candidate,
                   data = dtrain,
                   nrounds = 100,
                   nfold = 5,
                   stratified = TRUE,
                   early_stopping_rounds = 10,
                   maximize = TRUE,
                   verbose = 0)
  # Best mean test-fold AUC over the boosting rounds for this combination
  cv_auc <- max(xgb_cv$evaluation_log$test_auc_mean)
  if (cv_auc > best_auc) {
    best_auc <- cv_auc
    best_params <- candidate
  }
}
best_params
  1. Re-train the XGBoost model using the optimized parameters:
model <- xgb.train(params = best_params, 
                   data = dtrain, 
                   nrounds = 100, 
                   watchlist = list(train = dtrain, test = dtest), 
                   early_stopping_rounds = 10, 
                   verbose = 1)
  1. Use the optimized XGBoost model to make predictions and evaluate its performance:
predictions <- predict(model, dtest)
auc <- roc(testData[,target], predictions)$auc
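Once a model has been trained, it can help to check which predictors drive it. The sketch below uses `xgb.importance()` from the xgboost package on the `agaricus.train` example data bundled with that package; the data set, the object names, and the parameter values are chosen only for illustration and are not part of the workflow above:

```r
library(xgboost)

# Example data shipped with the xgboost package, used here only for illustration
data(agaricus.train, package = "xgboost")
dtoy <- xgb.DMatrix(data = agaricus.train$data, label = agaricus.train$label)

# Small model with illustrative settings
toy_model <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 3, eta = 0.3),
  data = dtoy,
  nrounds = 10
)

# One row per feature used in at least one split, with Gain, Cover and Frequency
importance <- xgb.importance(model = toy_model)
head(importance, 5)
```

The same call with `model = model` should give the importance table for the re-trained model from the previous step.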
Build and Optimize XGBoost Models in R: A Comprehensive Guide
