Build and Optimize XGBoost Models in R: A Comprehensive Guide
To build and optimize an XGBoost model in R, follow these steps:
- Install and load the necessary packages:
install.packages('xgboost')
library(xgboost)
- Load the data you want to use for modeling:
data <- read.csv('your_data_file.csv')
- Split the data into training and testing sets:
set.seed(123)
trainIndex <- sample(1:nrow(data), floor(0.7 * nrow(data)), replace = FALSE)
trainData <- data[trainIndex, ]
testData <- data[-trainIndex, ]
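A purely random split can leave the training and testing sets with different class proportions when the target is imbalanced. The following is a minimal sketch of a stratified alternative in base R, assuming a 0/1 target. The helper name stratified_split and the toy data frame are hypothetical and serve only as an illustration:

```r
# Hypothetical helper: sample a fixed fraction of row indices within each class
stratified_split <- function(y, train_frac = 0.7) {
  idx_by_class <- split(seq_along(y), y)
  train_idx <- lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), floor(train_frac * length(idx)))]
  })
  sort(unlist(train_idx, use.names = FALSE))
}

set.seed(123)
# Toy data standing in for the real file (assumption): 80 negatives, 20 positives
toy <- data.frame(x = rnorm(100), y = rep(c(0, 1), times = c(80, 20)))
toyTrainIndex <- stratified_split(toy$y)
toyTrain <- toy[toyTrainIndex, ]
toyTest <- toy[-toyTrainIndex, ]

# Class proportions in each set
prop.table(table(toyTrain$y))
prop.table(table(toyTest$y))
```

Because sampling happens inside each class, both sets keep roughly the same share of positives as the full data.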
- Define the target variable and the predictors:
target <- 'target_variable_name'
predictors <- setdiff(colnames(data), target)
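The binary:logistic objective used later in this guide expects the target as numeric 0/1 values. If the target column is stored as text or a factor, it needs recoding first. A small sketch, assuming a hypothetical target with the levels 'no' and 'yes':

```r
# Hypothetical target stored as a factor
toyTarget <- factor(c('no', 'yes', 'yes', 'no'))

# Recode the positive class to 1 and everything else to 0
toyLabel <- as.integer(toyTarget == 'yes')
toyLabel

# Sanity check before building the DMatrix: only 0 and 1 are present
stopifnot(all(toyLabel %in% c(0, 1)))
```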
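- Prepare the data for XGBoost modeling. xgb.DMatrix requires a numeric matrix, so all predictors must be numeric, and the binary:logistic objective used below expects labels coded as 0 and 1:
dtrain <- xgb.DMatrix(data = as.matrix(trainData[, predictors]),
                      label = trainData[, target])
dtest <- xgb.DMatrix(data = as.matrix(testData[, predictors]),
                     label = testData[, target])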
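If any predictor is a factor or character column, as.matrix() returns a character matrix, which cannot be used as numeric input. One common workaround is one-hot encoding with model.matrix() from base R. The sketch below uses a toy data frame whose column names are hypothetical:

```r
# Toy data with one numeric and one categorical predictor (hypothetical columns)
toy <- data.frame(
  age = c(23, 45, 31, 52),
  city = factor(c('A', 'B', 'A', 'C')),
  target_variable_name = c(0, 1, 0, 1)
)
toyPredictors <- setdiff(colnames(toy), 'target_variable_name')

# as.matrix() on mixed column types yields a character matrix
is.character(as.matrix(toy[, toyPredictors]))

# model.matrix() expands factors into 0/1 indicator columns;
# "- 1" drops the intercept column
toyMatrix <- model.matrix(~ . - 1, data = toy[, toyPredictors])
is.numeric(toyMatrix)
colnames(toyMatrix)
```

The resulting matrix can be passed to xgb.DMatrix in place of as.matrix(trainData[, predictors]). Apply the same encoding to the training and testing sets so the columns line up.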
- Define the XGBoost parameters:
params <- list(
  objective = 'binary:logistic',
  eval_metric = 'auc',
  eta = 0.1,
  max_depth = 6,
  subsample = 0.7,
  colsample_bytree = 0.7
)
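- Train the XGBoost model. The watchlist reports the AUC on both sets after each round, and early stopping monitors its last entry (here, test). Because the test set then influences when training stops, the test AUC is slightly optimistic; a separate validation set avoids this:
model <- xgb.train(params = params,
                   data = dtrain,
                   nrounds = 100,
                   watchlist = list(train = dtrain, test = dtest),
                   early_stopping_rounds = 10,
                   verbose = 1)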
- Use the trained XGBoost model to make predictions:
predictions <- predict(model, dtest)
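With objective = 'binary:logistic', predict() returns probabilities rather than class labels. The sketch below shows one way to turn them into labels and tabulate the results. The 0.5 cutoff and the toy vectors are assumptions that stand in for predictions and testData[, target]:

```r
# Toy predicted probabilities and true labels (assumed values)
toyPredictions <- c(0.91, 0.12, 0.67, 0.35, 0.80, 0.05)
toyLabels <- c(1, 0, 0, 0, 1, 1)

# Assumed cutoff; choose it to suit the cost of each kind of error
threshold <- 0.5
toyClasses <- as.integer(toyPredictions > threshold)

# Confusion matrix: rows are actual labels, columns are predicted labels
confusion <- table(actual = toyLabels, predicted = toyClasses)
confusion

# Accuracy is the share of cases on the diagonal
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy
```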
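- Evaluate the model performance with the area under the ROC curve (requires the pROC package; run install.packages('pROC') once if it is missing):
library(pROC)
auc <- roc(testData[, target], predictions)$auc
auc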
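The AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. As a cross-check that needs no extra package, the sketch below computes it from ranks in base R. The helper name rank_auc and the toy vectors are hypothetical:

```r
# Hypothetical helper: AUC from the rank-sum (Mann-Whitney) identity
rank_auc <- function(labels, scores) {
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)  # average ranks, so tied scores count as half
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy labels and scores (assumed values)
toyLabels <- c(1, 0, 0, 0, 1, 1)
toyPredictions <- c(0.91, 0.12, 0.67, 0.35, 0.80, 0.05)

# 6 of the 9 positive/negative pairs are ordered correctly
rank_auc(toyLabels, toyPredictions)
```

Note that pROC chooses the comparison direction automatically by default, so its value may differ from this one when a model ranks cases worse than chance.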
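- Optimize the XGBoost model parameters using cross-validation. xgb.cv evaluates one parameter set per call and has no grid argument, so loop over a grid of candidates and keep the combination with the best cross-validated AUC:
xgb_grid <- expand.grid(
  eta = c(0.01, 0.1, 0.3),
  max_depth = c(3, 6, 9),
  subsample = c(0.5, 0.7, 1),
  colsample_bytree = c(0.5, 0.7, 1)
)
xgb_grid$best_auc <- NA_real_
xgb_grid$best_nrounds <- NA_integer_
for (i in seq_len(nrow(xgb_grid))) {
  # Parameters for this grid row, plus the fixed objective and metric
  cv_params <- list(
    objective = 'binary:logistic',
    eval_metric = 'auc',
    eta = xgb_grid$eta[i],
    max_depth = xgb_grid$max_depth[i],
    subsample = xgb_grid$subsample[i],
    colsample_bytree = xgb_grid$colsample_bytree[i]
  )
  # 5-fold stratified cross-validation on the training data only
  xgb_cv <- xgb.cv(params = cv_params,
                   data = dtrain,
                   nrounds = 100,
                   nfold = 5,
                   stratified = TRUE,
                   early_stopping_rounds = 10,
                   maximize = TRUE,
                   verbose = FALSE)
  # Record the best mean test-fold AUC and the round where it occurred
  cv_auc <- xgb_cv$evaluation_log$test_auc_mean
  xgb_grid$best_auc[i] <- max(cv_auc)
  xgb_grid$best_nrounds[i] <- which.max(cv_auc)
}
# Pick the winning row and rebuild a complete parameter list from it
best_row <- xgb_grid[which.max(xgb_grid$best_auc), ]
best_params <- list(
  objective = 'binary:logistic',
  eval_metric = 'auc',
  eta = best_row$eta,
  max_depth = best_row$max_depth,
  subsample = best_row$subsample,
  colsample_bytree = best_row$colsample_bytree
)
best_params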
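The grid above contains 3^4 = 81 combinations, and cross-validating every one of them with 5 folds can be slow. One option is a random search that evaluates only a sample of the grid rows. The sketch below is one way to set that up in base R. The names search_grid and row_to_params, and the choice of 20 candidates, are assumptions made for illustration:

```r
# Same candidate values as the grid above, under a separate name
search_grid <- expand.grid(
  eta = c(0.01, 0.1, 0.3),
  max_depth = c(3, 6, 9),
  subsample = c(0.5, 0.7, 1),
  colsample_bytree = c(0.5, 0.7, 1)
)
nrow(search_grid)

# Keep a random subset of rows (20 is an arbitrary, assumed budget)
set.seed(123)
n_candidates <- 20
random_grid <- search_grid[sample(nrow(search_grid), n_candidates), ]

# Hypothetical helper: turn one grid row into a parameter list
row_to_params <- function(row) {
  c(list(objective = 'binary:logistic', eval_metric = 'auc'), as.list(row))
}
str(row_to_params(random_grid[1, ]))
```

Each list returned by row_to_params can be passed as params to xgb.cv, in the same way as any hand-written parameter list.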
- Re-train the XGBoost model using the optimized parameters:
model <- xgb.train(params = best_params,
                   data = dtrain,
                   nrounds = 100,
                   watchlist = list(train = dtrain, test = dtest),
                   early_stopping_rounds = 10,
                   verbose = 1)
- Use the optimized XGBoost model to make predictions and evaluate its performance:
predictions <- predict(model, dtest)
auc <- roc(testData[,target], predictions)$auc