
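I am trying to hand-write R code to build an ensemble machine learning model (supervised classification with a binary response). I know R already has packages for building ensembles (for example caretEnsemble), but the computer I am using only has access to a very limited set of R packages, so I need to code this process by hand.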

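I want to build an ensemble in which the first layer uses the random forest and "ada" algorithms. The first-layer results are then passed to a second layer, where xgboost is used for the final classification.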

I have attached my (reproducible) code below, using the Sonar dataset.

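My questions:

Is the code below correct? That is, does it accurately reflect the steps involved in building an ensemble model?
Should the first layer be trained again on the whole training dataset (after the second layer has been built)?
Can the two layers (first and second) be combined so that the data to be predicted only has to be fed in once?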

`library(mlbench)
library(xgboost)
library(randomForest)
library(ada)
library(caret)

data (Sonar)

 index1 = createDataPartition(y=Sonar$Class,p=0.75,list=FALSE)
 train_set = Sonar[index1,]
 stackset = Sonar[-index1,]  ## for testing final ensemble model

 index2 = createDataPartition(y=train_set$Class,p=0.67,list=FALSE)
 trainset = train_set[index2,]  ## for training the first layer models, random forest and ada
 testset = train_set[-index2,]  ## for training the second layer model, xgboost
 
#Defining the training control
fitControl <- trainControl(
  method = "cv",
  number = 10,
  savePredictions = 'final',  # save out-of-fold predictions for the best parameter combination
  classProbs = TRUE           # save class probabilities of the out-of-fold predictions
)
    

#Training the random forest model
model_rf<-train(trainset[,-61],trainset$Class,method='rf',trControl=fitControl,tuneLength=10)

#Training the ada model
model_ada<-train(trainset[,-61],trainset$Class,method='ada',trControl=fitControl,tuneLength=5)

#Predicting probabilities for the testset data

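# columns 1:60 are the original predictors; this skips Class and any prediction columns added below
testset$pred_rf<-predict(model_rf,testset[,1:60],type='prob')$M
testset$pred_ada<-predict(model_ada,testset[,1:60],type='prob')$M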

############## fit 2nd layer: xgboost model with predicted probabilities from earlier models
predictors<-c('pred_rf','pred_ada')
model_xgboost<- 
train(testset[,predictors],testset$Class,method='xgbTree',tuneLength=3)

##### Usual approach: 

   ### Predicting probabilities for the stackset data:

   stackset$pred_rf<-predict(model_rf,stackset[,1:60],type='prob')$M
   stackset$pred_ada<-predict(model_ada,stackset[,1:60],type='prob')$M

   ### finally predict completely new data with the XGBOOST model trained earlier:

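   stackset$pred_xgboost<-predict(model_xgboost,stackset[,predictors])
   # confusionMatrix() expects the predictions first, then the reference labels
   confusionMatrix(data=stackset$pred_xgboost,reference=stackset$Class)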

##############  OR is this a better approach?: fit 1st layer models again on whole training dataset 

model_rf_final<-train(train_set[,-61],train_set$Class,method='rf',trControl=fitControl,tuneLength=10)

model_ada_final<-train(train_set[,-61],train_set$Class,method='ada',trControl=fitControl,tuneLength=5)


   ##### now predict probabilities for the stackset data: 
  
   stackset$pred_rf<-predict(model_rf_final,stackset[,1:60],type='prob')$M
   stackset$pred_ada<-predict(model_ada_final,stackset[,1:60],type='prob')$M

   ### finally predict from the XGBOOST model trained earlier:

   stackset$pred_xgboost<-predict(model_xgboost,stackset[,predictors])

` Thanks.
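On the question of combining both layers so that new data only has to be fed in once, here is a minimal sketch rather than a drop-in replacement for the code above. It uses base R only: `glm()` stands in for the random forest, ada, and xgboost learners, and the data is simulated. The names `fit_stack`, `stack_sketch`, `m1`, `m2`, and the toy variables are hypothetical and chosen purely for illustration. It assumes one common stacking arrangement: the second layer is trained on out-of-fold predictions from the first layer, and the first layer is then refit on all training rows before being used for prediction.

```r
# Sketch only: glm() is a stand-in for the rf / ada / xgboost learners.
set.seed(42)

# Hypothetical toy data: binary response y and three numeric predictors.
n   <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.5 * dat$x1 - dat$x2 + 0.5 * dat$x3))

fit_stack <- function(data, k = 5) {
  # Two different formulas play the role of two different first-layer learners.
  base_formulas <- list(m1 = y ~ x1 + x2, m2 = y ~ x2 + x3)

  # Out-of-fold predictions: every row is scored by models that never saw it.
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  oof <- as.data.frame(matrix(NA_real_, nrow(data), length(base_formulas),
                              dimnames = list(NULL, names(base_formulas))))
  for (i in seq_len(k)) {
    held_out <- folds == i
    for (nm in names(base_formulas)) {
      fit <- glm(base_formulas[[nm]], data = data[!held_out, ], family = binomial)
      oof[held_out, nm] <- predict(fit, data[held_out, ], type = "response")
    }
  }

  # Second layer: trained only on the out-of-fold probabilities.
  oof$y <- data$y
  meta <- glm(y ~ m1 + m2, data = oof, family = binomial)

  # First layer: refit on all training rows for use at prediction time.
  base <- lapply(base_formulas,
                 function(f) glm(f, data = data, family = binomial))

  structure(list(base = base, meta = meta), class = "stack_sketch")
}

# A single predict() call pushes new data through both layers.
predict.stack_sketch <- function(object, newdata, ...) {
  level1 <- as.data.frame(lapply(object$base, predict,
                                 newdata = newdata, type = "response"))
  predict(object$meta, newdata = level1, type = "response")
}

train_rows <- sample(n, 225)
stack <- fit_stack(dat[train_rows, ])
p <- predict(stack, dat[-train_rows, ])   # probabilities for the held-out rows
table(observed = dat$y[-train_rows], predicted = as.integer(p > 0.5))
```

With caret, the same shape should carry over by swapping the `glm()` calls for `train(...)` and the first-layer `predict()` calls for `predict(..., type='prob')$M`, so that the object holds the two first-layer models and the xgboost model together.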


Answered by mayu_2010
