How to avoid overfitting and improve accuracy

I am working on a binary classification problem. I built a model and I am getting an accuracy of about 0.85391079298, but when I submit my predictions I get a score of 73.55651. The evaluation metric is: 100 * (accuracy(actual values, predicted values)). I think my model is overfitting; please help me figure out how to fix this. Here is my code:

from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score
from lightgbm import LGBMClassifier

errlgb = []

folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)

for train_index, test_index in folds.split(X, y):

    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y[train_index], y[test_index]

    clf = LGBMClassifier(
        n_estimators=3000, learning_rate=0.18, num_leaves=200, colsample_bytree=.8,
        subsample=.9, reg_alpha=.1, reg_lambda=.1, min_split_gain=.01, min_child_weight=2
    )
    clf.fit(X_train, y_train,
            eval_set=[(X_train, y_train), (X_test, y_test)],
            early_stopping_rounds=100, verbose=200)

    # positive-class probability on the held-out fold
    preds = clf.predict_proba(X_test)[:, 1]
    auc = roc_auc_score(y_test, preds)
    print("err_lgb: ", auc)
    errlgb.append(auc)

    # hard-label predictions for the submission set (overwritten on every fold)
    p = clf.predict(test_df)

Here is my output:

Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[31]    training's binary_logloss: 0.336613 valid_1's binary_logloss: 0.357114
err_lgb:  0.8699739045694661
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[26]    training's binary_logloss: 0.337955 valid_1's binary_logloss: 0.358127
err_lgb:  0.8765486027393695
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[26]    training's binary_logloss: 0.338397 valid_1's binary_logloss: 0.360132
err_lgb:  0.8732626771894375
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[29]    training's binary_logloss: 0.336385 valid_1's binary_logloss: 0.35572
err_lgb:  0.8735276671812789
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[30]    training's binary_logloss: 0.33504  valid_1's binary_logloss: 0.364632
err_lgb:  0.8684728203461445
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[30]    training's binary_logloss: 0.33692  valid_1's binary_logloss: 0.340379
err_lgb:  0.8879300398147396
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[29]    training's binary_logloss: 0.336804 valid_1's binary_logloss: 0.359437
err_lgb:  0.8743308686113594
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[29]    training's binary_logloss: 0.337057 valid_1's binary_logloss: 0.357902
err_lgb:  0.8660940927927196
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[26]    training's binary_logloss: 0.338251 valid_1's binary_logloss: 0.375692
err_lgb:  0.8508396983840842
Training until validation scores don't improve for 100 rounds.
Early stopping, best iteration is:
[21]    training's binary_logloss: 0.338989 valid_1's binary_logloss: 0.379214
err_lgb:  0.8539107929829131
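
The loop above tracks ROC AUC on each held-out fold, while the leaderboard score of 73.55651 is 100 * accuracy, so the two numbers are not directly comparable. Below is a minimal sketch of computing the competition metric on the held-out folds as well, assuming clf, X_test and y_test from the loop above (accuracy_score comes from scikit-learn):

from sklearn.metrics import accuracy_score

def competition_score(y_true, y_pred_labels):
    # the leaderboard formula from the question: 100 * (accuracy(actual, predicted))
    return 100 * accuracy_score(y_true, y_pred_labels)

# inside the cross-validation loop, after clf.fit(...):
#     print("fold competition score: ", competition_score(y_test, clf.predict(X_test)))

Comparing these per-fold scores against the 73.55651 leaderboard score is a more direct check for overfitting than comparing AUC to accuracy.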
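
Separately, p in the loop above is overwritten on every fold, so only the last fold's model ends up predicting test_df. A minimal sketch of averaging the positive-class probabilities from every fold's model instead, assuming X, y, test_df and folds as defined above and keeping the same fit arguments (the 0.5 threshold is an arbitrary choice, not taken from the competition):

import numpy as np
from lightgbm import LGBMClassifier

test_pred_proba = np.zeros(len(test_df))  # running average of the submission-set probabilities

for train_index, test_index in folds.split(X, y):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y[train_index], y[test_index]

    clf = LGBMClassifier(
        n_estimators=3000, learning_rate=0.18, num_leaves=200, colsample_bytree=.8,
        subsample=.9, reg_alpha=.1, reg_lambda=.1, min_split_gain=.01, min_child_weight=2
    )
    clf.fit(X_train, y_train,
            eval_set=[(X_train, y_train), (X_test, y_test)],
            early_stopping_rounds=100, verbose=200)

    # add this fold's positive-class probability for the submission rows to the average
    test_pred_proba += clf.predict_proba(test_df)[:, 1] / folds.n_splits

# turn the averaged probabilities into hard 0/1 labels (0.5 is an arbitrary threshold)
p = (test_pred_proba >= 0.5).astype(int)

Averaging uses every fold's model for the submission rather than only the last one, which tends to give a more stable score.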