Cross-validation with an imblearn Pipeline and GridSearchCV

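I am trying to use imblearn's Pipeline together with scikit-learn's GridSearchCV to find the best parameters for classifying an imbalanced dataset. Following the answers to a related question, I want to avoid resampling the validation set and resample only the training set, which imblearn's Pipeline appears to do. However, I get an error when implementing the accepted solution. Please let me know what I am doing wrong. Below is my implementation: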

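Pipeline:

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # imblearn's Pipeline, not sklearn's, so SMOTE is allowed as a step
from sklearn.model_selection import GridSearchCV

def imb_pipeline(clf, X, y, params):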

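    # imblearn's Pipeline runs SMOTE only when the pipeline is fit, so inside
    # GridSearchCV only the training folds are resampled, never the validation fold.
    model = Pipeline([
        ('sampling', SMOTE()), ('classification', clf)
    ])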

    score={'AUC':'roc_auc','RECALL':'recall','PRECISION':'precision','F1':'f1'}

    gcv = GridSearchCV(estimator=model,param_grid=params,cv=5,scoring=score,n_jobs=12,refit='F1',return_train_score=True)
    gcv.fit(X,y)

    return gcv

for param,classifier in zip(params,classifiers):
    print("Working on {}...".format(classifier[0]))
    clf = imb_pipeline(classifier[1], X_scaled, y, param)  # imb_pipeline expects (clf, X, y, params)
    print("Best parameter for {} is {}".format(classifier[0],clf.best_params_))
    print("Best `F1` for {} is {}".format(classifier[0],clf.best_score_))
    print('-'*50)
    print('\n')

Parameters:

params = [
    {'penalty': ('l1', 'l2'), 'C': (0.01, 0.1, 1.0, 10)},
    {'n_neighbors': (10, 15, 25)},
    {'n_estimators': (80, 100, 150, 200), 'min_samples_split': (5, 7, 10, 20)},
]

Classifiers:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Every argument in the original printout was at its default value for the
# older scikit-learn release in use, so the constructors are called without arguments.
classifiers = [
    ('Logistic Regression', LogisticRegression()),
    ('KNearestNeighbors', KNeighborsClassifier()),
    ('Gradient Boosting Classifier', GradientBoostingClassifier()),
]
Answer by lingwoniu:

Please check this example of how parameters are specified for pipeline steps: https://scikit-learn.org/stable/auto_examples/compose/plot_compare_reduction.html#sphx-glr-auto-examples-compose-plot-compare-reduction-py

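Whenever you use a pipeline, you have to pass the parameters in a form that tells the pipeline which parameter belongs to which step. For that, it uses the names you gave the steps when you created the pipeline.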

In your code, for example:

model = Pipeline([
        ('sampling',SMOTE()),('classification',clf)
    ])

To pass a parameter p1 to SMOTE, use sampling__p1 as the parameter name instead of p1.

You named clf "classification", so every parameter meant for clf must be prefixed with classification__.
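Rather than renaming every key by hand, a small helper can add the prefix. `prefix_grid` below is a hypothetical name used for illustration; it is not part of imblearn or scikit-learn.

```python
def prefix_grid(grid, step_name='classification'):
    """Return a copy of a parameter grid whose keys are prefixed with '<step_name>__'."""
    return {f"{step_name}__{key}": values for key, values in grid.items()}


# The unprefixed grids from the question.
params = [
    {'penalty': ('l1', 'l2'), 'C': (0.01, 0.1, 1.0, 10)},
    {'n_neighbors': (10, 15, 25)},
    {'n_estimators': (80, 100, 150, 200), 'min_samples_split': (5, 7, 10, 20)},
]

prefixed_params = [prefix_grid(grid) for grid in params]
print(prefixed_params)
```

Generating the keys this way also avoids missing a prefix on one of several keys in the same dict.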

Try:

params = [
    {'classification__penalty': ('l1', 'l2'), 'classification__C': (0.01, 0.1, 1.0, 10)},
    {'classification__n_neighbors': (10, 15, 25)},
    {'classification__n_estimators': (80, 100, 150, 200), 'classification__min_samples_split': (5, 7, 10, 20)},
]

Make sure there are two underscores between the step name and the parameter name.

Source: https://www.f2er.com/3118862.html
