Sklearn decision tree as a weak learner in AdaBoost: strange behavior

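I am trying to implement the AdaBoost algorithm using an sklearn decision tree as the weak learner. At each step, I want to pick a single feature with a single threshold and use it to classify all samples.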
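For reference, here is a minimal sketch of such a one-feature, one-threshold weak learner, assuming binary word-presence features. The toy matrix `X`, the labels `y` and the uniform weights `D` are made up for illustration and are not the data from this question.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: 3 binary word-presence features, labels in {-1, 1}.
# Feature 0 separates the two classes perfectly.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 0],
              [1, 1, 1],
              [0, 0, 0]])
y = np.array([-1, -1, 1, 1, -1, 1])
D = np.full(len(X), 1 / len(X))  # uniform sample weights

# max_depth=1 restricts the tree to a single split (a decision stump).
stump = DecisionTreeClassifier(max_depth=1, criterion="entropy")
stump.fit(X, y, sample_weight=D)

# Node 0 is the root; it stores the chosen feature index and threshold.
feature = int(stump.tree_.feature[0])
threshold = float(stump.tree_.threshold[0])
print(feature, threshold)
```

With 0/1 features the only possible split point lies midway between 0 and 1, which is why the threshold shows up as 0.50 in printed trees.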

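I have 1400 long feature vectors and want to label each of them as 1 or -1. The features are words from movie reviews, and the labels mean "bad" or "good". In some iterations, the decision tree picks a feature with threshold 0.5 and classifies all samples as -1 (regardless of their value), and in the next iteration it picks the same feature again, this time splitting the samples between 1 and -1, as the printout for feature 371 below shows.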

Can anyone find the reason?

Tree printout:


feat: 311
==================
|--- worst <= 0.50
   ---> class: 1.0

|--- worst >  0.50
   ---> class: -1.0

==================
alphas = 0.16872595425475514
feat: 27
==================
|--- bad <= 0.50
   ---> class: 1.0

|--- bad >  0.50
   ---> class: -1.0

==================
alphas = 0.21421414954211687
feat: 371
==================
|--- boring <= 0.50
   ---> class: -1.0

|--- boring >  0.50
   ---> class: -1.0

==================
alphas = 0.1881155411693614
feat: 371
==================
|--- boring <= 0.50
   ---> class: 1.0

|--- boring >  0.50
   ---> class: -1.0

==================
alphas = 0.12644785644997397
feat: 822
==================
|--- ridiculous <= 0.50
   ---> class: -1.0

|--- ridiculous >  0.50
   ---> class: -1.0

Code:

import numpy as np
from sklearn import tree


def run_adaboost(X_train,y_train,T):
    hypotheses = []
    alpha_vals = []
    num_of_samples = len(X_train)
    # Start from the uniform distribution over the training samples.
    D = [1/num_of_samples for _ in range(num_of_samples)]
    for t in range(T):
        h = weak_learner(D,X_train,y_train)
        # Node 0 is the root of the stump: its feature index and threshold.
        idx,threshold = h.tree_.feature[0],h.tree_.threshold[0]
        tup = (get_prediction(h,X_train[0]),idx,threshold)
        print_tree(h,[vocabulary[i] for i in range(len(X_train[0]))])
        hypotheses.append(tup)
        # Weighted training error of the stump under D.
        epsilon = 1-h.score(X_train,y_train,sample_weight=D)
        alpha = 0.5*np.log((1-epsilon)/epsilon)
        alpha_vals.append(alpha)
        D = new_distribution(D,X_train,y_train,alpha,h)

    return hypotheses,alpha_vals


##############################################

def weak_learner(D,X_train,y_train):
    # Depth-1 tree (decision stump) fitted on the weighted samples.
    clf = tree.DecisionTreeClassifier(max_depth=1,criterion="entropy")
    clf.fit(X_train,y_train,sample_weight=D)
    return clf


def new_distribution(D,X_train,y_train,alpha,h):
    # D_{t+1}(i) = D_t(i) * exp(-alpha * y_i * h(x_i)) / Z
    Z = 0
    Dt = [0]*len(D)
    print(f"alphas = {alpha}")
    pred = h.predict(X_train)
    for i in range(len(X_train)):
        exponent = (-1) * alpha * y_train[i] * (pred[i])
        Z += D[i]*np.exp(exponent)
    for i in range(len(X_train)):
        exponent = (-1) * alpha * y_train[i] * (pred[i])
        Dt[i] = (D[i]*np.exp(exponent))/Z
    return Dt


def get_prediction(clf,vector):
    # Label the stump assigns when the split feature is 0,
    # i.e. the "<= threshold" branch.
    feat = clf.tree_.feature[0]
    print(f"feat: {feat}")
    vec = vector.copy()
    vec[feat] = 0
    vec = vec.reshape(1,-1)
    return int(clf.predict(vec)[0])


def print_tree(clf,feat_name):
    # feature_names must be passed by keyword in recent sklearn versions.
    r = tree.export_text(clf,feature_names=feat_name)
    print(r)
    print("==================")

##############################################


def main():
    # parse_data is defined elsewhere (not shown here).
    data = parse_data()
    if not data:
        return
    (X_train,y_train,X_test,y_test,vocab) = data
    global vocabulary,X_test_g,y_test_g
    X_test_g,y_test_g = X_test,y_test
    vocabulary = vocab
    T = 80
    run_adaboost(X_train,y_train,T)

if __name__ == '__main__':
    main()
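The weight update in `new_distribution` computes the exponent twice, in two loops. As a possible simplification, the same update can be written with NumPy arrays. This is only a sketch: the function name `update_distribution` and the four-sample example are hypothetical, and it assumes labels and predictions are both in {-1, 1}.

```python
import numpy as np


def update_distribution(D, y, pred, alpha):
    """Return D_next(i) proportional to D(i) * exp(-alpha * y_i * pred_i)."""
    D = np.asarray(D, dtype=float)
    unnormalized = D * np.exp(-alpha * np.asarray(y) * np.asarray(pred))
    # Dividing by the sum plays the role of the normalizer Z.
    return unnormalized / unnormalized.sum()


# Hypothetical example: four samples, the last one is misclassified.
y = np.array([1, 1, -1, -1])
pred = np.array([1, 1, -1, 1])
D = np.full(4, 0.25)

epsilon = D[pred != y].sum()                   # weighted error
alpha = 0.5 * np.log((1 - epsilon) / epsilon)  # weight of this hypothesis
D_next = update_distribution(D, y, pred, alpha)
print(D_next)
```

After the update, the misclassified samples together carry half of the total weight, which is why the next stump can look quite different even when it reuses the same feature.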