What does base_score do in xgboost multi-class classification?

I am trying to explore how xgboost works for binary classification as well as for multi-class classification. For the binary case, I observed that base_score is treated as the starting probability, and it also showed a major impact when calculating Gain and Cover.

For the multi-class case, however, I cannot figure out the importance of the base_score parameter, because it showed the same values of Gain and Cover for different (arbitrary) values of base_score.

Also, when computing Cover for the multi-class case (i.e. 2*p*(1-p)), I cannot figure out why there is a factor of 2.

Can someone help me with these two parts?

Answer:

To answer your question, let's look at what actually happens in xgboost for multi-class classification with the multi:softmax objective and, say, 6 classes.

Say you want to train a classifier specifying num_boost_round=5. How many trees would you expect xgboost to train for you? The correct answer is 30 trees. The reason is that softmax expects each training row to have num_classes=6 different scores, so that xgboost can compute gradients/hessians w.r.t. each of these 6 scores and use them to build a new tree for each of them (effectively updating 6 parallel models so that each sample gets 6 updated scores).

To make the xgboost classifier output the final 6 values for each sample, e.g. from the test set, you need to call bst.predict(xg_test, output_margin=True) (where bst is your classifier and xg_test is e.g. the test set). The output of the regular bst.predict(xg_test) is effectively the same as picking the class with the highest of those 6 values in bst.predict(xg_test, output_margin=True).

If you are interested, you can look at all of the trees using the bst.trees_to_dataframe() function (where bst is your trained classifier).
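
To make these three points concrete, here is a minimal, self-contained sketch with synthetic data (names such as X, y, dtrain and the parameter choices are mine, not from the original answer): it checks the tree count, the argmax relationship, and the trees_to_dataframe() view.

import numpy as np
import xgboost as xgb

# synthetic 6-class problem, purely for illustration
X = np.random.rand(500, 10)
y = np.random.randint(0, 6, size=500)
dtrain = xgb.DMatrix(X, label=y)

params = {'objective': 'multi:softmax', 'num_class': 6, 'max_depth': 2}
bst = xgb.train(params, dtrain, num_boost_round=5)

# 5 rounds x 6 classes -> 30 trees in the booster
print(len(bst.get_dump()))                          # 30
print(bst.trees_to_dataframe()['Tree'].nunique())   # 30 as well

# plain predict() is just the argmax over the 6 per-class margins
margins = bst.predict(dtrain, output_margin=True)   # shape (500, 6)
print(np.array_equal(bst.predict(dtrain), margins.argmax(axis=1)))  # True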

Now to the question of what base_score does in the multi:softmax case. The answer is: it is added as a starting score to each of the 6 classes' scores before any trees are added. So if you apply, e.g., base_score=42., you will observe that all values in bst.predict(xg_test,output_margin=True) also increase by 42. At the same time, for softmax, increasing the scores of all classes by the same amount does not change anything, so in the multi:softmax case a base_score different from 0 does not have any visible effect.
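
To see why a common shift is invisible, here is a quick numpy-only illustration (not xgboost code): softmax is invariant to adding the same constant to every class score.

import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([0.502, 0.500, 0.498])   # per-class margins for one sample
print(softmax(scores))                     # some probabilities
print(softmax(scores + 42.0))              # exactly the same probabilities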

Compare this behaviour with binary classification. While it is almost the same as multi:softmax with 2 classes, the big difference is that xgboost only tries to produce 1 score for class 1, leaving the score for class 0 equal to 0.0. Because of that, when you use base_score in binary classification it is only added to the score of class 1, thus increasing the starting prediction probability for class 1. With multiple classes it would, in theory, be meaningful to pass multiple base scores (one per class), which you cannot do via base_score. Instead you can use the set_base_margin functionality applied to the training set, but it does not work very conveniently with the default predict, so afterwards you will always need to use it with output_margin=True and add the same values as the ones you used in set_base_margin for your training data (if you want to use set_base_margin in the multi-class case you will need to flatten the margin values as suggested here).
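
For contrast, a rough sketch of the binary case (again synthetic data and my own parameter choices; the behaviour assumed below is my reading of binary:logistic, where base_score is the starting probability of class 1): with a tiny eta and a single boosting round the raw margin sits close to logit(base_score), so changing base_score really does move the predicted probabilities, unlike in multi:softmax.

import numpy as np
import xgboost as xgb

X = np.random.rand(500, 10)
y = (np.random.rand(500) < X[:, 0]).astype(int)   # binary labels
dtrain = xgb.DMatrix(X, label=y)

for bs in (0.5, 0.9):
    params = {'objective': 'binary:logistic', 'base_score': bs,
              'eta': 0.001, 'max_depth': 1}
    bst = xgb.train(params, dtrain, num_boost_round=1)
    margin = bst.predict(dtrain, output_margin=True)
    # assumption: the margin starts near log(bs / (1 - bs)); only the single
    # class-1 score is shifted, so the predicted probabilities shift with it
    print(bs, margin[:2], bst.predict(dtrain)[:2])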

An example of how it all works:

import numpy as np
import xgboost as xgb
TRAIN = 1000
TEST = 2
F = 10

def gen_data(M):
    np_train_features = np.random.rand(M,F)
    np_train_labels = np.random.binomial(2,np_train_features[:,0])
    return xgb.DMatrix(np_train_features,label=np_train_labels)

def regenerate_data():
    np.random.seed(1)
    return gen_data(TRAIN),gen_data(TEST)

param = {}
param['objective'] = 'multi:softmax'
param['eta'] = 0.001
param['max_depth'] = 1
param['nthread'] = 4
param['num_class'] = 3


def sbm(xg_data,original_scores):
    xg_data.set_base_margin(np.array(original_scores * xg_data.num_row()).reshape(-1,1))

num_round = 3

print("#1. No base_score,no set_base_margin")
xg_train,xg_test = regenerate_data()
bst = xgb.train(param,xg_train,num_round)
print(bst.predict(xg_test,output_margin=True))
print(bst.predict(xg_test))
print("Easy to see that in this case all scores/margins have 0.5 added to them initially,which is default value for base_score here for some bizzare reason,but it doesn't really affect anything,so no one cares.")
print()
bst1 = bst

print("#2. Use base_score")
xg_train,xg_test = regenerate_data()
param['base_score'] = 5.8
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("In this case all scores/margins have 5.8 added to them initially. And it doesn't really change anything compared to previous case.")
print()
bst2 = bst

print("#3. Use very large base_score and screw up numeric precision")
xg_train,xg_test = regenerate_data()
param['base_score'] = 5.8e10
bst = xgb.train(param, xg_train, num_round)
print(bst.predict(xg_test, output_margin=True))
print(bst.predict(xg_test))
print("In this case all scores/margins have too big number added to them and xgboost thinks all probabilities are equal so picks class 0 as prediction.")
print("But the training actually was fine - only predict is being affect here. If you set normal base margins for test set you can see (also can look at bst.trees_to_dataframe()).")
xg_train,xg_test = regenerate_data() # if we don't regenerate the dataframe here xgboost seems to be either caching it or somehow else remembering that it didn't have base_margins and result will be different.
sbm(xg_test,[0.1,0.1,0.1])
print(bst.predict(xg_test,output_margin=True))
print(bst.predict(xg_test))
print()
bst3 = bst

print("#4. Use set_base_margin for training")
xg_train,xg_test = regenerate_data()
# only used in train/test whenever set_base_margin is not applied.
# Peculiar that trained model will remember this value even if it was trained with
# dataset which had set_base_margin. In that case this base_score will be used if
# and only if test set passed to `bst.predict` didn't have `set_base_margin` applied to it.
param['base_score'] = 4.2
sbm(xg_train,[-0.4,0.,0.8])
bst = xgb.train(param, xg_train, num_round)
sbm(xg_test, [-0.4, 0., 0.8])
print(bst.predict(xg_test,output_margin=True))
print(bst.predict(xg_test))
print("Working - the base margin values added to the classes skewing predictions due to low eta and small number of boosting rounds.")
print("If we don't set base margins for `predict` input it will use base_score to start all scores with. Bizzare,right? But then again,not much difference on what to add here if we are adding same value to all classes' scores.")
xg_train,xg_test = regenerate_data() # regenerate test and don't set the base margin values
print(bst.predict(xg_test,output_margin=True))
print(bst.predict(xg_test))
print()
bst4 = bst

print("Trees bst1,bst2,bst3 are almost identical,because there is no difference in how they were trained. bst4 is different though.")
print(bst1.trees_to_dataframe().iloc[1,])
print()
print(bst2.trees_to_dataframe().iloc[1,])
print()
print(bst3.trees_to_dataframe().iloc[1,])
print()
print(bst4.trees_to_dataframe().iloc[1,])

Its output is as follows:

#1. No base_score,no set_base_margin
[[0.50240415 0.5003637  0.49870378]
 [0.49863306 0.5003637  0.49870378]]
[0. 1.]
Easy to see that in this case all scores/margins have 0.5 added to them initially, which is default value for base_score here for some bizarre reason, but it doesn't really affect anything, so no one cares.

#2. Use base_score
[[5.8024044 5.800364  5.798704 ]
 [5.798633  5.800364  5.798704 ]]
[0. 1.]
In this case all scores/margins have 5.8 added to them initially. And it doesn't really change anything compared to previous case.

#3. Use very large base_score and screw up numeric precision
[[5.8e+10 5.8e+10 5.8e+10]
 [5.8e+10 5.8e+10 5.8e+10]]
[0. 0.]
In this case all scores/margins have too big a number added to them and xgboost thinks all probabilities are equal so picks class 0 as prediction.
But the training actually was fine - only predict is being affected here. If you set normal base margins for the test set you can see that (also can look at bst.trees_to_dataframe()).
[[0.10240632 0.10036398 0.09870315]
 [0.09863247 0.10036398 0.09870315]]
[0. 1.]

#4. Use set_base_margin for training
[[-0.39458954  0.00102317  0.7973728 ]
 [-0.40044016  0.00102317  0.7973728 ]]
[2. 2.]
Working - the base margin values added to the classes skewing predictions due to low eta and small number of boosting rounds.
If we don't set base margins for `predict` input it will use base_score to start all scores with. Bizarre, right? But then again, not much difference on what to add here if we are adding same value to all classes' scores.
[[4.2054105 4.201023  4.1973724]
 [4.1995597 4.201023  4.1973724]]
[0. 1.]

Trees bst1, bst2, bst3 are almost identical, because there is no difference in how they were trained. bst4 is different though.
Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                 0
Node                 1
ID                 0-1
Feature           Leaf
Split              NaN
Yes                NaN
No                 NaN
Missing            NaN
Gain       0.000802105
Cover          157.333
Name: 1, dtype: object

Tree                0
Node                1
ID                0-1
Feature          Leaf
Split             NaN
Yes               NaN
No                NaN
Missing           NaN
Gain       0.00180733
Cover         100.858
Name: 1, dtype: object