How do I replace categorical variables in the test set with the WOE computed on the training set?

2024-05-19 • Q&A

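I used Python to compute the weight of evidence (WOE) for two high-cardinality variables (Postal_Code_L and Managing_Sales_Office_Nbr), using only the instances in the training set.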
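For context, a per-category WOE table can be built from the training set roughly as follows. This is only a minimal sketch: the question does not show how the WOE was computed, so the `target` column, the sample data, the helper name `woe_table`, the 0.5 smoothing term, and the sign convention (log of event share over non-event share) are all assumptions.

```python
import numpy as np
import pandas as pd

def woe_table(df, feature, target):
    """Hypothetical helper: WOE per category = ln(share of events / share of non-events)."""
    grouped = df.groupby(feature)[target].agg(events="sum", total="count")
    grouped["non_events"] = grouped["total"] - grouped["events"]
    # Assumption: add 0.5 to each count so the log stays finite for categories
    # that contain only events or only non-events.
    dist_event = (grouped["events"] + 0.5) / grouped["events"].sum()
    dist_non_event = (grouped["non_events"] + 0.5) / grouped["non_events"].sum()
    grouped["WOE"] = np.log(dist_event / dist_non_event)
    return grouped.reset_index()[[feature, "WOE"]]

# Made-up training data; "target" is a hypothetical binary outcome column.
train = pd.DataFrame({
    "Postal_Code_L": ["1000", "1000", "2000", "2000", "2000", "3000"],
    "target": [1, 0, 0, 0, 1, 1],
})

woe_postal = woe_table(train, "Postal_Code_L", "target")
print(woe_postal)
```

The important point is that the table is derived from training rows only, so the test set never influences the encoding.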

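The problem is that I now need to replace the values of these high-cardinality variables in the test-set instances with their WOE. How should I do that?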

First 20 instances of the high-cardinality variables and their WOE. For instances in the test set, the WOE is missing and should be replaced with the WOE calculated from the training-set instances.

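Update: I have also made a dataframe consisting of all unique postal codes in the training set and their WOE. I tried the following code, which is meant to replace the missing WOE values in the whole dataset with the postal code's WOE if the postal code occurs in the training set, and with the mean WOE if it does not: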

for i in TestsetwithWOE['Postal_Code_L']:
    if TestsetwithWOE['Postal_Code_L'][i] in WOEvaluespostalcode['MIN_VALUE']:
        TestsetwithWOE['Postal_Code_L_WOE'][i] == WOEvaluespostalcode['WOE']
    else:
        TestsetwithWOE['Postal_Code_L_WOE'][i] == meanWOEPostalCode

Running this code gives me the following error: TypeError: cannot do label indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [nan] of <class 'float'>
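The error is probably caused by the loop itself: iterating over a Series yields its values (postal codes, including NaN), and those values are then used as row labels. Two other details of the loop also behave differently than intended: `in` applied to a Series tests the index labels rather than the values, and `==` compares instead of assigning. The sketch below illustrates the membership behaviour on made-up data.

```python
import pandas as pd

codes = pd.Series(["1000", "2000", "3000"])  # made-up postal codes

# `in` on a Series checks the index labels (here 0, 1, 2), not the stored values.
in_index = "1000" in codes             # False
label_found = 0 in codes               # True

# To test membership among the values, use .values or .isin().
in_values = "1000" in codes.values     # True
any_match = codes.isin(["1000"]).any() # True

# Iterating over a Series yields the values, not the row positions.
iterated = [value for value in codes]  # ['1000', '2000', '3000']

print(in_index, label_found, in_values, any_match, iterated)
```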

If I change all missing values in all columns of both dataframes to "Unknown", I get KeyError: 'Unknown'. How can I change the code to make it work?
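One possible way to do this without a loop is to turn the training-set WOE table into a lookup Series and apply it with `Series.map`, then fill whatever is left with the mean. This is a sketch under assumptions: the column names follow the question, the data values are invented, the postal codes are assumed to be unique in `MIN_VALUE` and to have the same dtype in both frames, and "mean WOE" is taken to be the unweighted mean over the lookup table.

```python
import numpy as np
import pandas as pd

# Lookup table built from the training set (values invented for illustration).
WOEvaluespostalcode = pd.DataFrame({
    "MIN_VALUE": ["1000", "2000", "3000"],
    "WOE": [0.25, -0.40, 0.10],
})

# Test set with one known code, one unseen code, one missing code, one known code.
TestsetwithWOE = pd.DataFrame({
    "Postal_Code_L": ["2000", "9999", np.nan, "1000"],
})

# Series whose index is the postal code and whose values are the WOE.
# map() needs this index to be unique.
woe_by_code = WOEvaluespostalcode.set_index("MIN_VALUE")["WOE"]

# Assumption: the fallback is the unweighted mean of the training-set WOE values.
meanWOEPostalCode = woe_by_code.mean()

# Codes found in the lookup get their WOE; unseen or missing codes become NaN
# and are then filled with the mean.
TestsetwithWOE["Postal_Code_L_WOE"] = (
    TestsetwithWOE["Postal_Code_L"].map(woe_by_code).fillna(meanWOEPostalCode)
)

print(TestsetwithWOE)
```

The same two lines can be repeated for Managing_Sales_Office_Nbr with its own lookup table. If the codes are stored as integers in one frame and as strings in the other, nothing will match, so it is worth checking the dtypes before mapping.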


Answer from pipippip: How do I replace categorical variables in the test set with the WOE computed on the training set?
