Label-encoding multiple categorical columns

2024-05-04 • Q&A

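I have a DataFrame with a mix of int, float, categorical, and bool dtypes, and I am trying to use LabelEncoder.fit_transform to convert the categorical and bool columns to int. It works fine on a single column, but when I loop over the DataFrame with a for loop I get the following error: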

from sklearn import preprocessing

relabel = preprocessing.LabelEncoder()
for i in first_buyer.columns:
    if str(first_buyer[i].dtypes) not in ["float64","int64","bool"]:
        first_buyer[i] = relabel.fit_transform(first_buyer[i])

Error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in _encode(values,uniques,encode)
    104         try:
--> 105             res = _encode_python(values,encode)
    106         except TypeError:

~\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in _encode_python(values,encode)
     58     if uniques is None:
---> 59         uniques = sorted(set(values))
     60         uniques = np.array(uniques,dtype=values.dtype)

TypeError: '<' not supported between instances of 'str' and 'int'

During handling of the above exception,another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-17-42e60975f0b6> in <module>
      4 for i in first_buyer.columns:
      5     if str(first_buyer[i].dtypes) not in ["float64","bool"]:
----> 6         first_buyer[i] = relabel.fit_transform(first_buyer[i])

~\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in fit_transform(self,y)
    234         """
    235         y = column_or_1d(y,warn=True)
--> 236         self.classes_,y = _encode(y,encode=True)
    237         return y
    238 

~\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in _encode(values,encode)
    105             res = _encode_python(values,encode)
    106         except TypeError:
--> 107             raise TypeError("argument must be a string or number")
    108         return res
    109     else:

TypeError: argument must be a string or number

I also tried the OneHotEncoder and MultiColumnLabelEncoder classes that other people have posted, but both gave me the same error:

MultiColumnLabelEncoder(
    columns=[col for col in first_buyer
             if str(first_buyer[i].dtypes) not in ["float64", "bool"]]
).fit_transform(first_buyer)

I expected the code to convert all categorical variables to numbers so that I can train on the dataset.
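The first TypeError in the traceback comes from `sorted(set(values))`, which fails when a single column holds values of different Python types. A quick way to find such columns is to list the types each one contains. The sketch below is only illustrative: the DataFrame `df` and the helper `python_types_per_column` are hypothetical stand-ins, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: "C" mixes str and int, "D" mixes str and NaN (a float).
# dtype=object keeps the raw Python values untouched.
df = pd.DataFrame(
    {"A": ["a", "b"], "C": ["c", 3], "D": ["d", np.nan]},
    dtype=object,
)

def python_types_per_column(frame):
    """Map each object-dtype column to the sorted names of the Python types it holds."""
    report = {}
    for col in frame.columns:
        if frame[col].dtype == object:
            report[col] = sorted({type(v).__name__ for v in frame[col]})
    return report

types_found = python_types_per_column(df)
mixed_columns = [col for col, names in types_found.items() if len(names) > 1]

# Sorting the unique values, as the traceback shows sklearn doing, fails for mixed types.
try:
    sorted(set(df["C"]))
    sort_failed = False
except TypeError:
    sort_failed = True

print(types_found)
print("columns with mixed types:", mixed_columns)
```

Any column reported with more than one type name is a candidate for the error shown in the traceback.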

The error can be reproduced with missing values or mixed types in a column:

from sklearn.preprocessing import LabelEncoder
import numpy as np
import pandas as pd

relabel = LabelEncoder()

# NaNs in the DataFrame
data1 = pd.DataFrame([['a','b','c'],['1','2',np.nan]],columns=['A','B','C'])
# Will raise an error on the scikit-learn version shown in the traceback
relabel.fit_transform(data1['C'])

# Mixed types
data2 = pd.DataFrame([['a','b','c'],['1','2',3]],columns=['A','B','C'])
# Will raise an error
relabel.fit_transform(data2['C'])

# Clean data
data3 = pd.DataFrame([['a','b','c'],['1','2','3']],columns=['A','B','C'])
# Will work
relabel.fit_transform(data3['C'])

# Define a dummy variable for missing values that is of the same type as the column
data1['C'] = data1['C'].fillna('DUMMY_VARIABLE_FOR_NA')
# Will work now
relabel.fit_transform(data1['C'])

data2['C'] = data2['C'].astype(str)
# Will work now
relabel.fit_transform(data2['C'])
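Applied to the whole DataFrame, the two fixes above (fillna with a placeholder, then astype(str)) could be combined in one loop. This is a minimal sketch under stated assumptions: the contents of `first_buyer` here are invented for illustration, and the `encoders` dict is a hypothetical addition that keeps one fitted encoder per column.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for first_buyer with a similar mix of dtypes.
first_buyer = pd.DataFrame({
    "price": [1.5, 2.0, 3.25],
    "rooms": [2, 3, 4],
    "is_new": [True, False, True],
    "city": ["x", np.nan, "y"],   # contains a missing value
    "code": ["a", 3, "a"],        # mixes str and int
})

encoders = {}  # one fitted encoder per encoded column
for col in first_buyer.columns:
    # Skip int, float and bool columns (pandas treats bool as numeric here).
    if pd.api.types.is_numeric_dtype(first_buyer[col]):
        continue
    # Make the column uniformly string-typed before encoding.
    cleaned = first_buyer[col].fillna("DUMMY_VARIABLE_FOR_NA").astype(str)
    encoder = LabelEncoder()
    first_buyer[col] = encoder.fit_transform(cleaned)
    encoders[col] = encoder

print(first_buyer)
```

A fresh LabelEncoder per column is used because refitting a single shared instance overwrites its `classes_`, so only the last column could be decoded with `inverse_transform` afterwards.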


Answer by meachol: Label-encoding multiple categorical columns
