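I have a DataFrame with a mix of int, float, categorical, and bool dtypes, and I am trying to use LabelEncoder.fit_transform to convert the categorical and bool columns to int. It works fine on a single column, but when I loop over the DataFrame's columns with a for loop, I get the error below: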
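from sklearn import preprocessing
relabel = preprocessing.LabelEncoder()
# Encode every column whose dtype is not float64, int64, or bool
for i in first_buyer.columns:
    if str(first_buyer[i].dtypes) not in ["float64", "int64", "bool"]:
        first_buyer[i] = relabel.fit_transform(first_buyer[i])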
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in _encode(values,uniques,encode)
104 try:
--> 105 res = _encode_python(values,encode)
106 except TypeError:
~\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in _encode_python(values,encode)
58 if uniques is None:
---> 59 uniques = sorted(set(values))
60 uniques = np.array(uniques,dtype=values.dtype)
TypeError: '<' not supported between instances of 'str' and 'int'
During handling of the above exception,another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-17-42e60975f0b6> in <module>
4 for i in first_buyer.columns:
5 if str(first_buyer[i].dtypes) not in ["float64","bool"]:
----> 6 first_buyer[i] = relabel.fit_transform(first_buyer[i])
~\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in fit_transform(self,y)
234 """
235 y = column_or_1d(y,warn=True)
--> 236 self.classes_,y = _encode(y,encode=True)
237 return y
238
~\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in _encode(values,encode)
105 res = _encode_python(values,encode)
106 except TypeError:
--> 107 raise TypeError("argument must be a string or number")
108 return res
109 else:
TypeError: argument must be a string or number
I also tried the OneHotEncoder and MultiColumnLabelEncoder classes that other people have posted, but both give me the same error:
MultiColumnLabelEncoder(
    columns=[col for col in first_buyer
             if str(first_buyer[col].dtypes) not in ["float64", "bool"]]
).fit_transform(first_buyer)
I expect the output to have all categorical variables converted to numbers so that I can train a model on the dataset.
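The first TypeError in the traceback ('<' not supported between instances of 'str' and 'int') suggests that at least one column holds both str and int values, which LabelEncoder cannot sort. Below is a minimal sketch of one possible workaround, assuming that is the cause: find the mixed-type columns, then cast each non-numeric column to str before encoding. The sample DataFrame and the helper names mixed_type_columns and encode_non_numeric are hypothetical, since the real columns of first_buyer are not shown.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for first_buyer; the real data is not shown in the question.
first_buyer = pd.DataFrame({
    "price": [1.5, 2.0, 3.25],
    "visits": [1, 2, 3],
    "is_member": [True, False, True],
    "segment": ["a", 1, "b"],  # object column mixing str and int, as the traceback suggests
})


def mixed_type_columns(df):
    """Return the names of columns whose values have more than one Python type."""
    return [c for c in df.columns if df[c].map(type).nunique() > 1]


def encode_non_numeric(df):
    """Label-encode every column that is not float64 or int64, casting to str first."""
    out = df.copy()
    for col in out.columns:
        if str(out[col].dtype) not in ("float64", "int64"):
            # Use a fresh encoder per column; astype(str) makes all values comparable.
            out[col] = LabelEncoder().fit_transform(out[col].astype(str))
    return out


print(mixed_type_columns(first_buyer))
encoded = encode_non_numeric(first_buyer)
print(encoded.dtypes)
```

Two caveats with this approach: astype(str) turns missing values into the literal string "nan", which then gets its own label, and the integer codes follow string sort order, so they carry no meaningful ranking.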