I am trying to one-hot encode the categorical columns in my dataset. I am using the following function:
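
    import pandas as pd
    from sklearn.preprocessing import LabelEncoder, OneHotEncoder

    def create_ohe(df, col):
        # Map each category to an integer code, shaped as a single column
        le = LabelEncoder()
        a = le.fit_transform(df[col]).reshape(-1, 1)
        # Dense one-hot matrix with one output column per category
        ohe = OneHotEncoder(sparse=False)
        column_names = [col + "_" + str(i) for i in le.classes_]
        return pd.DataFrame(ohe.fit_transform(a), columns=column_names)
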
I get a MemoryError when I call the function in this loop:

    for column in categorical_columns:
        temp_df = create_ohe(df_new, column)
        temp = pd.concat([temp, temp_df], axis=1)

Error traceback:

    MemoryError                               Traceback (most recent call last)
    <ipython-input-40-9b241e8bf9e6> in <module>
          1 for column in categorical_columns:
    ----> 2     temp_df = create_ohe(df_new,column)
          3     temp = pd.concat([temp,temp_df],axis=1)
          4 print("\nShape of final df after one hot encoding: ",temp.shape)

    <ipython-input-34-1530423fdf06> in create_ohe(df,col)
          8     ohe = OneHotEncoder(sparse=False)
          9     column_names = [col + "_" + str(i) for i in le.classes_]
    ---> 10     return (pd.DataFrame(ohe.fit_transform(a),columns=column_names))

    MemoryError:
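
For context, here is a rough back-of-the-envelope sketch of how large the dense array built on the failing line could get. It assumes that `OneHotEncoder(sparse=False)` returns a float64 array (8 bytes per cell) with one column per distinct category. The helper `dense_ohe_bytes` and the row and category counts are hypothetical and are not taken from my dataset.

```python
# Rough estimate of the dense one-hot output size for a single column.
# Assumption: the encoder returns a float64 array (8 bytes per cell)
# with one output column per distinct category.

def dense_ohe_bytes(n_rows, n_categories, itemsize=8):
    """Bytes needed for a dense (n_rows x n_categories) one-hot array."""
    return n_rows * n_categories * itemsize


# Hypothetical sizes, for illustration only.
n_rows = 1_000_000
n_categories = 5_000
size_gib = dense_ohe_bytes(n_rows, n_categories) / 2**30
print(f"Estimated dense size: {size_gib:.1f} GiB")
```

With these made-up numbers a single high-cardinality column would already need tens of GiB, so if the estimate holds, keeping the encoder's sparse output instead of `sparse=False` may be worth trying.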