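I am working on IBM's data platform. I was able to load my data into a PySpark DataFrame and create a Spark SQL table from it. After splitting the dataset, I feed it into a classification algorithm, which raises an error saying the Spark SQL data cannot be loaded and that ndarrays are required.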
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import numpy as np
X_train,y_train,X_test,y_test = train_test_split(x,y,test_size = 0.1,random_state = 42)
RM = RandomForestRegressor()
RM.fit(X_train.reshape(1,-1),y_train)
Error:
TypeError: Expected sequence or array-like, got {
After this error, I did something like this:
x = spark.sql('select Id,YearBuilt,MoSold,YrSold,Fireplaces FROM Train').toPandas()
y = spark.sql('Select SalePrice FROM Train where SalePrice is not null').toPandas()
Error:
AttributeError                            Traceback (most recent call last)
in <module>()
      5 X_train,y_train,X_test,y_test = train_test_split(x,y,test_size = 0.1,random_state = 42)
      6 RM = RandomForestRegressor()
----> 7 RM.fit(X_train.reshape(1,-1),y_train)

/opt/ibm/conda/miniconda3.6/lib/python3.6/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068
   5069     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'reshape'
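For reference, here is a minimal sketch of one way the fit step could be made to run. It is not a confirmed fix for this exact environment. It assumes the features and the target come from the same set of rows (the two queries above filter differently, so `x` and `y` may differ in length), that the pandas DataFrame is converted to NumPy arrays with `.values` instead of calling `reshape` on it, and that the results of `train_test_split` are unpacked in the order scikit-learn returns them (`X_train, X_test, y_train, y_test`). The in-memory DataFrame `df` and its values are made up as a stand-in for the result of `spark.sql(...).toPandas()`, and `feature_cols` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a single query such as:
#   spark.sql('SELECT Id, YearBuilt, MoSold, YrSold, Fireplaces, SalePrice '
#             'FROM Train WHERE SalePrice IS NOT NULL').toPandas()
# The values below are invented purely for illustration.
rng = np.random.RandomState(0)
df = pd.DataFrame({
    "Id": np.arange(1, 41),
    "YearBuilt": rng.randint(1900, 2010, size=40),
    "MoSold": rng.randint(1, 13, size=40),
    "YrSold": rng.randint(2006, 2011, size=40),
    "Fireplaces": rng.randint(0, 3, size=40),
    "SalePrice": rng.randint(50000, 500000, size=40).astype(float),
})

# Same feature columns as in the question; features and target share rows.
feature_cols = ["Id", "YearBuilt", "MoSold", "YrSold", "Fireplaces"]
X = df[feature_cols].values   # 2-D ndarray: (n_samples, n_features)
y = df["SalePrice"].values    # 1-D ndarray: (n_samples,)

# train_test_split returns X_train, X_test, y_train, y_test, in that order.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42
)

RM = RandomForestRegressor(n_estimators=10, random_state=42)
RM.fit(X_train, y_train)      # X_train is already 2-D, so no reshape is needed
predictions = RM.predict(X_test)
```

`reshape(1, -1)` would collapse the whole training set into a single sample, which is why it is dropped in this sketch; scikit-learn estimators expect one row per sample.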