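I am trying to evaluate my model using train_test_split. I defined the following function, which adds an output column ('top') to the table based on the value passed to the function: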
```python
def top_sh(num):
    ### Get the top(num) in Shanghai data and arrange
    #### input and output variables accordingly
    # Add column to be output value, either zero or one
    #shanghai = shanghai_cp.copy()
    if 'top' in shanghai.columns:
        shanghai.drop(columns = shanghai.columns[-1], inplace = True)
    shanghai['top'] = shanghai['world_rank'].apply(lambda x: 1 if x <= num else 0)
    out = print('*****************' + '\n' + 'Output array: Top' + str(num) + '\n' + 'Disregarding in Analysis: World rank')
    #call = print(shanghai.head(15))
    return out
```
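To make the labelling step easier to check in isolation, here is a minimal sketch on a made-up table. The DataFrame `toy`, its `score` column, and the helper `add_top_column` are hypothetical names used only for illustration; they are not part of my real data. Unlike `top_sh`, the sketch returns a new DataFrame instead of modifying a global one.

```python
import pandas as pd

def add_top_column(df, num, rank_col="world_rank"):
    """Return a copy of df whose last column 'top' is 1 when rank <= num, else 0."""
    # Drop a stale 'top' column by name if present (hypothetical helper, not the original top_sh)
    out = df.drop(columns="top", errors="ignore").copy()
    out["top"] = (out[rank_col] <= num).astype(int)
    return out

# Made-up data, only to show the shape of the result
toy = pd.DataFrame({
    "world_rank": [1, 30, 50, 51, 200],
    "score": [100.0, 80.5, 70.2, 70.1, 40.0],
})

labelled = add_top_column(toy, 50)
print(labelled)
```

Because 'top' is appended last, selecting the final column (as `values[:,-1]` does) picks up the label.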
I then defined the train/test split procedure as follows:
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train_test(df, size, seed):
    ### Split the data into test and train sets and test
    # Get input and output of df
    if df == 'shanghai':
        column1 = shanghai.columns[1:7]
        Y = shanghai.values[:, -1].astype(int)
        y = np.ravel(Y)
        X = shanghai.values[:, 1:7]
    elif df == 'times':
        column1 = times.columns[1:10]
        Y = times.values[:, -1].astype(int)
        y = np.ravel(Y)
        X = times.values[:, 1:10]
    else:
        return print('Available Datasets: "shanghai","times"')
    # Split into train and test
    X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=size, random_state=seed)
    # Get the regression
    model = LogisticRegression(solver='liblinear')
    model.fit(X_Train, X_Test)
    # See how accurate it is with the split
    result = model.score(X_Test, Y_Test)
    print(f'accuracy {result*100:5.3f}')
    return
```
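As a sanity check on what the split returns, the following sketch prints the shapes of the four arrays produced by train_test_split. It uses placeholder arrays rather than my data: the 500 rows and 6 feature columns are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 500 rows and 6 features are assumed values, not my real table
X = np.zeros((500, 6))
y = np.arange(500) % 2  # alternating 0/1 labels

# Return order is: train features, test features, train labels, test labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7
)

for name, arr in [("X_train", X_train), ("X_test", X_test),
                  ("y_train", y_train), ("y_test", y_test)]:
    print(name, arr.shape)
```

Under these assumptions the test feature matrix has 150 rows and 6 columns, while both label arrays are one-dimensional.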
I then run the following code:
```python
top_sh(50)
shanghai.head()
X.shape
Y
Y.shape
train_test('shanghai', 0.3, 7)
```
X.shape = (768,8)
Y.shape = (768,)
I get the following error in the train_test function, specifically on the model.fit line:
> ValueError: bad input shape (150,6)
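
The same kind of error can be reproduced on synthetic data, which may help narrow it down. `LogisticRegression.fit` expects the training features followed by the training labels, so passing a two-dimensional feature matrix as the second argument raises a ValueError. The sketch below is only an illustration: the random data and the variable names are assumptions, and the exact wording of the error message depends on the scikit-learn version.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real table (assumed sizes: 500 rows, 6 features)
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))
y = (X[:, 0] > 0).astype(int)  # binary label, analogous to the 'top' column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7
)

model = LogisticRegression(solver="liblinear")

# Passing the test features where the training labels belong is rejected
try:
    model.fit(X_train, X_test)
    fit_error = None
except ValueError as exc:
    fit_error = exc
print("fit(X_train, X_test) raised:", fit_error)

# Fitting on (features, labels) from the training split works
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"accuracy {accuracy * 100:5.3f}")
```

If this is what happens in train_test, the shape in the error message would be the shape of the array passed as the second argument to fit.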