Fitting training and test sets with the train_test_split method

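I am trying to evaluate my model using train_test_split. I defined the following function, which adds an output column ('top') to the table based on the function's input: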

def top_sh(num):
    ###Get the top(num) in Shanghai data and arrange
    ####input and output variables accordingly
    #Add column to be output value,either zero or one

    #shanghai = shanghai_cp.copy()
    if 'top' in shanghai.columns:
        shanghai.drop(columns = shanghai.columns[-1],inplace = True) 

    shanghai['top'] = shanghai['world_rank'].apply(lambda x: 1 if x<= num else 0)
    out = print('*****************'+ '\n' + 'Output array: Top'+ str(num)+ '\n' + 'Disregarding in Analysis: World rank')
    #call = print(shanghai.head(15))

    return out

Then I defined the train/test split procedure as follows:

def train_test(df,size,seed):
    ###Split the data into test and train sets and test

    #Get input output of df
    if df == 'shanghai':
        column1 = shanghai.columns[1:7]
        Y = shanghai.values[:,-1].astype(int)
        y = np.ravel(Y)
        X = shanghai.values[:,1:7]
    elif df == 'times':
        column1 = times.columns[1:10]
        Y = times.values[:,-1].astype(int)
        y = np.ravel(Y)
        X = times.values[:,1:10]
    else:
        return print('Available Datasets: "shanghai","times"')

    #Split into train and test
    X_Train,X_Test,Y_Train,Y_Test = train_test_split(X,Y,test_size=size,random_state=seed)

    #Get the regression
    model= LogisticRegression(solver='liblinear')
    model.fit(X_Train,X_Test)

    #See how accurately it is with the split
    result=model.score(X_Test,Y_Test)

    print(f'accuracy {result*100:5.3f}')

    return

I run the following code:

top_sh(50)
shanghai.head()
X.shape
Y
Y.shape
train_test('shanghai',0.3,7)

X.shape = (768,8)
Y.shape = (768,)

I get the following error in the train_test function, specifically on the model.fit line:

> ValueError: bad input shape (150,6)

Answer from shuangzai520:

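The problem is most likely caused by what you are passing to fit. It expects the X values as the predictors and the Y values as the target to be predicted, so this line of yours is incorrect: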

model.fit(X_Train,X_Test)
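
To see why that call fails, here is a minimal sketch on synthetic data. The array sizes are an assumption, chosen so that the test split has 150 rows and 6 columns as in the error message; the data is random and is not the poster's. train_test_split returns its pieces in the order X_train, X_test, y_train, y_test, so X_Test is a 2-D feature matrix, not a label vector, and fit rejects it as a target.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): 500 rows, 6 features, binary labels.
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))
Y = (X[:, 0] > 0).astype(int)

# Return order: X_train, X_test, y_train, y_test.
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3, random_state=7)

print(X_Train.shape)  # 2-D training features
print(X_Test.shape)   # 2-D test features, not labels
print(Y_Train.shape)  # 1-D training labels

# Passing X_Test where the labels belong raises a ValueError.
model = LogisticRegression(solver='liblinear')
raised = False
try:
    model.fit(X_Train, X_Test)
except ValueError as err:
    raised = True
    print('ValueError:', err)
```

The exact wording of the message depends on the scikit-learn version, but in each case it complains about the shape of the second argument.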

You should pass Y_Train instead:

model.fit(X_Train,Y_Train)
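
As a self-contained check of the corrected call, the sketch below runs the same split, fit, and score steps on synthetic data. The random features and the rule used to generate the labels are assumptions made only so the example runs on its own; with the real tables, X and Y would come from shanghai or times as in the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption): 6 features, binary labels.
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))
Y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Split into train and test sets.
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3, random_state=7)

# Fit on the training features and the training labels.
model = LogisticRegression(solver='liblinear')
model.fit(X_Train, Y_Train)

# Score on the held-out features and labels.
result = model.score(X_Test, Y_Test)
print(f'accuracy {result*100:5.3f}')
```

The test arrays appear only in the score call, so the reported accuracy is measured on rows the model did not see during fitting.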