Training the model doesn't work when implementing multiple linear regression from scratch in Python

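I successfully implemented multiple linear regression for the Iris data set using only numpy. I wanted to do the same for the Boston houses data set, but my model fails to learn and I don't know why.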

import numpy as np
import pandas as pd

# read data and split into test and training sets
data = pd.read_csv('train.csv')
data = (data - data.mean()) / data.std() # normalize data
split_data = np.random.rand(len(data)) < 0.8
train_data = data[split_data].round(5)
test_data = data[~split_data]

# create matrices 
input_features_train = train_data.drop(['ID','medv'],axis=1).values
output_feature_train = train_data.medv.values.reshape(-1,1)
ones = np.ones([input_features_train.shape[0],1])
input_features_train = np.concatenate((ones,input_features_train),1)

weight = np.zeros([1,14])


def computeCost(X,y,theta):
    summed = np.power(((X @ theta.T) - y),2)
    return np.sum(summed) / (2 * len(X))


def gradientDescent(X,y,theta,iters,alpha):
    costs = np.zeros(iters)
    for i in range(iters):
        # batch gradient step: sum over rows of X * residual gives the gradient per weight
        theta = theta - (alpha / len(X)) * np.sum(X * (X @ theta.T - y),0)
        costs[i] = computeCost(X,y,theta)

    return theta,costs


learning_rate = 0.01
iterations = 100000

weights,cost = gradientDescent(input_features_train,output_feature_train,weight,iterations,learning_rate)
print("Weights: ",weights)
finalCost = computeCost(input_features_train,output_feature_train,weights)

# test 
input_features_test = test_data.drop(['ID','medv'],axis=1).values
output_feature_test = test_data.medv.values.reshape(-1,1)
ones = np.ones([input_features_test.shape[0],1])
input_features_test = np.concatenate((ones,input_features_test),1)


def test_data(input_features,output_feature,weights):
    predictions = np.round(np.dot(input_features,weights.T))
    for i in range(len(output_feature)):
        predicted = predictions[i]
        success = predictions[i] == output_feature[i]
        print('For features: ',input_features[i],' housing price should be ',output_feature[i])
        print("Predicted: %f" % predicted)
        print("Is success? ",success)
        print()


test_data(input_features_test,output_feature_test,weights)
predictions = np.round(np.dot(input_features_test,weights.T))
accuracy = (sum(predictions == output_feature_test) / float(len(output_feature_test)) * 100)[0]
print("accuracy of the model is ",accuracy,"%  after ",iterations,"iterations")

Sample output looks like this:

Weights:  [[ 0.01465871 -0.11583742  0.17729105  0.01249782  0.09822299 -0.31249182
   0.25208063 -0.00937766 -0.48751822  0.46772537 -0.27637035 -0.1590125
   0.12926108 -0.48910136]]

For features:  [ 1.         -0.44852959 -0.47141352  0.09095532 -0.25240023  0.13793157
  0.46506236  0.03105118 -0.62153314 -0.98758424 -0.79769195  1.18594974
  0.37563165 -0.40259248]  housing price should be  [-0.04019949]
Predicted: 0.000000
Is success?  [False]

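I even tried 10,000,000 iterations, but it still fails every test and the accuracy is 0%. On the Iris data set I managed to get 100% accuracy with this model, so I don't understand why it doesn't work here.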
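
A note on the 0% figure: the accuracy check rounds each prediction and tests it for exact equality with the target. That can work when the targets are integer class labels (as they presumably were for Iris), but here the targets are standardized continuous values such as -0.04019949, so a rounded prediction will essentially never match. A regression model is more commonly judged by an error measure. The following is a minimal sketch, assuming synthetic stand-in data rather than the real train.csv; the helper name `regression_metrics` is hypothetical and not part of the code above.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return (mse, r2) for 1-D or column-vector targets and predictions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    residual = y_true - y_pred
    mse = np.mean(residual ** 2)                      # mean squared error
    ss_res = np.sum(residual ** 2)                    # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total variation
    r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination
    return mse, r2

# Synthetic stand-in for standardized targets (assumption: NOT the Boston data).
rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.1, size=200)     # close, but never exact

# Exact-match "accuracy" as computed in the question.
exact_match = np.mean(np.round(y_pred) == y_true) * 100
mse, r2 = regression_metrics(y_true, y_pred)

print("exact-match accuracy:", exact_match, "%")
print("MSE:", mse, " R^2:", r2)
```

In this sketch the predictions are close to the targets, yet the exact-match score stays at zero while MSE and R² reflect the good fit. Applied to the question's code, the same helper could be called on `output_feature_test` and the unrounded `np.dot(input_features_test, weights.T)`.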

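I suspect this might have something to do with data normalization, because without it I get a `RuntimeWarning: overflow encountered in power summed = np.power(((X @ theta.T) - y),2)` error, but I don't know why that happens either. Could you point me in the right direction? Thanks!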
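
On the overflow: one plausible explanation is that, with unscaled features, a learning rate of 0.01 is too large for gradient descent to converge, so the weights grow on every step until the squared error exceeds the floating-point range. The sketch below illustrates this on a single made-up feature with values in the hundreds; the data, the 200 to 700 range, and the helper names `gradient_descent`, `cost`, and `add_bias` are all assumptions for illustration, and theta is stored as a column vector rather than the row vector used above.

```python
import numpy as np

def gradient_descent(X, y, alpha, iters):
    """Batch gradient descent for least squares; theta is a column vector."""
    theta = np.zeros((X.shape[1], 1))
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / len(X)
        theta = theta - alpha * grad
    return theta

def cost(X, y, theta):
    """Half mean squared error, the same formula as computeCost above."""
    return float(np.sum((X @ theta - y) ** 2) / (2 * len(X)))

def add_bias(F):
    """Prepend a column of ones for the intercept term."""
    return np.hstack([np.ones((len(F), 1)), F])

# Hypothetical feature with large raw values (assumption, not the Boston data).
rng = np.random.default_rng(0)
raw = rng.uniform(200, 700, size=(100, 1))
y = 0.05 * raw + rng.normal(scale=1.0, size=(100, 1))

X_raw = add_bias(raw)
X_std = add_bias((raw - raw.mean()) / raw.std())
y_std = (y - y.mean()) / y.std()

# Unscaled: each step overshoots, the weights blow up to inf/nan.
with np.errstate(over="ignore", invalid="ignore"):
    theta_raw = gradient_descent(X_raw, y, alpha=0.01, iters=200)
    cost_raw = cost(X_raw, y, theta_raw)

# Standardized: the same learning rate converges.
theta_std = gradient_descent(X_std, y_std, alpha=0.01, iters=2000)
cost_std = cost(X_std, y_std, theta_std)

print("cost without scaling:", cost_raw)
print("cost with scaling:   ", cost_std)
```

If this is what happens with the real data, normalization is not the cause of the 0% accuracy; it is what keeps the training step stable. A much smaller learning rate on the raw features would be the other way to avoid the overflow, at the price of slow convergence.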
