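I'm using the dataset provided at this link, imports-85.data, for price prediction.
Using `horsepower`, `curb-weight`, `engine-size` and `highway-mpg` as features, I tried normalizing them (because the cost was very high) and ran gradient descent as follows: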
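Initialization

```python
import numpy as np

# df: the imports-85 DataFrame, loaded earlier (loading code not shown)
attrs = ['horsepower', 'curb-weight', 'engine-size', 'highway-mpg']
data = df[attrs].copy()  # explicit copy, so the column edits below don't trigger SettingWithCopyWarning
m = len(data)  # m: number of training examples
f = len(attrs)  # n: number of features
T = np.zeros(f + 1)  # Coefficients of x(0), x(1), ..., x(n)
norm_price = df.price / 1000  # price in thousands
Y = np.array(norm_price)

# Normalization
data['curb-weight'] = (data['curb-weight'] * 0.453592) / 1000  # lb to kg, in thousands (e-1000)
data['highway-mpg'] = data['highway-mpg'] * 0.425144  # mpg to km per litre (kml)
data['engine-size'] = data['engine-size'] / 100  # To e-100
data['horsepower'] = data['horsepower'] / 100  # To e-100
col_rename = {
    'curb-weight': 'curb-weight-kg(e-1000)',
    'highway-mpg': 'highway-kml',
    'engine-size': 'engine-size(e-100)',
    'horsepower': 'horsepower(e-100)',
}
data.rename(columns=col_rename, inplace=True)

# Build the design matrix only after scaling, so that X holds the scaled features
X = np.hstack((np.ones(shape=(m, 1)), np.array(data)))
```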
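For comparison, a common alternative to converting units by hand is z-score standardization, which gives every feature mean 0 and standard deviation 1 regardless of its units. The sketch below uses NumPy only; the `standardize` helper and the toy numbers are hypothetical, not part of the original code and not rows from the dataset.

```python
import numpy as np

def standardize(features):
    """Z-score scaling: subtract each column's mean, divide by its standard deviation.

    Returns the scaled array plus the per-column mean and std; the same mean and
    std must be reused to scale any new rows before predicting on them.
    """
    features = np.asarray(features, dtype=float)
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)  # guard: leave constant columns unscaled
    return (features - mu) / sigma, mu, sigma

# Illustrative rows standing in for [horsepower, curb-weight, engine-size, highway-mpg]
raw = np.array([[100, 2500, 120, 30],
                [150, 2800, 150, 25],
                [90, 2300, 100, 33]], dtype=float)
scaled, mu, sigma = standardize(raw)
m = scaled.shape[0]
X = np.hstack((np.ones((m, 1)), scaled))  # bias column is added after scaling, so it stays 1
```

The bias column is deliberately left out of the scaling; standardizing it would turn it into a column of zeros.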
Cost calculation

```python
def calculateCost():
    global m, T, X
    residual = X.dot(T) - Y
    return residual.dot(residual) / (2 * m)  # J = (1 / 2m) * sum of squared residuals
```
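The cost being computed is the mean squared error $J(T) = \frac{1}{2m}\sum_i (x_i \cdot T - y_i)^2$. A version that takes its arrays as arguments instead of reading globals is easier to check by hand; `compute_cost` and the two-sample example are hypothetical.

```python
import numpy as np

def compute_cost(X, y, theta):
    """Mean-squared-error cost J(theta) = (1 / (2m)) * sum((X @ theta - y) ** 2)."""
    m = len(y)
    residual = X @ theta - y                 # shape (m,)
    return residual @ residual / (2 * m)     # residual . residual = sum of squares

# Two samples, bias column plus one feature
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])
y = np.array([2.0, 3.0])
theta = np.zeros(2)
print(compute_cost(X, y, theta))  # residuals [-2, -3]: (4 + 9) / (2 * 2) = 3.25
```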
Gradient descent

```python
def gradDescent(threshold, iter=10000, alpha=3e-8):
    global T
    i = 0
    cost = calculateCost()
    cost_hist = [cost]
    while i < iter:
        # Simultaneous update of all coefficients
        T = T - (alpha / m) * X.transpose().dot(X.dot(T) - Y)
        cost = calculateCost()
        cost_hist.append(cost)
        i += 1
        if cost <= threshold:
            return cost_hist
    return cost_hist  # also return the history when the threshold is never reached
```
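A variant of the same batch update that takes `X`, `y` and the starting coefficients as arguments can be tested on synthetic data where the exact answer is known. This is a sketch; the name `gradient_descent`, the optional `threshold` and the test data are hypothetical.

```python
import numpy as np

def gradient_descent(X, y, theta, alpha, iters=10000, threshold=None):
    """Batch gradient descent for linear regression.

    Returns the final coefficients and the cost history; the history is
    returned whether or not `threshold` is ever reached.
    """
    m = len(y)
    theta = theta.astype(float).copy()

    def cost(t):
        r = X @ t - y
        return r @ r / (2 * m)

    history = [cost(theta)]
    for _ in range(iters):
        theta -= (alpha / m) * (X.T @ (X @ theta - y))  # update every coefficient at once
        history.append(cost(theta))
        if threshold is not None and history[-1] <= threshold:
            break
    return theta, history

# Synthetic data: y = 4 + 3x exactly, with x in [-1, 1]
x = np.linspace(-1.0, 1.0, 50)
X = np.column_stack((np.ones_like(x), x))
y = 4.0 + 3.0 * x
theta, history = gradient_descent(X, y, np.zeros(2), alpha=0.5, iters=2000)
print(theta)  # close to [4, 3]
```

With features on a scale of about 1, a step size such as `alpha=0.5` is stable; on raw features in the thousands, the step has to be many orders of magnitude smaller, which is one practical reason to scale them.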
I implemented gradient descent following this: Batch Gradient Descent

Without normalization, the cost is 118634960.460199.
With normalization, the cost is 118.634960460199.
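These two costs differ by a factor of $10^6 = 1000^2$, the square of the factor by which price was divided. That ratio is what dividing the target alone produces: starting from $T = 0$, each update $T \leftarrow T - \frac{\alpha}{m} X^\top (XT - Y)$ is linear in $Y$, so replacing $Y$ by $Y/c$ divides every $T$ and every residual $XT - Y$ by $c$, and every cost by $c^2$. A quick numerical check under that assumption, using random stand-in data (not the real dataset) and a hypothetical `run_gd` helper:

```python
import numpy as np

def run_gd(X, y, alpha, iters):
    """Plain batch gradient descent from theta = 0; returns the cost after each step."""
    m = len(y)
    theta = np.zeros(X.shape[1])
    costs = []
    for _ in range(iters):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
        r = X @ theta - y
        costs.append(r @ r / (2 * m))
    return np.array(costs)

rng = np.random.default_rng(0)
# Random features and prices on a large raw scale; the features are NOT rescaled
X = np.hstack((np.ones((40, 1)), rng.uniform(50, 3000, size=(40, 4))))
y = rng.uniform(5000, 45000, size=40)

c = 1000.0
costs_raw = run_gd(X, y, alpha=3e-8, iters=200)
costs_scaled = run_gd(X, y / c, alpha=3e-8, iters=200)
print(costs_raw[-1] / costs_scaled[-1])  # c**2 = 1e6, up to floating-point rounding
```

If only `Y` changes, an absolute threshold chosen for the unscaled run corresponds to that threshold divided by $1000^2$ after dividing price by 1000. Once the features are rescaled as well, the trajectory itself changes, and no such exact conversion applies.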
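This leaves me with a couple of questions:
- Is my normalization technique correct?
- After normalization, the cost is on a different scale. How should I set the cost threshold after normalization?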
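One stopping rule that does not depend on the scale of price is to stop when the relative decrease of the cost falls below a tolerance, instead of comparing the cost itself with a fixed threshold. Multiplying every cost by the same constant leaves that ratio unchanged. The helper name `has_converged` and the default `tol` are hypothetical choices for this sketch.

```python
def has_converged(prev_cost, new_cost, tol=1e-9):
    """Relative-decrease stopping test.

    True once an iteration lowers the cost by less than `tol` times its previous
    value; also True if the cost went up, which signals a step size that is too large.
    """
    if prev_cost == 0:
        return True
    return (prev_cost - new_cost) / prev_cost < tol

# The same pair of costs in raw units and after dividing price by 1000 (illustrative values)
print(has_converged(2.0e8, 1.9e8), has_converged(2.0e8 / 1e6, 1.9e8 / 1e6))  # False False
```

Inside the loop of `gradDescent`, the check `if cost <= threshold:` could then become `if has_converged(cost_hist[-2], cost_hist[-1]):`.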