Linear correlation coefficient gives an inexplicable result

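I have started learning Python, and as an exercise I tried writing a linear regression routine. To test my code I ran it on two datasets. The first comes from the website whose mathematical steps I followed; running the code on that dataset gives the correct solution. The second dataset, which I put together myself, gives an r-squared value greater than 1, although the scatter plot suggests the r-squared value should be close to 1 but not above it. I checked this with Excel. My Python code is below. I believe the error is in the part of my code that calculates the r-squared value. Any help is greatly appreciated. The comments in my code indicate which dataset causes trouble and which does not.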
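
For reference, the r-squared value mentioned above is commonly defined as 1 - SS_res/SS_tot, which by construction cannot exceed 1. The following is a minimal sketch of that definition, not the poster's code: the function names `fit_line` and `r_squared` are hypothetical, and the data are assumed to be the five (x, y) pairs from the linked tutorial.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    diff_x = x - np.mean(x)
    diff_y = y - np.mean(y)
    b1 = np.sum(diff_x * diff_y) / np.sum(diff_x**2)
    b0 = np.mean(y) - b1 * np.mean(x)
    return b0, b1


def r_squared(y, pred_y):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - pred_y)**2)      # residual sum of squares
    ss_tot = np.sum((y - np.mean(y))**2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Assumed tutorial data (hypothetical variable names x and y)
x = np.array([1, 2, 4, 3, 5], dtype=float)
y = np.array([1, 3, 3, 2, 5], dtype=float)

b0, b1 = fit_line(x, y)
r2 = r_squared(y, b0 + b1 * x)
print('Slope:', b1, 'Intercept:', b0, 'R squared:', r2)
```

Because SS_res is a sum of squares and therefore never negative, a value above 1 from this formula would point to a different quantity being computed rather than to the data.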

"""
Created on Fri Nov 01 2019 1:15:01 PM

Copyright (c) 2019 Deep Sen

Linear Regression Math steps:
https://machinelearningmastery.com/simple-linear-regression-tutorial-for-machine-learning/

I wrote the Python code so that I can understand every step. This is version 1.
Version 2 will have functions and the user can enter data.
Version 3 will allow the user to point to a CSV file with x,y data.
Version 4 will have a graphical user interface.
"""

# Simple Linear regression

import matplotlib.pyplot as plt
import numpy as np

# Data

# Trial data in the link given above
#depth_x = np.array([1,2,4,3,5])
#age_y = np.array([1,3,3,2,5])

# When I use the data above, all my intermediate solutions match the intermediate step solutions provided in the link above. An Excel plot of the data gives the same slope, intercept and RMSE value, so the mathematics shown is correct.

# But when I use the data from Waltham (see data block below), I get a really weird RMSE value in the two hundreds! What am I missing? The slope and intercept values match an Excel plot of the data below.

# Data in Mathematics Tool for Geologist by David Waltham,pg 21,Table 2.2

depth_x = np.array([0.5,1.3,2.47,4.9,8.2])
age_y = np.array([1020,2376,5008,10203,15986])


# Simple linear regression model: y = b0 + b1*x
# where x is the independent variable, y is the dependent variable, b0 is the intercept and b1 is the slope.

#######################################

### Estimating b1, which represents the slope

# 1) b1 can be estimated by:
#
# b1 = sum((xi - mean(x)) * (yi - mean(y))) / sum((xi - mean(x))**2)
#
# b1 = sp_xy / sp_xx
#
# where xi and yi are the ith values of x and y in an array or a list.


# Mean of depth_x and age_y

average_x = np.mean(depth_x)
average_y = np.mean(age_y)
print('Average Depth: ',average_x,'m' '\nAverage Age:',average_y,'yr')

# Calculating difference between depth_xi and average_x

diff_x = depth_x - average_x

print('(xi - mean x): ',diff_x)

# Calculating difference between age_yi and average_y
diff_y = age_y - average_y

print('(yi - mean y): ',diff_y)

# Products of the differences

p_xy = diff_x * diff_y

print('Product of difference: ',p_xy)

# Sum of products

sp_xy = np.sum(p_xy)

print('Sum of Products: ',sp_xy)

# Sum of the squared differences between xi and mean x

sp_xx = np.sum(diff_x**2)

print('Sum of squares of (xi - mean x): ',sp_xx)


# Calculating b1 = sp_xy / sp_xx

b1 = sp_xy / sp_xx

print('Slope: ',b1)

#####################################

# Estimating b0 which represents the intercept


# 2) b0 can be estimated by:
#
# b0 = mean(y) - b1 * mean(x)


b0 = average_y - b1 * average_x

print('Intercept: ',b0)

######################################

# Calculating the Root Mean Square Error

# Calculating predicted value of y

pred_y = b0 + (b1 * depth_x)

print('Predicted y value: ',pred_y)

# RMSE = sqrt( sum( (pred_y - yi)**2 ) / n )

# Square of difference between pred_y and yi

sqdiff_y = (pred_y - age_y)**2

print('Square of (pred_y - yi): ',sqdiff_y)

#Sum of the Square of difference between pred_y and yi

s_sqdiff_y = np.sum(sqdiff_y)

print('Sum of square of (pred_y - yi): ',s_sqdiff_y)

# Mean of the squared differences between pred_y and yi (mean squared error)

av_s_sqdiff_y = s_sqdiff_y / np.size(age_y)

print('Mean of the squares of (pred_y - yi): ',av_s_sqdiff_y)

# Square root of the mean of the squared differences between pred_y and yi

rmse = np.sqrt(av_s_sqdiff_y)

print('Root Mean Square Error: ',rmse)

##################################

# Plotting scatter plot of data

plt.scatter(depth_x,age_y,color='m',marker = 'o',s = 30)

# Plotting Linear fit

plt.plot(depth_x,pred_y,color='r')

plt.show()
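
As a cross-check on the hand-computed values, the same fit can be reproduced with NumPy built-ins. This is a sketch rather than a diagnosis: it assumes the Waltham data from the script above, uses `np.polyfit` for the slope and intercept and `np.corrcoef` for the linear correlation coefficient, and introduces the hypothetical name `relative_rmse` to express the RMSE as a fraction of the mean age.

```python
import numpy as np

# Waltham data, copied from the script above
depth_x = np.array([0.5, 1.3, 2.47, 4.9, 8.2])
age_y = np.array([1020, 2376, 5008, 10203, 15986], dtype=float)

# For degree 1, np.polyfit returns the coefficients as [slope, intercept]
slope, intercept = np.polyfit(depth_x, age_y, 1)
pred_y = intercept + slope * depth_x

# RMSE has the same units as age_y (years), so it is not bounded by 1
rmse = np.sqrt(np.mean((pred_y - age_y)**2))

# For a straight-line fit, r squared equals the squared Pearson correlation
r = np.corrcoef(depth_x, age_y)[0, 1]
r2 = r**2

# Hypothetical helper quantity: RMSE relative to the mean of y
relative_rmse = rmse / np.mean(age_y)

print('Slope:', slope, 'Intercept:', intercept)
print('RMSE:', rmse, 'R squared:', r2, 'RMSE / mean(y):', relative_rmse)
```

With these data the RMSE lands in the hundreds of years while r squared stays just below 1. The two numbers measure different things, so an RMSE in the two hundreds is consistent with ages that run from about 1,000 to 16,000 years.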