How to predict future results with the RandomForestRegressor method using scikit-learn and pandas in Python?

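Hello, I came across this tutorial on how to use a few Python libraries, including the sportsreference library, to predict future NCAAB games. I'll post the code along with the article. It seems to work well, but I think it only tests on games that have already been played. How would I use it to predict a specific team's upcoming game? For example, what would the score be between team A and team B on a given date?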

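The problem I see is that some of the data it uses is only known after the game is over, and that post-game data is exactly what the program uses to predict the score.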
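One common way around this is to build each game's features only from information that exists before tip-off, for example a team's average over its previous games. The sketch below assumes a hypothetical per-team game log with `team`, `date`, and `points` columns (not the sportsreference schema); `shift(1)` removes the current game so its own result never leaks into its features.

```python
import pandas as pd

# Hypothetical per-team game log; column names and values are illustrative only.
games = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-05", "2020-01-09",
                            "2020-01-02", "2020-01-06", "2020-01-10"]),
    "points": [70, 80, 90, 60, 64, 68],
})

games = games.sort_values(["team", "date"])

# Average of the team's previous games only: shift(1) drops the current game,
# so this feature is already known before the game is played.
games["avg_points_before"] = (
    games.groupby("team")["points"]
    .transform(lambda s: s.shift(1).expanding().mean())
)
print(games)
```

Each team's first game has no history, so its feature is NaN and would need to be dropped or imputed before training.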

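First experiment: I tried filling in only the data I know before the game takes place and used fillna(0) to fill the rest with zeros. This is what the CSV looks like: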

date_team,away_assist_percentage,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,away_free_throw_attempt_rate,away_free_throw_attempts,away_free_throw_percentage,away_free_throws,away_losses,away_minutes_played,away_offensive_rating,away_offensive_rebound_percentage,away_offensive_rebounds,away_personal_fouls,AWAY_POINTS,away_steal_percentage,away_steals,away_three_point_attempt_rate,away_three_point_field_goal_attempts,away_three_point_field_goal_percentage,away_three_point_field_goals,away_total_rebound_percentage,away_total_rebounds,away_true_shooting_percentage,away_turnover_percentage,away_turnovers,away_two_point_field_goal_attempts,away_two_point_field_goal_percentage,away_two_point_field_goals,away_win_percentage,away_wins,home_assist_percentage,home_assists,home_block_percentage,home_blocks,home_defensive_rating,home_defensive_rebound_percentage,home_defensive_rebounds,home_effective_field_goal_percentage,home_field_goal_attempts,home_field_goal_percentage,home_field_goals,home_free_throw_attempt_rate,home_free_throw_attempts,home_free_throw_percentage,home_free_throws,home_losses,home_minutes_played,home_offensive_rating,home_offensive_rebound_percentage,home_offensive_rebounds,home_personal_fouls,HOME_POINTS,home_steal_percentage,home_steals,home_three_point_attempt_rate,home_three_point_field_goal_attempts,home_three_point_field_goal_percentage,home_three_point_field_goals,home_total_rebound_percentage,home_total_rebounds,home_true_shooting_percentage,home_turnover_percentage,home_turnovers,home_two_point_field_goal_attempts,home_two_point_field_goal_percentage,home_two_point_field_goals,home_win_percentage,home_wins,pace
0,59,8,0, 0,0.7,7,42, 0,0,.1,1,0

I changed the last line of the code to:

print(model.predict(final_trim).astype(int), y_test)

"final_trim" is the new CSV I am trying to predict.

The results are not accurate at all. What am I missing?
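Besides the features themselves, a new row has to match the training data column for column. A minimal sketch, assuming a toy `X_train` with three hypothetical columns standing in for the real one: `reindex` puts the new game's columns in training order, and the unknown ones are filled with the training means instead of 0, since a 0 in a column such as pace lies far outside anything the model saw. This is only a crude imputation; it does not fix the deeper issue that the original features are post-game statistics.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy stand-ins for X_train / y_train (hypothetical columns and values).
X_train = pd.DataFrame({
    "pace": [68.0, 70.0, 72.0, 66.0],
    "home_win_percentage": [0.50, 0.60, 0.70, 0.40],
    "away_win_percentage": [0.40, 0.55, 0.30, 0.65],
})
y_train = [[70, 65], [75, 70], [80, 60], [62, 68]]  # [home_points, away_points]
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# A future game where only some features are known, in a different column order.
new_game = pd.DataFrame({"home_win_percentage": [0.65], "pace": [69.0]})

# Use the training column order; columns missing from new_game become NaN...
aligned = new_game.reindex(columns=X_train.columns)
# ...and are filled with the training mean of each column rather than 0.
aligned = aligned.fillna(X_train.mean())
print(aligned)
print(model.predict(aligned))
```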

Here is the original code:

import pandas as pd
from sportsreference.ncaab.teams import Teams
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

FIELDS_TO_DROP = ['away_points', 'home_points', 'date', 'location', 'losing_abbr', 'losing_name', 'winner', 'winning_abbr', 'winning_name', 'home_ranking', 'away_ranking']

# Collect the extended box-score data of every team's schedule;
# ignore_index=True gives every row a unique index label
dataset = pd.concat([team.schedule.dataframe_extended for team in Teams()], ignore_index=True)

# Drop the target and descriptive columns, then incomplete and duplicate rows
# (the axis cannot be passed positionally in pandas 2.x, so use columns=)
X = dataset.drop(columns=FIELDS_TO_DROP).dropna().drop_duplicates()
# Take the targets from the same rows so that X and y stay aligned
y = dataset.loc[X.index, ['home_points', 'away_points']].values
X_train, X_test, y_train, y_test = train_test_split(X, y)
parameters = {'bootstrap': False, 'min_samples_leaf': 3, 'n_estimators': 50, 'min_samples_split': 10, 'max_features': 'sqrt', 'max_depth': 6}
model = RandomForestRegressor(**parameters)
model.fit(X_train, y_train)
print(model.predict(X_test).astype(int), y_test)

This is the post I got it from: https://towardsdatascience.com/predict-college-basketball-scores-in-30-lines-of-python-148f6bd71894
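Because `train_test_split` shuffles rows, the tutorial's test set contains games played before some of the training games. Splitting by date is closer to real forecasting: train on games before a cutoff, evaluate on games after it. A minimal sketch with a hypothetical table (the column names and numbers are made up for illustration):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical games, one per week, with a single pre-game feature.
games = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=8, freq="7D"),
    "home_avg_points_before": [70, 72, 68, 75, 71, 74, 69, 73],
    "home_points": [71, 74, 66, 77, 70, 76, 68, 75],
})

# Everything before the cutoff is "the past" and everything after is "the future".
cutoff = pd.Timestamp("2020-02-01")
train = games[games["date"] < cutoff]
test = games[games["date"] >= cutoff]

features = ["home_avg_points_before"]
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(train[features], train["home_points"])
predictions = model.predict(test[features])
print(test.assign(predicted=predictions))
```

scikit-learn's `TimeSeriesSplit` generalizes this idea to several successive cutoffs when rows are sorted by date.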

Thanks!

Answer from huang346197:

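Think about it this way: if you want to test your model's goodness of fit, you have to know the outcomes beforehand, so that you can measure the distance between the model's output and the actual results and make the adjustments needed to improve the model's overall performance.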
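Measuring that distance on a held-out test set could look like the sketch below. The data is synthetic and only illustrates the mechanics; `mean_absolute_error` with `multioutput="raw_values"` reports one error per target column (here, one per team), and a model worth using should beat the naive baseline of always predicting the training mean.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic data (illustration only): two features, two targets (team A and team B scores)
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 2))
y = np.column_stack([X[:, 0] / 10, X[:, 1] / 20]) + rng.normal(0, 0.5, size=(200, 2))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = RandomForestRegressor(random_state=42).fit(X_train, y_train)

# The true test scores are known, so the prediction error can be measured
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred, multioutput="raw_values")

# Naive baseline: always predict each team's mean training score
baseline = np.tile(y_train.mean(axis=0), (len(y_test), 1))
baseline_mae = mean_absolute_error(y_test, baseline, multioutput="raw_values")
print("model MAE per team:   ", mae)
print("baseline MAE per team:", baseline_mae)
```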

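Once you have trained the model, if you want to predict future values then (without going too deep into how it works) you feed the model the same features it was trained on, but with the data for the games you want to predict. Here is a very basic example that uses two variables to predict the scores of two teams (A and B):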

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Score_B lists only three values in the original answer; the fourth (3) is a
# placeholder so every column has four rows, so exact predictions may differ
data = {'Temperature': [10, 20, 30, 25], 'Humidity': [40, 50, 80, 65], 'Score_A': [1, 2, 3, 2], 'Score_B': [6, 1, 2, 3]}
df = pd.DataFrame(data)
print(df)
X = df[['Temperature', 'Humidity']]
Y = df[['Score_A', 'Score_B']]
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=42)
model = RandomForestRegressor(random_state=42)
model.fit(X_train, y_train)

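At this point the model is trained, so to make a future prediction I need to pass the same features used in training (temperature and humidity), but with the values I want to predict for. Suppose our meteorologist friend tells us that the temperature and humidity for the next game will be 35 and 70 respectively. Then I call .predict() on those values: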

print(model.predict([[35, 70]]))

which returns:

[[2.6 1.4]]

If you want nicer output:

prediction = model.predict([[35,70]])
print("Team A will score: ",prediction[0][0])
print("Team B will score: ",prediction[0][1])

which returns:

Team A will score:  2.6
Team B will score:  1.4
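The same approach extends to several upcoming games at once. A sketch, assuming the toy columns from the example above (the `Score_B` values here are illustrative); passing the new rows as a DataFrame with the training column names also keeps recent scikit-learn versions from warning that `X does not have valid feature names`.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Illustrative training data with the same columns as the example above.
df = pd.DataFrame({
    "Temperature": [10, 20, 30, 25],
    "Humidity": [40, 50, 80, 65],
    "Score_A": [1, 2, 3, 2],
    "Score_B": [6, 1, 2, 3],
})
features = ["Temperature", "Humidity"]
model = RandomForestRegressor(random_state=42)
model.fit(df[features], df[["Score_A", "Score_B"]])

# Upcoming games described by the same feature columns used in training.
upcoming = pd.DataFrame({"Temperature": [35, 15], "Humidity": [70, 45]},
                        index=["next game", "game after"])
predictions = pd.DataFrame(model.predict(upcoming[features]),
                           index=upcoming.index, columns=["Score_A", "Score_B"])
print(predictions.round(1))
```

A random forest averages training targets, so its predictions never leave the range of scores it was trained on; that is worth keeping in mind when the new inputs (such as 35 degrees here) are outside the training range.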