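Hello, I came across this tutorial on how to use Python with a few libraries, including the sportsreference library, to predict future NCAAB games. I will post the code along with the article. It seems to work well, but I think it only tests against games that have already been played. How would I use it to predict a future game between specific teams? For example, what would the score be between Team A and Team B on a given date?
The problem I see is that some of the data it uses is only known after a game has finished, and that post-game data is what the program uses to predict the score.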
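To make the issue concrete, here is a minimal sketch with made-up numbers. The `team`, `date`, and `assists` columns are hypothetical and are not the sportsreference schema. For a game that has not been played yet, the only version of a stat like assists that exists is what the team did in earlier games, for example its running average before that date:

```python
import pandas as pd

# Made-up per-team game log, one row per game already played (hypothetical data).
games = pd.DataFrame({
    "team": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-05", "2020-01-09",
                            "2020-01-02", "2020-01-06", "2020-01-10"]),
    "assists": [10, 14, 12, 8, 9, 13],
})
games = games.sort_values(["team", "date"]).reset_index(drop=True)

# shift(1) hides the current game's own stat; expanding().mean() then
# averages only the games each team played before it.
games["assists_before_game"] = (
    games.groupby("team")["assists"]
         .transform(lambda s: s.shift(1).expanding().mean())
)

print(games)
```

Each team's first game has no earlier games to average, so its `assists_before_game` is NaN. I am not sure whether a model trained on same-game box scores can be used with inputs like these, which is part of what I am asking.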
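First experiment: I tried filling in only the data I know before the game takes place and setting everything else to zero with fillna(0). This is what the CSV looks like: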
date_team,away_assist_percentage,away_assists,away_block_percentage,away_blocks,away_defensive_rating,away_defensive_rebound_percentage,away_defensive_rebounds,away_effective_field_goal_percentage,away_field_goal_attempts,away_field_goal_percentage,away_field_goals,away_free_throw_attempt_rate,away_free_throw_attempts,away_free_throw_percentage,away_free_throws,away_losses,away_minutes_played,away_offensive_rating,away_offensive_rebound_percentage,away_offensive_rebounds,away_personal_fouls,AWAY_POINTS,away_steal_percentage,away_steals,away_three_point_attempt_rate,away_three_point_field_goal_attempts,away_three_point_field_goal_percentage,away_three_point_field_goals,away_total_rebound_percentage,away_total_rebounds,away_true_shooting_percentage,away_turnover_percentage,away_turnovers,away_two_point_field_goal_attempts,away_two_point_field_goal_percentage,away_two_point_field_goals,away_win_percentage,away_wins,home_assist_percentage,home_assists,home_block_percentage,home_blocks,home_defensive_rating,home_defensive_rebound_percentage,home_defensive_rebounds,home_effective_field_goal_percentage,home_field_goal_attempts,home_field_goal_percentage,home_field_goals,home_free_throw_attempt_rate,home_free_throw_attempts,home_free_throw_percentage,home_free_throws,home_losses,home_minutes_played,home_offensive_rating,home_offensive_rebound_percentage,home_offensive_rebounds,home_personal_fouls,HOME_POINTS,home_steal_percentage,home_steals,home_three_point_attempt_rate,home_three_point_field_goal_attempts,home_three_point_field_goal_percentage,home_three_point_field_goals,home_total_rebound_percentage,home_total_rebounds,home_true_shooting_percentage,home_turnover_percentage,home_turnovers,home_two_point_field_goal_attempts,home_two_point_field_goal_percentage,home_two_point_field_goals,home_win_percentage,home_wins,pace
0,59,8,0, 0,0.7,7,42, 0,0,.1,1,0
The last line of the code was changed to:
print(model.predict(final_trim).astype(int), y_test)
"final_trim" is the new CSV being predicted.
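For reference, the experiment amounts to roughly the following. This is a minimal sketch, not my exact script: `feature_columns` is a hypothetical short subset of the real columns, and the values in `known_before_game` are made up.

```python
import pandas as pd

# Hypothetical subset of the training columns; the real frame has many more.
feature_columns = ["away_wins", "away_losses", "away_assists",
                   "home_wins", "home_losses", "home_assists", "pace"]

# Only the values known before tip-off are filled in (made-up numbers).
known_before_game = {"away_wins": 7, "away_losses": 1,
                     "home_wins": 5, "home_losses": 3}

# reindex adds every training column in training order (missing ones become NaN),
# then fillna(0) replaces the unknown post-game stats with zeros.
final_trim = (pd.DataFrame([known_before_game])
              .reindex(columns=feature_columns)
              .fillna(0))

print(final_trim)
```

Note that stats such as `away_assists` and `pace` end up as 0 in this row, values a real box score would almost never contain.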
The results are not accurate at all. What am I missing?
Here is the original code:
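import pandas as pd
from sportsreference.ncaab.teams import Teams
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Targets, identifiers, and rankings that are excluded from the features.
FIELDS_TO_DROP = ['away_points', 'home_points', 'date', 'location', 'losing_abbr', 'losing_name', 'winner', 'winning_abbr', 'winning_name', 'home_ranking', 'away_ranking']

# Collect the extended box scores from every team's schedule into one frame.
dataset = pd.DataFrame()
teams = Teams()
for team in teams:
    dataset = pd.concat([dataset, team.schedule.dataframe_extended])

# Keep complete, non-repeated feature rows. The same mask is applied to the
# targets so that X and y stay aligned row for row.
features = dataset.drop(FIELDS_TO_DROP, axis=1)
keep = (features.notna().all(axis=1) & ~features.duplicated()).to_numpy()
X = features[keep]
y = dataset.loc[keep, ['home_points', 'away_points']].values
X_train, X_test, y_train, y_test = train_test_split(X, y)

parameters = {'bootstrap': False, 'min_samples_leaf': 3, 'n_estimators': 50, 'min_samples_split': 10, 'max_features': 'sqrt', 'max_depth': 6}
model = RandomForestRegressor(**parameters)
model.fit(X_train, y_train)

# Compare the predicted [home_points, away_points] with the actual scores.
print(model.predict(X_test).astype(int), y_test)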
Here is the post I got it from: https://towardsdatascience.com/predict-college-basketball-scores-in-30-lines-of-python-148f6bd71894
Thanks!