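I want to evaluate estimates for sports matches, in my case football (i.e. soccer) matches. I want to do this in Python.
Basically, there is always a team_home result, a team_away result, an estimate_home and an estimate_away. For example, if a game ends 1:0 and the estimate was 0:0, this should return wrong.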
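There are only four possible cases and outcomes:
- wrong: e.g. the 0:0 estimate for a 1:0 result above
- tendency: the estimate of the winner is correct, but not the goal difference (e.g. 3:0)
- goal difference: the correct goal difference, e.g. 2:1
- right: the exact result
What is the most elegant way to handle the estimates and results in Python?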
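A minimal sketch of the four rules, assuming (as the examples suggest) that the actual result in the question is 1:0. The helper names `sign` and `classify` are hypothetical, chosen for illustration; comparing the signs of the two goal differences is one compact way to express "same winner or both a draw".

```python
def sign(x: int) -> int:
    """Return 1, 0 or -1 depending on the sign of x (hypothetical helper)."""
    return (x > 0) - (x < 0)


def classify(team_home: int, team_away: int,
             estimate_home: int, estimate_away: int) -> str:
    """Classify an estimate against the actual result (hypothetical helper)."""
    if (team_home, team_away) == (estimate_home, estimate_away):
        return "right"  # exact score
    if team_home - team_away == estimate_home - estimate_away:
        return "goal difference"  # same margin, also covers a non-exact draw
    if sign(team_home - team_away) == sign(estimate_home - estimate_away):
        return "tendency"  # correct winner, wrong margin
    return "wrong"


print(classify(1, 0, 0, 0))  # wrong
print(classify(1, 0, 3, 0))  # tendency
print(classify(1, 0, 2, 1))  # goal difference
print(classify(1, 0, 1, 0))  # right
```

The checks run from most to least specific, so each estimate lands in exactly one category.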
Posted on 2013-12-29 22:50:30
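First, I'd ask you to consider what kinds of questions you will want to answer with this data, i.e.:
I assume you want to do at least the first two!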
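I tried to keep the code readable and simple, but in many ways it is considerably more complex than the other answers. In return, it gives you a complete toolbox you can use to process large amounts of data. So see it as an alternative :)
Basically, it also lets you do more statistical work later, whenever you want to. But these kinds of questions really do affect the answer to your question (or rather: which of the answers here fits your question best).
I assume you have a database (relational / MongoDB / whatever); here I use lists to fake it. Even though I use pandas here, most of what is described can be done very simply in a relational database too. But pandas rocks ;) so it works nicely as well. If you are doing this with friends in Excel or CSV files, you can also import those directly with pd.read_csv or pd.read_excel.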
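As a minimal sketch of the CSV route: `pd.read_csv` accepts a path or any file-like object, so an in-memory `io.StringIO` stands in for a hypothetical `bets.csv` export here. The column names are an assumption, chosen to match the dictionaries used in the example.

```python
import io

import pandas as pd

# Hypothetical CSV export of the bets; in practice this would be a file on disk.
csv_text = """playerid,game,home_team,away_team,home_goals,away_goals
1,1,Bayern,VfL,3,5
2,1,Bayern,VfL,2,1
"""

# pd.read_csv infers integer dtypes for the goal columns.
bet_df = pd.read_csv(io.StringIO(csv_text))
print(bet_df)

# For an Excel workbook the analogous call is pd.read_excel("bets.xlsx"),
# which needs an engine such as openpyxl installed.
```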
import pandas as pd
# game is a unique id (like a combination of date, home_team and away_team)
bet_list = [
{'playerid': 1, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
{'playerid': 2, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
{'playerid': 3, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
{'playerid': 4, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
{'playerid': 1, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
{'playerid': 2, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
{'playerid': 3, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
{'playerid': 4, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
{'playerid': 1, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
{'playerid': 2, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
{'playerid': 3, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
{'playerid': 4, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0}
]
result_list = [
{'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 4},
{'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 2},
{'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
]
def calculate_result(input_df):
    input_df['result'] = 0
    # home wins (result 1)
    mask = input_df['home_goals'] > input_df['away_goals']
    input_df.loc[mask, 'result'] = 1
    # away wins (result 2)
    mask = input_df['home_goals'] < input_df['away_goals']
    input_df.loc[mask, 'result'] = 2
    # draws (result 3)
    mask = input_df['home_goals'] == input_df['away_goals']
    input_df.loc[mask, 'result'] = 3
    # goal difference
    input_df['goal_difference'] = input_df['home_goals'] - input_df['away_goals']
    return input_df
# so what were the expectations?
bet_df = pd.DataFrame(bet_list)
bet_df = calculate_result(bet_df)
# if you want to look at the results
print(bet_df)
# what were the actual results?
result_df = pd.DataFrame(result_list)
result_df = calculate_result(result_df)
# if you want to look at the results
print(result_df)
# now let's compare them!
# take a subset of the result df and link the results on the game id
combi_df = pd.merge(left=bet_df,
                    right=result_df[['game', 'home_goals', 'away_goals', 'result', 'goal_difference']],
                    on='game', how='inner', suffixes=('_bet', '_actual'))
# look at the data
print(combi_df)
def calculate_bet_score(input_df):
    '''
    Notice that I'm keeping the extra columns, because they are nice for comparative analytics in the future. Think: "you had this right, just like x% of all the people"
    '''
    input_df['bet_score'] = 0
    # now look at where people have correctly predicted the result
    input_df['result_estimation'] = 0
    mask = input_df['result_bet'] == input_df['result_actual']
    input_df.loc[mask, 'result_estimation'] = 1  # correct result
    input_df.loc[mask, 'bet_score'] = 1  # bet score for a correct result
    # now look at where people have correctly predicted the goal difference, given a correct result
    input_df['goal_difference_estimation'] = 0
    bet_mask = input_df['bet_score'] == 1
    score_mask = input_df['goal_difference_bet'] == input_df['goal_difference_actual']
    input_df.loc[bet_mask & score_mask, 'goal_difference_estimation'] = 1  # correct goal difference
    input_df.loc[bet_mask & score_mask, 'bet_score'] = 2  # bet score for a correct goal difference
    # now look at where people have correctly predicted the exact goals
    input_df['goal_exact_estimation'] = 0
    bet_mask = input_df['bet_score'] == 2
    home_mask = input_df['home_goals_bet'] == input_df['home_goals_actual']
    away_mask = input_df['away_goals_bet'] == input_df['away_goals_actual']
    input_df.loc[bet_mask & home_mask & away_mask, 'goal_exact_estimation'] = 1  # exact score
    input_df.loc[bet_mask & home_mask & away_mask, 'bet_score'] = 3  # bet score for an exact score
    return input_df
combi_df = calculate_bet_score(combi_df)
# now look at the results
combi_df
# and you can do nifty stuff like making a top player list like this:
print(combi_df.groupby('playerid')['bet_score'].sum().sort_values(ascending=False))
# player 4 is way ahead!
# which game was the best estimated game?
print(combi_df.groupby('game')['bet_score'].mean().sort_values(ascending=False))
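# game 3! though abysmal predictions in general ;)
As I said, this is mainly meant to give a different perspective on what data manipulation in Python makes possible. Once you start seriously processing large amounts of data, this (vector/NumPy/pandas-based) approach will be the fastest, but you have to ask yourself which logic you want to keep inside the database and which outside it, and so on.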
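To illustrate the vectorized style, here is a sketch (not part of the original answer) that computes the same 3/2/1/0 score in one pass with `np.sign` and `np.select` instead of successive masks. The small DataFrame `combi` is a hypothetical stand-in for `combi_df` holding only the columns the scoring needs, filled with the example bets and results.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for combi_df: the example bets of players 1-4 for
# games 1-3, next to the actual results 3:4, 2:2 and 0:0.
combi = pd.DataFrame({
    "playerid": [1, 2, 3, 4] * 3,
    "game": [1] * 4 + [2] * 4 + [3] * 4,
    "home_goals_bet": [3, 2, 1, 0] * 3,
    "away_goals_bet": [5, 1, 0, 0] * 3,
    "home_goals_actual": [3] * 4 + [2] * 4 + [0] * 4,
    "away_goals_actual": [4] * 4 + [2] * 4 + [0] * 4,
})

diff_bet = combi["home_goals_bet"] - combi["away_goals_bet"]
diff_actual = combi["home_goals_actual"] - combi["away_goals_actual"]

# Boolean columns, computed for all rows at once.
exact = (combi["home_goals_bet"] == combi["home_goals_actual"]) & (
    combi["away_goals_bet"] == combi["away_goals_actual"]
)
same_difference = diff_bet == diff_actual
# np.sign maps home win / draw / away win to 1 / 0 / -1.
same_tendency = np.sign(diff_bet) == np.sign(diff_actual)

# np.select takes the first condition that holds, so the order encodes
# exact (3) > goal difference (2) > tendency (1) > wrong (0).
combi["bet_score"] = np.select(
    [exact, same_difference, same_tendency], [3, 2, 1], default=0
)

leaderboard = combi.groupby("playerid")["bet_score"].sum().sort_values(ascending=False)
best_games = combi.groupby("game")["bet_score"].mean().sort_values(ascending=False)
print(leaderboard)
print(best_games)
```

On this data the totals match the mask-based version: player 4 leads with 5 points, and game 3 has the highest mean score.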
Hope this helps!
Posted on 2013-12-30 09:37:28
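Another answer, reflecting my idea of elegance (which, I agree, is a rather subjective matter). I prefer to define objects with classes built with OOP in mind, and to use an ORM that manages the relationships between objects. This brings many advantages and cleaner code.
I'm using Pony ORM here, but there are many other good options (some with more permissive licenses), such as SQLAlchemy or Django's ORM.
Here is a complete example. First we define the models: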
from pony.orm import *

# the Database object must exist before the entities that subclass db.Entity
db = Database('sqlite', ':memory:')  # you may store it in a file if you like


class Player(db.Entity):
    """A player is somebody who places bets, identified by their name."""
    name = Required(str)
    score = Required(int, default=0)
    bets = Set('Bet', reverse='player')
    # any other player info can be stored here


class Match(db.Entity):
    """A Match is a game, played or not yet played."""
    ended = Required(bool, default=False)
    home_score = Required(int, default=0)
    visitors_score = Required(int, default=0)
    bets = Set('Bet', reverse='match')


class Bet(db.Entity):
    """Stores a bet for a specific game."""
    match = Required(Match, reverse='bets')
    home_score = Required(int, default=0)
    visitors_score = Required(int, default=0)
    player = Required(Player, reverse='bets')


@db_session
def calculate_wins(match):
    bets = select(b for b in Bet if b.match == match)[:]
    for bet in bets:
        if (match.home_score == bet.home_score) and (match.visitors_score == bet.visitors_score):
            bet.player.score += 3  # exact
        elif (match.home_score - match.visitors_score) == (bet.home_score - bet.visitors_score):
            bet.player.score += 2  # goal difference
        elif ((match.home_score > match.visitors_score) == (bet.home_score > bet.visitors_score)) and \
                (match.home_score != match.visitors_score) and (bet.home_score != bet.visitors_score):
            bet.player.score += 1  # tendency
        # otherwise the bet is wrong and scores nothing
With these classes you can create and update your database of matches, players and bets. If you need statistics or aggregated/sorted data, you can query the database as needed.
db.generate_mapping(create_tables=True)

with db_session:  # Pony requires a db_session for any database access
    player1 = Player(name='furins')
    player2 = Player(name='Martin')
    match1 = Match()
    furins_bet = Bet(match=match1, player=player1, home_score=0, visitors_score=0)
    martin_bet = Bet(match=match1, player=player2, home_score=3, visitors_score=0)
    # the game begins ...
    match1.home_score = 1
    match1.visitors_score = 0
    # the game ended ...
    match1.ended = True
    commit()  # let's update the database
    calculate_wins(match1)  # the nested db_session of calculate_wins is ignored here
    print("furins score: %d" % player1.score)  # prints 0
    print("Martin score: %d" % player2.score)  # prints 1
You could even integrate very complex time-series data analysis with NumPy, as Carst suggested, but I think these additions, albeit very interesting, go somewhat beyond your original question.
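Following up on querying the database for statistics, the sketch below is not Pony code: it uses Python's built-in `sqlite3` module with a hypothetical, denormalized `bet` table (the table and column names are assumptions, not the schema Pony generates). It computes the same 3/2/1/0 score as `calculate_wins` and ranks the players in a single SQL query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per bet, with the actual result alongside it.
conn.executescript("""
    CREATE TABLE bet (
        player TEXT, game INTEGER,
        bet_home INTEGER, bet_away INTEGER,
        actual_home INTEGER, actual_away INTEGER
    );
    INSERT INTO bet VALUES
        ('furins', 1, 0, 0, 1, 0),
        ('Martin', 1, 3, 0, 1, 0);
""")

# CASE evaluates its branches in order, mirroring the if/elif chain above.
# SQLite comparisons yield 0 or 1, so two ">" results can be compared directly.
leaderboard = conn.execute("""
    SELECT player,
           SUM(CASE
                 WHEN bet_home = actual_home AND bet_away = actual_away THEN 3
                 WHEN bet_home - bet_away = actual_home - actual_away THEN 2
                 WHEN bet_home <> bet_away AND actual_home <> actual_away
                      AND (bet_home > bet_away) = (actual_home > actual_away) THEN 1
                 ELSE 0
               END) AS score
    FROM bet
    GROUP BY player
    ORDER BY score DESC
""").fetchall()

print(leaderboard)  # [('Martin', 1), ('furins', 0)]
conn.close()
```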
Posted on 2013-12-29 19:11:46
Here is a complete, but not very elegant, solution:
def evaluation(team_home, team_away, estimate_home, estimate_away):
    delta_result = team_home - team_away
    delta_estimate = estimate_home - estimate_away
    if delta_result == delta_estimate:
        if team_home != estimate_home:
            print("goal difference")
        else:
            print("right")
    elif delta_result > 0 and delta_estimate > 0:
        print("tendency")
    elif delta_result < 0 and delta_estimate < 0:
        print("tendency")
    else:
        print("wrong")


evaluation(2, 1, 2, 1)  # right
evaluation(2, 1, 1, 0)  # goal difference
evaluation(2, 1, 3, 0)  # tendency
evaluation(2, 1, 0, 0)  # wrong
evaluation(2, 2, 2, 2)  # right
evaluation(2, 2, 1, 1)  # goal difference
evaluation(2, 2, 0, 0)  # goal difference
evaluation(2, 2, 1, 0)  # wrong
evaluation(0, 1, 0, 1)  # right
evaluation(0, 1, 1, 2)  # goal difference
evaluation(0, 1, 0, 2)  # tendency
evaluation(0, 1, 0, 0)  # wrong
https://stackoverflow.com/questions/20828856