
What's the most elegant way to evaluate sports game estimations using Python?

Stack Overflow user
Asked 2013-12-29 18:56:53
4 answers · 730 views · 0 followers · 1 vote

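I'd like to evaluate estimations of sports games, in my case football (i.e. soccer) games. I'd like to use Python for this.

Basically, there is always a team_home result, a team_away result, an estimate_home and an estimate_away. For example, if a game ended 1:0 and the estimation was 0:0, this would return wrong.

There are only four possible cases and results:

  1. wrong, as in the example above
  2. tendency: the winner was estimated correctly, but not the goal difference (e.g. an estimate of 3:0)
  3. goal difference: the goal difference was estimated correctly (e.g. an estimate of 2:1)
  4. right: the estimation was exactly right

What's the most elegant way to handle estimations and results in Python?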


4 Answers

Stack Overflow user

Accepted answer

Posted on 2013-12-29 22:50:30

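First of all, I urge you to think about what kinds of questions you will want to answer. For example:

  • Do you want to report to each player a list of their estimates versus the actual results?
  • Do you want to rank the players?
  • Do you want to do more statistical work? (e.g. player x is better at estimating games in which team y participates)

I'll assume you want to do at least the first two!

I tried to keep the code readable and simple. In many ways it is a lot more complex than the other answers, but it also gives you a whole toolbox for handling large amounts of data quickly. So just see it as another option :)

Basically, pandas also lets you do more statistical work in the future, whenever you feel like it. But these kinds of questions really do influence the answer to your question (or rather: which of the answers here fits best).

I assume you have a database (relational / MongoDB / whatever); I'm faking it here with lists. Even though I'm using pandas, most of what is described here can also be done quite simply in a relational database. But pandas rocks ;) so this will work fine as well. If you are doing this with friends using Excel or CSV files, you can also import those directly with read_csv or read_excel.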
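For the CSV route mentioned above, here is a minimal sketch of loading bets with `pd.read_csv`. The column layout and the inline CSV text are assumptions made for illustration; with a real file you would pass its path instead of the `io.StringIO` wrapper.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file on disk.
csv_text = """playerid,game,home_goals,away_goals
1,1,3,5
2,1,2,1
"""

# read_csv accepts a path or any file-like object.
bet_df_from_csv = pd.read_csv(io.StringIO(csv_text))
print(bet_df_from_csv)
```

A frame loaded this way can go through the same steps as the hand-written lists used in the rest of this answer.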

import pandas as pd

# game is a unique id (like a combination of date, home_team and away_team)
bet_list = [
    {'playerid': 1, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
    {'playerid': 2, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
    {'playerid': 3, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
    {'playerid': 4, 'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
    {'playerid': 1, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
    {'playerid': 2, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
    {'playerid': 3, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
    {'playerid': 4, 'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},   
    {'playerid': 1, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 5},
    {'playerid': 2, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 1},
    {'playerid': 3, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 1, 'away_goals': 0},
    {'playerid': 4, 'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0}  
]

result_list = [
    {'game': 1, 'date': 1, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 3, 'away_goals': 4},
    {'game': 2, 'date': 2, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 2, 'away_goals': 2},
    {'game': 3, 'date': 3, 'home_team': 'Bayern', 'away_team': 'VfL', 'home_goals': 0, 'away_goals': 0},
]

def calculate_result(input_df):
    # .loc[mask, column] assigns in place; chained indexing such as
    # df['result'][mask] = 1 may silently write to a temporary copy
    input_df['result'] = 0
    # home wins (result 1)
    mask = input_df['home_goals'] > input_df['away_goals']
    input_df.loc[mask, 'result'] = 1
    # away wins (result 2)
    mask = input_df['home_goals'] < input_df['away_goals']
    input_df.loc[mask, 'result'] = 2
    # draws (result 3)
    mask = input_df['home_goals'] == input_df['away_goals']
    input_df.loc[mask, 'result'] = 3
    # goal difference
    input_df['goal_difference'] = input_df['home_goals'] - input_df['away_goals']
    return input_df

# so what were the expectations?
bet_df = pd.DataFrame(bet_list)
bet_df = calculate_result(bet_df)
# if you want to look at the results
bet_df

# what were the actuals
result_df = pd.DataFrame(result_list)
result_df = calculate_result(result_df)
# if you want to look at the results
result_df

# now let's compare them!
# I take a subset of the result df and link the results on the game id
combi_df = pd.merge(left=bet_df, right=result_df[['game', 'home_goals', 'away_goals', 'result', 'goal_difference']], on='game', how='inner', suffixes=('_bet', '_actual'))
# look at the data
combi_df

def calculate_bet_score(input_df):
    '''
    Notice that I'm keeping the extra columns, because those are nice for
    comparative analytics in the future. Think: "you had this right, just
    like x% of all the people"
    '''
    input_df['bet_score'] = 0
    # now look at where people have correctly predicted the result
    input_df['result_estimation'] = 0
    mask = input_df['result_bet'] == input_df['result_actual']
    input_df.loc[mask, 'result_estimation'] = 1  # correct result
    input_df.loc[mask, 'bet_score'] = 1  # bet score for a correct result
    # now look at where people have correctly predicted the difference in goals when they already predicted the result correctly
    input_df['goal_difference_estimation'] = 0
    bet_mask = input_df['bet_score'] == 1
    score_mask = input_df['goal_difference_bet'] == input_df['goal_difference_actual']
    input_df.loc[bet_mask & score_mask, 'goal_difference_estimation'] = 1  # correct goal difference
    input_df.loc[bet_mask & score_mask, 'bet_score'] = 2  # bet score for a correct goal difference
    # now look at where people have correctly predicted the exact goals
    input_df['goal_exact_estimation'] = 0
    bet_mask = input_df['bet_score'] == 2
    home_mask = input_df['home_goals_bet'] == input_df['home_goals_actual']
    away_mask = input_df['away_goals_bet'] == input_df['away_goals_actual']
    input_df.loc[bet_mask & home_mask & away_mask, 'goal_exact_estimation'] = 1  # exact score
    input_df.loc[bet_mask & home_mask & away_mask, 'bet_score'] = 3  # bet score for an exact score
    return input_df

combi_df = calculate_bet_score(combi_df)

# now look at the results
combi_df

# and you can do nifty stuff like making a top player list like this:
# (Series.order() was removed from pandas; sort_values() is its replacement)
combi_df.groupby('playerid')['bet_score'].sum().sort_values(ascending=False)
# player 4 is way ahead!
# which game was the best estimated game?
combi_df.groupby('game')['bet_score'].mean().sort_values(ascending=False)
# game 3! though abysmal predictions in general ;)

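As I said, this is mostly meant to give a different view of what is possible with data manipulation in Python. Once you get serious about handling large amounts of data, this (vector / NumPy / pandas based) approach will be the fastest, but you have to ask yourself which logic you want to run inside the database and which outside of it, and so on.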
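To illustrate the vector-based style a bit further, here is a compact sketch of the same 0 to 3 scoring done in a single pass with `np.select`. It is not part of the code above; the four-row frame is made-up data, and the column names simply mirror the `_bet` / `_actual` suffixes used in the merge.

```python
import numpy as np
import pandas as pd

# Hypothetical mini data set: one row per bet, already joined with the actual result.
df = pd.DataFrame({
    'home_goals_bet':    [3, 0, 0, 2],
    'away_goals_bet':    [5, 0, 0, 1],
    'home_goals_actual': [3, 2, 0, 0],
    'away_goals_actual': [4, 2, 0, 0],
})

diff_bet = df['home_goals_bet'] - df['away_goals_bet']
diff_actual = df['home_goals_actual'] - df['away_goals_actual']

exact = ((df['home_goals_bet'] == df['home_goals_actual'])
         & (df['away_goals_bet'] == df['away_goals_actual']))
same_difference = diff_bet == diff_actual
same_tendency = np.sign(diff_bet) == np.sign(diff_actual)

# np.select takes the first condition that matches in each row;
# rows matching none of them get the default (0 = wrong).
df['bet_score'] = np.select([exact, same_difference, same_tendency], [3, 2, 1], default=0)
print(df['bet_score'].tolist())  # [1, 2, 3, 0]
```

Because the conditions are checked in order, an exact hit never falls through to the weaker categories, which gives the same ranking as the step-by-step masks.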

Hope this helps!

Votes: 1

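Stack Overflow user

Posted on 2013-12-30 09:37:28

Yet another answer, one that reflects my view of elegance (a pretty subjective criterion, I agree). I'd like my objects to be defined by classes, built with OOP in mind, with an ORM managing the relationships between the objects. This brings many advantages and also clearer code.

I'm using Pony ORM here, but there are many other excellent options (possibly with a more permissive license), such as SQLAlchemy or Django's ORM.

Here is a complete example. First we define the models: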

from pony.orm import Database, Required, Set, db_session, select, commit

# The Database object has to exist before the entities are declared;
# it is bound to a concrete database further down.
db = Database()


class Player(db.Entity):
    """A player is somebody who places a bet, identified by name."""
    name = Required(str)
    score = Required(int, default=0)
    bets = Set('Bet', reverse='player')
    # any other player info can be stored here


class Match(db.Entity):
    """A Match is a game, played or not yet played."""

    ended = Required(bool, default=False)
    home_score = Required(int, default=0)
    visitors_score = Required(int, default=0)

    bets = Set('Bet', reverse='match')


class Bet(db.Entity):
    """A class that stores a bet for a specific game"""

    match = Required(Match, reverse="bets")
    home_score = Required(int, default=0)
    visitors_score = Required(int, default=0)
    player = Required(Player, reverse="bets")

@db_session
def calculate_wins(match):
    bets = select(b for b in Bet if b.match == match)[:]
    for bet in bets:
        if (match.home_score == bet.home_score) and (match.visitors_score == bet.visitors_score):
            bet.player.score += 3  # exact
        elif (match.home_score - match.visitors_score) == (bet.home_score - bet.visitors_score):
            bet.player.score += 2  # goal difference
        elif ((match.home_score > match.visitors_score) == (bet.home_score > bet.visitors_score)) and \
           (match.home_score != match.visitors_score) and (bet.home_score != bet.visitors_score):
            bet.player.score += 1  # tendency
        else:
            bet.player.score += 0  # wrong

With these classes you can create and update your database of matches, players and bets. If you want statistics and data aggregation or sorting, you can query the database according to your needs.

db.bind(provider='sqlite', filename=':memory:')  # you may store it in a file if you like
db.generate_mapping(create_tables=True)

# outside of an interactive shell, entities must be created and read inside a db_session
with db_session:
    player1 = Player(name='furins')
    player2 = Player(name='Martin')

    match1 = Match()

    furins_bet = Bet(match=match1, player=player1, home_score=0, visitors_score=0)
    martin_bet = Bet(match=match1, player=player2, home_score=3, visitors_score=0)

    # the game begins ...
    match1.home_score = 1
    match1.visitors_score = 0
    # the game ended ...
    match1.ended = True

    commit()  # let's update the database

    calculate_wins(match1)

    print("furins score: %d" % player1.score)  # prints 0
    print("Martin score: %d" % player2.score)  # prints 1

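You may even integrate very complex time-series data analysis using numpy, as Carst suggested, but I believe these additions, albeit very interesting, are somewhat off-topic with respect to your original question.

Votes: 2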

Stack Overflow user

Posted on 2013-12-29 19:11:46

Here's a complete, but not very elegant, solution:

def evaluation(team_home, team_away, estimate_home, estimate_away):
    delta_result = team_home - team_away
    delta_estimate = estimate_home - estimate_away

    if delta_result == delta_estimate:
        # same goal difference: either the exact score or just the difference
        if team_home != estimate_home:
            print("goal difference")
        else:
            print("right")
    elif delta_result > 0 and delta_estimate > 0:
        print("tendency")  # both are home wins
    elif delta_result < 0 and delta_estimate < 0:
        print("tendency")  # both are away wins
    else:
        print("wrong")

evaluation(2, 1, 2, 1)  # right
evaluation(2, 1, 1, 0)  # goal difference
evaluation(2, 1, 3, 0)  # tendency
evaluation(2, 1, 0, 0)  # wrong

evaluation(2, 2, 2, 2)  # right
evaluation(2, 2, 1, 1)  # goal difference
evaluation(2, 2, 0, 0)  # goal difference
evaluation(2, 2, 1, 0)  # wrong

evaluation(0, 1, 0, 1)  # right
evaluation(0, 1, 1, 2)  # goal difference
evaluation(0, 1, 0, 2)  # tendency
evaluation(0, 1, 0, 0)  # wrong
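
If the category is needed for further processing rather than just printed, a variant that returns a value may be handy. The following is a sketch along the same lines as the function above; the names `Verdict`, `sign` and `evaluate` are made up for this example.

```python
from enum import Enum


class Verdict(Enum):
    WRONG = "wrong"
    TENDENCY = "tendency"
    GOAL_DIFFERENCE = "goal difference"
    RIGHT = "right"


def sign(n):
    """Return -1, 0 or 1 depending on the sign of n."""
    return (n > 0) - (n < 0)


def evaluate(team_home, team_away, estimate_home, estimate_away):
    """Classify an estimate against the actual result."""
    if (team_home, team_away) == (estimate_home, estimate_away):
        return Verdict.RIGHT
    delta_result = team_home - team_away
    delta_estimate = estimate_home - estimate_away
    if delta_result == delta_estimate:
        return Verdict.GOAL_DIFFERENCE
    if sign(delta_result) == sign(delta_estimate):
        return Verdict.TENDENCY
    return Verdict.WRONG


print(evaluate(2, 1, 3, 0).value)  # tendency
```

Two draws with different scores always share a goal difference of 0, so they are caught by the goal-difference branch before the sign comparison is reached, matching the behaviour of the printing version.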

Votes: 0
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/20828856
