
Can't fetch tabular content attached to different participants

Stack Overflow user
Asked 2019-08-31 12:06:58
Answers: 3, Views: 250, Followers: 0, Votes: 3

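I'm trying to fetch the tabular content attached to different participants from a webpage. For clarity, the information I want is marked in the image linked below. At the moment my script only returns the names of the participants. I'd also like to parse the information attached to those participants.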

https://www.bet365.com.au/#/AC/B151/C1/D50/E2/F163/

Because the content is dynamic, I had to use a public API endpoint that can be found with the browser's dev tools.

https://filebin.net/ybwver7vt5mp1dju shows how the information is displayed on that page. The struck-through items are the ones I want to grab.

https://pastebin.com/QsA3Pprr shows what the API response looks like.

This is what I've tried:

import re
import requests

url = 'https://www.bet365.com.au/SportsBook.API/web?'

params = {
    'lid': '30',
    'zid': '0',
    'pd': '#AC#B151#C1#D50#E2#F163#',
    'cid': '13',
    'ctid': '13'
}

r = requests.get(url, params=params, headers={'User-Agent': 'Mozilla/5.0'})
games = re.finditer(r'NA=(.*?);', r.text)
for game in games:
    # Skip entries whose text contains no "v"
    if 'v' not in game.group():
        continue
    print(game.group(1))

The output I get looks like this (partial):

FunPlus Phoenix v Bilibili Gaming
Top Esports v Royal Never Give Up
Moops v Brute
eSuba v eXtatus
CS:GO - V4 Future Sports Festival
PACT v Capri Sun
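
The "CS:GO - V4 Future Sports Festival" line gets through because the test only looks for the letter "v" anywhere in the match, and "Festival" contains one. A minimal sketch of a stricter filter, assuming fixtures are always written as "A v B" with spaces around the separator; the `text` string here is a made-up snippet, not a real response:

```python
import re

# Hypothetical snippet mimicking a few NA= fields; not a real response
text = 'NA=Moops v Brute;NA=CS:GO - V4 Future Sports Festival;NA=PACT v Capri Sun;'

names = [m.group(1) for m in re.finditer(r'NA=(.*?);', text)]
# Require the separator " v " with a space on both sides
fixtures = [name for name in names if ' v ' in name]
print(fixtures)
```

This still keeps any label that happens to contain " v ", so it is a heuristic rather than a guarantee.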

The output I'd like to get looks like this (partial):

26:42    FunPlus Phoenix v Bilibili Gaming    1-1   -      -      21
09:00    Top Esports v Royal Never Give Up     -    2.00   1.72   49
12:00    Moops v Brute                         -    2.10   1.66   17

How can I scrape the tabular content attached to the different participants?

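PS: The information visible here may differ from what you see, since it updates every few minutes. I'd like to do this with requests, as in my attempt above.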

3 Answers

Stack Overflow user

Answered 2019-09-03 16:12:41

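I helped you with the code for your first question, which was about this same website. Although the other two answers use Selenium, that isn't necessary, because the site's API endpoint can be used to find the games. This approach should also be faster than Selenium. I was again able to parse the other information with regular expressions. However, I couldn't find anything on the actual website resembling the '1-1' in your expected output. The times may be off; I'm not too sure about them. Hope this helps.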

Code

import re
from datetime import datetime, timedelta
from fractions import Fraction

import pandas as pd
import requests

url = 'https://www.bet365.com.au/SportsBook.API/web?'

params = {
    'lid': '30',
    'zid': '0',
    'pd': '#AC#B151#C1#D50#E2#F163#',
    'cid': '13',
    'ctid': '13'
}

r = requests.get(url, params=params, headers={'User-Agent': 'Mozilla/5.0'})

# Fixture names look like "Team A v Team B"
games = re.finditer(r'NA=([\w\s\-._]+? v [\w\s\-._]+?);', r.text)
col_games = [game.group(1) for game in games]


def parse_prices(column, text):
    """Collect the decimal odds listed under market column '1' or '2'."""
    prices = []
    pattern = r'NA=' + column + r';.*?((?:OD=\d+/\d+;(?:.*?))+?)NA='
    for block in re.finditer(pattern, text):
        for segment in block.group(1).split('|'):
            price = re.search(r'OD=(\d+/\d+);', segment)
            if price:
                # Fractional odds a/b -> decimal odds a/b + 1, truncated to
                # two places (Fraction replaces the original eval call)
                prices.append(int((Fraction(price.group(1)) + 1) * 100) / 100)
    return prices


col_1 = parse_prices('1', r.text)
col_2 = parse_prices('2', r.text)

col_times = []
for match in re.finditer(r'BC=(\d+);', r.text):
    # Drop the last two digits (seconds) before parsing the timestamp
    start = datetime.strptime(match.group(1)[:-2], '%Y%m%d%H%M')
    # One-hour shift; this offset is the part I'm unsure about
    col_times.append((start + timedelta(hours=-1)).time())

df = pd.DataFrame({'Time': col_times, "Games": col_games, "1": col_1, "2": col_2})
print(df)

Output

        Time                                           Games     1     2
0   19:00:00                                 DETONA v Falkol  1.25  3.75
1   19:00:00                              paiN Gaming v Keyd  1.53  2.37
2   19:00:00                                 W7M v Bulldozer  1.22  4.00
3   03:00:00                       VP Game v Team WE Academy  2.62  1.44
4   05:00:00  Invictus Gaming Young v Top Esports Challenger  1.22  4.00
5   07:00:00   Vici Gaming Potential v FunPlus Phoenix Blaze  1.36  3.00
6   09:00:00    Edward Gaming Youth v Bilibili Gaming Junior  2.00  1.72
7   09:00:00                    Gama Dream v LinGan e-Sports  1.80  1.90
8   03:00:00                    Royal Club v Suning Gaming-S  1.66  2.10
9   05:00:00                         Joy Dream v Oh My Dream  2.37  1.53
10  07:00:00            LNG Academy v Bilibili Gaming Junior  3.25  1.33
11  07:00:00                   TS Gaming v Victorious Gaming  1.72  2.00
12  09:00:00         D7G Esports Club v Legend Esport Gaming  3.75  1.25
13  09:00:00        Dominus Esports.Y v Rogue Warriors Shark  2.50  1.50
14  05:00:00         Team WE Academy v Vici Gaming Potential  3.25  1.33
15  07:00:00                                 87 v Gama Dream  2.00  1.72
16  07:00:00             Invictus Gaming Young v LNG Academy  1.16  4.50
17  09:00:00                 FunPlus Phoenix Blaze v VP Game  1.50  2.50
18  09:00:00                   Scorpio Game v Young Miracles  3.40  1.30
19  09:00:00                   Top Esports v Bilibili Gaming  1.53  2.37
20  08:00:00           FunPlus Phoenix v Royal Never Give Up  1.57  2.25
21  09:30:00                                    Maru v Solar  1.40  2.75
22  10:15:00                                   Stats v Rogue  1.57  2.25
23  04:00:00                              Classic v RagnaroK  1.22  4.00
24  04:45:00                                     Dear v Zest  2.62  1.44
25  08:00:00               SANDBOX Gaming v KINGZONE DragonX  1.66  2.10
26  13:00:00                                ENCE v Renegades  1.25  3.75
27  16:30:00                         Team Vitality v AVANGAR  1.22  4.00
28  13:00:00                             NRG v Natus Vincere  1.66  2.10
29  16:30:00                          Astralis v Team Liquid  2.00  1.72
30  23:00:00                Vancouver Titans v Seoul Dynasty  1.33  3.25
31  02:00:00         Hangzhou Spark v Los Angeles Gladiators  1.72  2.00
32  08:00:00                                MAD Team v G-Rex  1.53  2.37
33  08:00:00               Flash Wolves v Hong Kong Attitude  3.25  1.33
34  19:00:00                        Clutch Gaming v FlyQuest  1.25  3.75
35  16:00:00                                 Flamengo v INTZ  1.16  4.50
36  16:00:00                             Fnatic v Schalke 04  1.20  4.33
37  16:00:00                                 Origen v Splyce  3.50  1.28
38  09:00:00                        GAM Esports v Team Flash  1.25  3.75
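
The four columns above are collected by independent regex passes, so they only line up when every pass finds the same number of matches. A minimal sketch of parsing record by record instead, assuming the payload is a `|`-separated list of records made of `KEY=value;` fields, as the `split('|')` call and the regexes in the code suggest. The `sample` string and its layout are invented for illustration; only the field names NA, OD and BC come from this answer, and `parse_records` and `to_decimal` are hypothetical helper names:

```python
from fractions import Fraction


def parse_records(text):
    """Split a '|'-separated payload into dicts of KEY -> value."""
    records = []
    for chunk in text.split('|'):
        fields = {}
        for field in chunk.split(';'):
            key, sep, value = field.partition('=')
            if sep:  # skip tokens that are not KEY=value
                fields[key] = value
        if fields:
            records.append(fields)
    return records


def to_decimal(fractional):
    """Convert fractional odds such as '1/4' to decimal odds, truncated to two places."""
    return int((Fraction(fractional) + 1) * 100) / 100


# Made-up sample, not a real response
sample = 'NA=DETONA v Falkol;BC=20190903200000;|NA=1;OD=1/4;|NA=2;OD=11/4;|'

records = parse_records(sample)
for record in records:
    if 'OD' in record:
        record['decimal'] = to_decimal(record['OD'])
print(records)
```

Keeping each record's fields together makes it easier to notice a fixture that has no odds, such as an in-play game, instead of silently shifting a column.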
Votes 3

Stack Overflow user

Answered 2019-08-31 15:39:14

You can use Selenium:

from selenium import webdriver
from bs4 import BeautifulSoup as soup

d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://www.bet365.com.au/#/AC/B151/C1/D50/E2/F163/')


def scrape_block(b):
    # Date header of this market group
    p = {'date': b.find('div', {'class': 'gll-MarketColumnHeader sl-MarketHeaderLabel sl-MarketHeaderLabel_Date '}).text}
    # c1: fixtures showing a start time; c2: fixtures showing a running clock and a score
    c1 = b.find_all('div', {'class': 'sl-CouponParticipantWithBookCloses sl-CouponParticipantWithBookCloses_NoAdditionalMarkets sl-CouponParticipantIPPGBase '})
    c2 = b.find_all('div', {'class': 'sl-CouponParticipantWithBookCloses sl-CouponParticipantWithBookCloses_NoAdditionalMarkets sl-CouponParticipantIPPGBase sl-CouponParticipantWithBookCloses_ClockPaddingLeft '})
    if c1:
        pl = [[i.find('div', {'class': 'sl-CouponParticipantWithBookCloses_BookCloses '}).text, i.find('div', {'class': 'sl-CouponParticipantWithBookCloses_Name '}).text] for i in c1]
    else:
        pl = [[i.find('div', {'class': 'pi-CouponParticipantClockInPlay '}).text, i.find('div', {'class': 'sl-CouponParticipantWithBookCloses_Name '}).text, i.find('div', {'class': 'pi-ScoreVariantDefault '}).text] for i in c2]
    # One list of odds per market column ("1" and "2"); expects exactly two columns
    odds1, odds2 = [[i.text for i in c.find_all('div', {'class': 'gll-ParticipantOddsOnlyDarker gll-Participant_General gll-ParticipantOddsOnly '})] for c in b.find_all('div', {'class': 'sl-MarketCouponValuesExplicit2 gll-Market_General gll-Market_PWidth-15-4 '})]
    return {**p, 'data': [{'player': a, 1: o1, 2: o2} for a, o1, o2 in zip(pl, odds1 or [None], odds2 or [None])]}


new_d = list(map(scrape_block, soup(d.page_source, 'html.parser').find_all('div', {'class': 'gll-MarketGroupContainer gll-MarketGroupContainer_HasLabels '})))
final_result = list(filter(lambda x: bool(x['data']), new_d))

Output:

[{'date': 'Sat 31 Aug', 'data': [{'player': ['22:42', 'Royal Youth v SuperMassive', '1-2'], 1: None, 2: None}]}, {'date': 'Sun 01 Sep', 'data': [{'player': ['11:56', 'G2 Esports v Fnatic', '0-0'], 1: None, 2: None}]}, {'date': 'Sun 01 Sep', 'data': [{'player': ['01:20', 'Hjarnan (G2) v h$hjukken'], 1: '1.10', 2: '1.10'}]}, {'date': 'Sun 01 Sep', 'data': [{'player': ['02:00', 'Thijs v Kolento'], 1: '1.83', 2: '1.83'}, {'player': ['03:00', 'Orange v Hunterace'], 1: '2.25', 2: '1.57'}, {'player': ['04:00', 'Gallon v StrifeCro'], 1: '2.00', 2: '1.72'}, {'player': ['04:00', 'Rdu v SilverName'], 1: '2.00', 2: '1.72'}, {'player': ['05:00', 'Monsanto v PNC'], 1: '1.61', 2: '2.20'}, {'player': ['06:00', 'bloodyface v Amnesiac'], 1: '1.80', 2: '1.90'}, {'player': ['07:00', 'Eddie v Purple'], 1: '1.80', 2: '1.90'}, {'player': ['08:00', 'muzzy v Firebat'], 1: '1.72', 2: '2.00'}, {'player': ['09:00', 'ETC v Nalguidan'], 1: '2.10', 2: '1.66'}]}, {'date': 'Sun 01 Sep', 'data': [{'player': ['12:00', 'Mindfreak v ORDER'], 1: '1.53', 2: '2.37'}]}, {'date': 'Sun 01 Sep', 'data': [{'player': ['15:00', 'LinGan e-Sports v Bilibili Gaming Junior'], 1: '1.66', 2: '2.10'}, {'player': ['17:00', 'Scorpio Game v Suning Gaming-S'], 1: '3.00', 2: '1.36'}, {'player': ['17:00', 'Victorious Gaming v FunPlus Phoenix Blaze'], 1: '3.00', 2: '1.36'}, {'player': ['19:00', '87 v Top Esports Challenger'], 1: '1.66', 2: '2.10'}, {'player': ['19:00', 'Rogue Warriors Shark v Legend Esport Gaming'], 1: '2.62', 2: '1.44'}]}]
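
The nested result above can be flattened into one row per fixture, which is closer to the table the question asks for. A minimal sketch, assuming the structure shown in the output (a `player` list of two or three items and odds under the keys `1` and `2`); `flatten` is a hypothetical helper name, and the two sample blocks are copied from the output above:

```python
def flatten(result):
    """Yield one (date, time, fixture, score, odds_1, odds_2) tuple per fixture."""
    for block in result:
        for entry in block['data']:
            player = entry['player']
            time_, name = player[0], player[1]
            # In-play entries carry a third item, the score
            score = player[2] if len(player) > 2 else '-'
            yield (block['date'], time_, name, score,
                   entry[1] or '-', entry[2] or '-')


# Two blocks copied from the output above
sample = [
    {'date': 'Sat 31 Aug', 'data': [{'player': ['22:42', 'Royal Youth v SuperMassive', '1-2'], 1: None, 2: None}]},
    {'date': 'Sun 01 Sep', 'data': [{'player': ['12:00', 'Mindfreak v ORDER'], 1: '1.53', 2: '2.37'}]},
]

rows = list(flatten(sample))
for row in rows:
    print('{:<12}{:<8}{:<30}{:<6}{:<6}{:<6}'.format(*row))
```

Missing scores and odds are shown as '-', matching the placeholder used in the expected output of the question.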
Votes 1

Stack Overflow user

Answered 2019-09-01 17:05:10

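If you want to use the JS API, you need to work out how to decode the site's output and how the JS side renders it into what we see on the actual website. I don't think that's an easy task. That's why I suggest using Selenium to load the site in a browser tab and then BeautifulSoup to work with the final HTML, which reduces the complexity of deciding what to extract from the site.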

Here is an example of how to scrape tournaments, dates and matches using Chrome in headless mode.

PS: The cookies part isn't required, but it helps to load the page we're trying to scrape automatically.

First you need to install it with pip install webdriver-manager, then:

import pickle
import time
from collections import defaultdict
from pprint import pprint
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup as bs

CHROME_OPTIONS = Options()
CHROME_OPTIONS.add_argument("--headless")

class Bet365:
    DRIVER = webdriver.Chrome(ChromeDriverManager().install(), options=CHROME_OPTIONS)
    DUMMY_URL = 'https://www.bet365.com'
    URL = 'https://www.bet365.com/#/AC/B1/C1/D13/E37628398/F2/:/AC/B1/C1/D13/E42294995/F2/:/AC/B1/C1/D13/E42535433/F2/'
    COOKIES_FILE = 'cookies.pkl'

    def __init__(self):
        self.DRIVER.get(self.DUMMY_URL)
        # Load cookies saved by a previous run, if a cookies file exists
        self.setup_cookies()
        self.DRIVER.get(self.URL)
        # self.DRIVER.maximize_window()
        # Wait for JS to populate the page
        time.sleep(15)
        self.source = self.DRIVER.page_source
        # Store new cookies for next run
        self.dump_cookies()

    def dump_cookies(self):
        """Store cookies"""
        with open(self.COOKIES_FILE, "wb") as fh:
            pickle.dump(self.DRIVER.get_cookies(), fh)

    def setup_cookies(self):
        """Add cookies"""
        try:
            with open(self.COOKIES_FILE, "rb") as fh:
                cookies = pickle.load(fh)
        except FileNotFoundError:
            # First run: there is no cookies file yet
            return
        for cookie in cookies:
            if 'expiry' in cookie:
                del cookie['expiry']
            self.DRIVER.add_cookie(cookie)

    def get_source(self):
        """Get page HTML source"""
        return bs(self.source, "html.parser")

    def is_last_child(self, event):
        """Is last child"""
        out = {}
        out['last_child'] = 'sl-MarketCouponAdvancedBase_LastChild' in event['class']
        event_date = event.find('div', {'class': 'sl-CouponParticipantWithBookCloses_BookCloses'})
        out['date'] = event_date.get_text() if event_date else 'None'
        teams = event.findAll('div', {'class': 'sl-CouponParticipantWithBookCloses_Name'})
        if len(teams) > 1:
            out['teams'] = ' v '.join(k.text for k in teams)
        elif len(teams) == 1:
            out['teams'] = teams[0].text
        else:
            out['teams'] = 'None'
        return out

    def get_events(self, data):
        """Return all events"""
        dates, teams = [], []
        for event in data.findAll('div', {'class': 'sl-MarketCouponFixtureLabelBase gll-Market_General gll-Market_HasLabels'}):
            dates = [elm.text for elm in event.find_all('div', {'class': lambda x: x and all(k in x for k in 'gll-MarketColumnHeader sl-MarketHeaderLabel sl-MarketHeaderLabel_Date'.split())})]
            teams_events = event.findAll("div", {'class': lambda x: x and x.startswith("sl-CouponParticipantWithBookCloses sl-CouponParticipantIPPGBase")})
            teams = [self.is_last_child(elm) for elm in teams_events]
            if len(dates) == 1:
                if teams:
                    teams[-1]['last_child'] = True
        return dates, teams

    def pretty_print_events(self, dates, teams):
        """Pretty print events"""
        def groupby_last_child(data):
            out, tmp = [], []
            for elm in data:
                tmp.append(elm)
                if elm['last_child']:
                    out.append(tmp)
                    tmp = []
            return out

        out = defaultdict(list)
        for date, groupped in zip(dates, groupby_last_child(teams)):
            # use += instead of append in order to have flatten list
            # instead of list of lists
            out[date] += groupped
        return dict(out)

    def scrape_events(self):
        """Yield (league name, events) pairs"""
        for block in self.get_source().findAll('div', {'class': 'gll-MarketGroup cm-CouponMarketGroup cm-CouponMarketGroup_Open'}):
            ligue_name = block.find('span', {'class': 'cm-CouponMarketGroupButton_Text'}).get_text()
            dates, teams = self.get_events(block)
            out = self.pretty_print_events(dates, teams)
            yield ligue_name, out

    def to_dict(self):
        """Scrape events and return a dict"""
        return dict((ligue, events) for ligue, events in self.scrape_events())


if __name__ == '__main__':
    instance = Bet365()
    out = instance.to_dict()
    pprint(out)

Output:

{'England League 2 - Full Time Result': {'Sat 07 Sep': [{'date': '15:00',
                                                         'last_child': False,
                                                         'teams': 'Bradford v '
                                                                  'Northampton'},
                                                        {'date': '15:00',
                                                         'last_child': False,
                                                         'teams': 'Cambridge '
                                                                  'Utd v '
                                                                  'Forest '
                                                                  'Green'},
                                                        {'date': '15:00',
                                                         'last_child': False,
                                                         'teams': 'Carlisle v '
                                                                  'Exeter'},
                                                        {'date': '15:00',
                                                         'last_child': False,
                                                         'teams': 'Cheltenham '
                                                                  'v '
                                                                  'Stevenage'},
                                                        {'date': '15:00',
                                                         'last_child': False,
                                                         'teams': 'Colchester '
                                                                  'v Walsall'},
                                                        {'date': '15:00',
                                                         'last_child': False,
                                                         'teams': 'Grimsby v '
                                                                  'Crewe'},
                                                        {'date': '15:00',
                                                         'last_child': False,
                                                         'teams': 'Leyton '
                                                                  'Orient v '
                                                                  'Swindon'},
                                                        {'date': '15:00',
                                                         'last_child': False,
                                                         'teams': 'Macclesfield '
                                                                  'v Crawley '
                                                                  'Town'},
                                                        {'date': '15:00',
                                                         'last_child': False,
                                                         'teams': 'Mansfield v '
                                                                  'Scunthorpe'},
                                                        {'date': '15:00',
                                                         'last_child': False,
                                                         'teams': 'Morecambe v '
                                                                  'Salford '
                                                                  'City'},
                                                        {'date': '15:00',
                                                         'last_child': False,
                                                         'teams': 'Newport '
                                                                  'County v '
                                                                  'Port Vale'},
                                                        {'date': '15:00',
                                                         'last_child': True,
                                                         'teams': 'Plymouth v '
                                                                  'Oldham'}]},...
Votes 0
Page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/57737881
