Looping through different selectors on a webpage to store in one big DF

Stack Overflow user
Asked 2022-10-12 18:56:48
Answers 1 · Views 37 · Followers 0 · Votes 1

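I came here earlier today with a question about this project and it was answered quickly, so here I am again. The code below scrapes the site linked below, extracts the data, and adds a column identifying which table instance each row came from. The next battle I'm facing is loading all of the Game instances into big_df, with a column that records the game recency currently selected. If anyone could help me with this last piece of the puzzle, I'd appreciate it.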

https://www.fantasypros.com/daily-fantasy/nba/fanduel-defense-vs-position.php

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd 

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
big_df = pd.DataFrame()
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")

webdriver_service = Service(r'chromedriver\chromedriver') ## path to where you saved chromedriver binary
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(driver, 20)
url = "https://www.fantasypros.com/daily-fantasy/nba/fanduel-defense-vs-position.php"
driver.get(url)
t.sleep(60)  # pause so the page can finish loading


tables_list = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//ul[@class="pills pos-filter pull-left"]/li')))

for x in tables_list:
    x.click()
    print('selected', x.text)
    t.sleep(2)
    table = wait.until(EC.element_to_be_clickable((By.XPATH, '//table[@id="data-table"]')))
    df = pd.read_html(table.get_attribute('outerHTML'))[0]
    df['Category'] = x.text.strip()
    big_df = pd.concat([big_df, df], axis=0, ignore_index=True)
    print('done, moving to next table')
print(big_df)
big_df.to_csv('fanduel.csv')

1 Answer

Stack Overflow user

Accepted answer

Posted 2022-10-12 19:11:34

This is how you can achieve your end goal:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time as t
import pandas as pd 

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
big_df = pd.DataFrame()
chrome_options = Options()
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument('disable-notifications')
chrome_options.add_argument("window-size=1280,720")

webdriver_service = Service("chromedriver/chromedriver") ## path to where you saved chromedriver binary
driver = webdriver.Chrome(service=webdriver_service, options=chrome_options)
wait = WebDriverWait(driver, 20)
url = "https://www.fantasypros.com/daily-fantasy/nba/fanduel-defense-vs-position.php"
driver.get(url)

select_recency_options = [x.text for x in wait.until(EC.presence_of_all_elements_located((By.XPATH, '//select[@class="game-change"]/option')))]
for option in select_recency_options:
    select_recency = Select(WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//select[@class="game-change"]'))))
    select_recency.select_by_visible_text(option)
    print('selected', option)
    t.sleep(2)

    tables_list = wait.until(EC.presence_of_all_elements_located((By.XPATH, '//ul[@class="pills pos-filter pull-left"]/li')))

    for x in tables_list:
        x.click()
        print('selected', x.text)
        t.sleep(2)
        table = wait.until(EC.element_to_be_clickable((By.XPATH, '//table[@id="data-table"]')))
        df = pd.read_html(table.get_attribute('outerHTML'))[0]
        df['Category'] = x.text.strip()
        df['Recency'] = option
        big_df = pd.concat([big_df, df], axis=0, ignore_index=True)
        print('done, moving to next table')
display(big_df)  # display() needs Jupyter/IPython; use print(big_df) in a plain script
big_df.to_csv('fanduel.csv')
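
The loop above grows big_df with pd.concat on every iteration, which re-copies the accumulated rows each time. A minimal sketch of an alternative, assuming the scraped tables are first collected in a list: the helper name combine_tables and the tagged_frames list are hypothetical, and the sample frames are made-up stand-ins for the scraped data.

```python
import pandas as pd


def combine_tables(tagged_frames):
    """Concatenate (recency, category, DataFrame) triples into one DataFrame.

    `tagged_frames` is a hypothetical list built inside the scraping loop;
    it is not part of the original answer.
    """
    labelled = []
    for recency, category, df in tagged_frames:
        # assign() returns a copy, so the scraped frame is left untouched
        labelled.append(df.assign(Category=category, Recency=recency))
    if not labelled:
        return pd.DataFrame()
    # One concat at the end avoids re-copying the big frame on every iteration
    return pd.concat(labelled, axis=0, ignore_index=True)


# Stand-in data in place of the scraped tables (values are made up)
sample = [
    ("Season", "ALL", pd.DataFrame({"Team": ["AAA", "BBB"], "PTS": [1.0, 2.0]})),
    ("Last 30", "C", pd.DataFrame({"Team": ["AAA", "BBB"], "PTS": [3.0, 4.0]})),
]
big_df = combine_tables(sample)
print(big_df)
```

In the scraping loop this would mean appending (option, x.text.strip(), df) to a list instead of concatenating, then calling the helper once after both loops finish. Passing index=False to to_csv would also keep the row index out of the saved file, if that column is not wanted.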

The result is a (bigger) dataframe:

     Team                       PTS    REB    AST    3PM    STL    BLK    TO     FD PTS  Category  Recency
0    HOUHouston Rockets         23.54  9.10   5.10   2.54   1.88   1.15   2.65   48.55   ALL       Season
1    OKCOklahoma City Thunder   22.22  9.61   5.19   2.70   1.67   1.18   2.52   47.57   ALL       Season
2    PORPortland Trail Blazers  22.96  8.92   5.31   2.74   1.63   0.99   2.65   46.84   ALL       Season
3    SACSacramento Kings        23.00  9.10   5.03   2.58   1.61   0.95   2.50   46.65   ALL       Season
4    ORLOrlando Magic           22.35  9.39   4.94   2.62   1.57   1.04   2.50   46.36   ALL       Season
...  ...                        ...    ...    ...    ...    ...    ...    ...    ...     ...       ...
715  TORToronto Raptors         23.33  13.97  2.77   0.57   0.84   1.88   3.38   49.03   C         Last 30
716  NYKNew York Knicks         19.78  15.40  2.94   0.53   0.90   1.92   2.17   48.96   C         Last 30
717  BKNBrooklyn Nets           19.69  13.60  3.16   0.86   1.10   2.25   2.06   48.74   C         Last 30
718  BOSBoston Celtics          17.79  11.95  3.75   0.41   1.66   1.80   2.54   45.60   C         Last 30
719  MIAMiami Heat              17.41  14.19  2.16   0.50   1.01   1.52   1.75   43.52   C         Last 30
720 rows × 11 columns
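
In the output above the Team values come back with the abbreviation fused to the full name (for example "HOUHouston Rockets"). A rough sketch of one way to separate them, assuming every abbreviation is exactly three characters long: that holds for the rows shown here but is only an assumption about the rest of the data, and split_team is a hypothetical helper, not part of the original answer.

```python
import pandas as pd


def split_team(df, abbr_len=3):
    """Split a fused 'Team' value into 'Abbr' and 'Team' columns.

    Assumes every abbreviation is exactly `abbr_len` characters long,
    which matches the rows shown in the answer but is not guaranteed.
    """
    out = df.copy()  # leave the caller's frame unchanged
    team = out["Team"].astype(str)
    out["Abbr"] = team.str[:abbr_len]
    out["Team"] = team.str[abbr_len:].str.strip()
    return out


# Two values copied from the output shown in the answer
sample = pd.DataFrame({"Team": ["HOUHouston Rockets", "NYKNew York Knicks"]})
print(split_team(sample))
```

If the abbreviations turn out to vary in length, a fixed slice would mis-split those rows, so it is worth checking the unique Abbr values after running it.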
Votes 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/74046780
