Q: Web scraping from an HTML page

Stack Overflow user

Asked 2017-10-09 15:08:44

Answers: 1, Views: 239, Followers: 0, Votes: 1

I am trying to scrape the betting odds offered by a bookmaker from this page:

https://www.interwetten.com/en/sportsbook/top-leagues?topLinkId=1

So far, I have written the following code:

library(rvest)

interwetten <- read_html("https://www.interwetten.com/en/sportsbook/top-leagues?topLinkId=1")
bundesliga <- html_nodes(interwetten, xpath = '//*[@id="TBL_Content_1019"]')
bundesliga_teams <- html_nodes(bundesliga, "span")

The output I get now is:

[1] <span id="ctl00_cphMain_UCOffer_LeagueList_rptLeague_ctl00_ucBettingContainer_lblClose" clas ...
[2] <span itemscope="itemscope" itemprop="location" itemtype="http://schema.org/Place"><meta ite ...
[3] <span itemprop="name">VfB Stuttgart</span>
[4] <span>X</span>

Now I want to extract the team name from each <span itemprop="name"></span>, but I do not know how. I tried using nodes or attributes, but that did not work.

web-scraping

html-parsing

rvest

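1 Answer

Stack Overflow user

Accepted answer

Answered 2017-10-09 15:30:06

You can make the XPath selector more specific, so that it matches only the span elements whose itemprop attribute is "name", and then extract their text with html_text().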

library(rvest)

interwetten <- 'https://www.interwetten.com/en/sportsbook/top-leagues?topLinkId=1' %>% 
    read_html() 

teams <- interwetten %>% 
    html_nodes(xpath = '//*[@id="TBL_Content_1019"]//span[@itemprop="name"]') %>% 
    html_text()

teams
#>  [1] "VfB Stuttgart"   "1. FC Cologne"   "Mainz 05"       
#>  [4] "Hamburger SV"    "Hertha BSC"      "Schalke 04"     
#>  [7] "Hannover 96"     "Frankfurt"       "Hoffenheim"     
#> [10] "Augsburg"        "Bayern Munich"   "Freiburg"       
#> [13] "Dortmund"        "RB Leipzig"      "Leverkusen"     
#> [16] "Wolfsburg"       "Werder Bremen"   "Monchengladbach"
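
Because the live page may change or be unavailable, the same selector can be tried on a small inline fragment. The following is only a minimal sketch: the HTML below is a made-up stand-in for the real page (the div wrappers, the second container, and the "Not Bundesliga" entry are assumptions for illustration), and the CSS selector is shown as an alternative that was not part of the original answer.

```r
library(rvest)

# Hypothetical stand-in for the live page; only the id and the
# itemprop="name" spans mirror what the question shows.
html <- '
<div id="TBL_Content_1019">
  <span itemprop="name">VfB Stuttgart</span>
  <span>X</span>
  <span itemprop="name">1. FC Cologne</span>
</div>
<div id="TBL_Content_other">
  <span itemprop="name">Not Bundesliga</span>
</div>'

page <- read_html(html)

# XPath: any span with itemprop="name" nested under the element with the given id
teams_xpath <- page %>%
  html_nodes(xpath = '//*[@id="TBL_Content_1019"]//span[@itemprop="name"]') %>%
  html_text()

# Equivalent CSS selector (an alternative, assumed here for comparison)
teams_css <- page %>%
  html_nodes('#TBL_Content_1019 span[itemprop="name"]') %>%
  html_text()

print(teams_xpath)
print(teams_css)
```

Both selectors skip the plain <span>X</span> and the span in the other container, because they require the itemprop attribute and restrict the search to descendants of the element with id TBL_Content_1019.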

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/46649623
