Rvest: scraping an element

Stack Overflow user
Asked on 2022-02-15 20:26:11
Answers: 1 | Views: 30 | Followers: 0 | Votes: 1

I'm trying to scrape a team's record (3-6-2) and the year from this page: https://www.pro-football-reference.com/teams/pit/1933.htm

I tried using SelectorGadget to find the right XPath or class, but nothing has worked. The closest I have come to getting the record is:

read_html(
  curl("https://www.pro-football-reference.com/teams/pit/1933.htm", 
                          handle = curl::new_handle("useragent" = "Mozilla/5.0"))) %>% 
  html_element(xpath='//*[@id="meta"]/div[2]/p[1]/strong') %>% 
  html_text()

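I would like the output to be a data frame. Any clarity on how to reach this element with SelectorGadget would help, since I'm trying to learn how to extract other elements from this and similar pages. Thanks!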


1 Answer

Stack Overflow user

Answered on 2022-02-15 20:44:33

If all you need is the tables, rvest's html_table function will do what you want.

html_table(read_html("https://www.pro-football-reference.com/teams/pit/1933.htm"))
[[1]]
# A tibble: 5 x 23
  ``     ``    ``    `Tot Yds & TO` `Tot Yds & TO` `Tot Yds & TO` ``    ``    Passing Passing
  <chr>  <chr> <chr> <chr>          <chr>          <chr>          <chr> <chr> <chr>   <chr>  
1 Player PF    Yds   "Ply"          "Y/P"          TO             FL    "1st~ "Cmp"   Att    
2 Team ~ 67    1943  "534"          "3.6"          40             0     ""    "60"    196    
3 Opp. ~ 208   2735  "583"          "4.7"          19             0     ""    "57"    142    
4 Lg Ra~ 8     8     ""             ""             9              1     "1"   ""      1      
5 Lg Ra~ 10    9     ""             ""             9              1     "1"   ""      2      
# ... with 13 more variables: Passing <chr>, Passing <chr>, Passing <chr>, Passing <chr>,
#   Passing <chr>, Rushing <chr>, Rushing <chr>, Rushing <chr>, Rushing <chr>,
#   Rushing <chr>, Penalties <chr>, Penalties <chr>, Penalties <chr>

[[2]]
# A tibble: 12 x 22
   ``    ``    ``       ``    ``    ``    ``    ``    ``    ``    Score Score Offense Offense
   <chr> <chr> <chr>    <lgl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>   <chr>  
 1 Week  Day   Date     NA    ""    ""    "OT"  Rec   ""    Opp   Tm    Opp   "1stD"  "TotYd"
 2 1     Wed   Septemb~ NA    "box~ "L"   ""    0-1   ""    New ~ 2     23    ""      ""     
 3 2     Wed   Septemb~ NA    "box~ "W"   ""    1-1   ""    Chic~ 14    13    ""      ""     
 4 3     Wed   October~ NA    "box~ "L"   ""    1-2   ""    Bost~ 6     21    ""      ""     
 5 4     Wed   October~ NA    "box~ "W"   ""    2-2   ""    Cinc~ 17    3     ""      ""     
 6 5     Sun   October~ NA    "box~ "L"   ""    2-3   "@"   Gree~ 0     47    ""      ""     
 7 6     Sun   October~ NA    "box~ "T"   ""    2-3-1 "@"   Cinc~ 0     0     ""      ""     
 8 7     Sun   October~ NA    "box~ "W"   ""    3-3-1 "@"   Bost~ 16    14    ""      ""     
 9 8     Sun   Novembe~ NA    "box~ "T"   ""    3-3-2 "@"   Broo~ 3     3     ""      ""     
10 9     Sun   Novembe~ NA    "box~ "L"   ""    3-4-2 ""    Broo~ 0     32    ""      ""     
11 10    Sun   Novembe~ NA    "box~ "L"   ""    3-5-2 "@"   Phil~ 6     25    ""      ""     
12 12    Sun   Decembe~ NA    "box~ "L"   ""    3-6-2 "@"   New ~ 3     27    ""      ""     
# ... with 8 more variables: Offense <chr>, Offense <chr>, Offense <chr>, Defense <chr>,
#   Defense <chr>, Defense <chr>, Defense <chr>, Defense <chr>

You can then index and filter to get the value you want.
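As a rough sketch of that indexing step: the snippet below uses a small inline table as a hypothetical stand-in for the schedule table, so it runs without hitting the site. Passing header = FALSE is my own choice here, not something the page requires; it keeps both header rows as ordinary rows so the second one can be promoted to column names.

```r
library(rvest)

# Hypothetical stand-in for the schedule table; the real page would be
# loaded with read_html(url) instead.
page <- read_html('
<table>
  <tr><th></th><th></th><th>Score</th><th>Score</th></tr>
  <tr><th>Week</th><th>Rec</th><th>Tm</th><th>Opp</th></tr>
  <tr><td>1</td><td>0-1</td><td>2</td><td>23</td></tr>
  <tr><td>2</td><td>1-1</td><td>14</td><td>13</td></tr>
  <tr><td>12</td><td>3-6-2</td><td>3</td><td>27</td></tr>
</table>')

# header = FALSE keeps both header rows as data (columns X1, X2, ...)
raw <- page %>%
  html_element("table") %>%
  html_table(header = FALSE)

# Promote the second row to column names; make.unique() guards against
# repeated labels such as the two "Opp" columns on the real page
col_names <- make.unique(unlist(raw[2, ], use.names = FALSE))
games <- raw[-(1:2), ]
names(games) <- col_names

# The season record is the last entry of the running "Rec" column
final_record <- tail(games$Rec, 1)
final_record
```

On the real page the same idea should apply to the second element of the list returned by html_table, though the column layout there is wider than in this stand-in.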

If you would rather avoid parsing the tables, you can select the cells directly:

read_html("https://www.pro-football-reference.com/teams/pit/1933.htm") %>%
  html_elements(xpath = "//td[@data-stat='team_record']") %>%
  html_text()

This extracts every value from that column; you can then take the last one.

 [1] "0-1"   "1-1"   "1-2"   "2-2"   "2-3"   "2-3-1" "3-3-1" "3-3-2" "3-4-2" "3-5-2" "3-6-2"
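
Since the question asks for a data frame with the record and the year, here is a minimal sketch of one way to finish the job. The inline HTML is a hypothetical stand-in for the real page, and reading the year out of the URL is an assumption on my part (the page itself may expose it elsewhere). The column names year, record, wins, losses and ties are illustrative.

```r
library(rvest)

url <- "https://www.pro-football-reference.com/teams/pit/1933.htm"

# Hypothetical stand-in for read_html(url), so the sketch runs offline
page <- read_html('
<table>
  <tr><td data-stat="team_record">0-1</td></tr>
  <tr><td data-stat="team_record">2-3-1</td></tr>
  <tr><td data-stat="team_record">3-6-2</td></tr>
</table>')

records <- page %>%
  html_elements(xpath = "//td[@data-stat='team_record']") %>%
  html_text()

# The last running record is the final season record
final_record <- tail(records, 1)

# Assumption: the four-digit season year is the file name in the URL
year <- as.integer(sub(".*/([0-9]{4})\\.htm$", "\\1", url))

# Split "W-L-T"; a record without ties ("0-1") is padded with 0
parts <- as.integer(strsplit(final_record, "-", fixed = TRUE)[[1]])
length(parts) <- 3
parts[is.na(parts)] <- 0L

season <- data.frame(
  year = year,
  record = final_record,
  wins = parts[1],
  losses = parts[2],
  ties = parts[3],
  stringsAsFactors = FALSE
)
season
```

Wrapping this in a function that takes the URL would let you loop over several seasons and combine the rows with rbind.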
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/71132948
