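I am writing a function that fetches an album's review and rating from Pitchfork and strips out the HTML. The result should be a list with two elements: the album's review and its score. So far I am stuck on what to return, and on the regular expressions for the HTML and the paste0 calls. Thank you for your time!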
library(RCurl)    # getURL()
library(stringr)  # str_replace()

pitchfork = function(url){
  save = getURL(url)
  cat(save, file = "review.txt")
  a1 = '<div class="contents dropcap"><p>'
  b1 = str_replace(save, paste0("^.*", a1), "")
  a2 = '</div><a class="end-mark-container" href="/">'
  b2 = str_replace(b1, paste0(a2, ".*$"), "")
}

Posted on 2020-03-04 04:48:41
How about something like this?
library(xml2)
library(rvest)
library(tidyverse)

url <- "http://pitchfork.com/reviews/albums/grimes-miss-anthropocene"
html <- read_html(url)

# html_nodes() is the html_ equivalent of xml_nodes(), which rvest has deprecated
review <- html %>%
  html_nodes("p") %>%
  html_text() %>%
  enframe("paragraph_no", "text")
review
## A tibble: 14 x 2
# paragraph_no text
# <int> <chr>
# 1 1 Best new music
# 2 2 Grimes’ first project as a bona fide pop star is more morose th…
# 3 3 In 2011, Grimes was eager to say in an interview that she had “…
# 4 4 Miss Anthropocene is Grimes’ fifth album and her first as that …
# 5 5 The result is a record that’s more morose than her previous wor…
# 6 6 In November 2018, Grimes released “We Appreciate Power,” a coll…
# 7 7 When Grimes veers away from high concept toward examining intim…
# 8 8 Miss Anthropocene thrills when it reveals a refined, linear evo…
# 9 9 So much about the actual music of Miss Anthropocene succeeds th…
#10 10 And that’s the obstacle, the slimy mouthfeel, standing in the w…
#11 11 Correction: An earlier version of this review erroneously state…
#12 12 Listen to our Best New Music playlist on Spotify and Apple Musi…
#13 13 Buy: Rough Trade
#14 14 (Pitchfork may earn a commission from purchases made through af…

review is a tibble containing the review split by paragraph; it may need some additional cleaning (such as removing the first and last rows).
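As a rough illustration of that cleaning step, here is a minimal sketch that filters rows by pattern rather than by position, since in the Radiohead output further down the last row is real review text. The `boilerplate` pattern and the stand-in data are assumptions based only on the sample outputs in this answer, not on anything Pitchfork documents.

```r
library(tibble)

# Stand-in data shaped like the `review` tibble (text abbreviated, not the real review)
review <- enframe(
  c("Best new music",
    "Grimes' first project as a bona fide pop star ...",
    "In 2011, Grimes was eager to say ...",
    "Buy: Rough Trade",
    "(Pitchfork may earn a commission ...)"),
  "paragraph_no", "text"
)

# Assumed boilerplate prefixes, guessed from the sample outputs in this answer
boilerplate <- "^(Best new (music|reissue)|Listen to our|Buy:|\\(Pitchfork may earn)"

# Keep only the rows whose text does not start with a boilerplate prefix
cleaned <- review[!grepl(boilerplate, review$text), ]
cleaned$text
```

The pattern would need adjusting whenever the site changes its labels, so it is worth checking the result against a few reviews by eye.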
For the score, we can use a class attribute selector.
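To make the selector's behaviour concrete, here is a small offline sketch. The markup is hypothetical (it is not copied from Pitchfork) and only serves to show that `[class='score']` matches elements whose class attribute is exactly `score`, whereas `.score` also matches elements that carry additional classes.

```r
library(rvest)

# Hypothetical markup, used only to compare the two selectors
doc <- read_html(
  '<div><span class="score">8.2</span><span class="score bnm">9.0</span></div>'
)

# Exact attribute match: the class attribute must equal "score"
exact <- doc %>% html_nodes("[class='score']") %>% html_text()

# Class selector: matches any element that has "score" among its classes
by_class <- doc %>% html_nodes(".score") %>% html_text()

as.numeric(exact)
as.numeric(by_class)
```

If a page ever adds a second class to the score element, the exact match would return nothing, so `.score` may be the more forgiving choice.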
score <- html %>% html_nodes("[class='score']") %>% html_text() %>% as.numeric()
score
#[1] 8.2

Wrapping things up (in a function)
Let's wrap everything in a function that returns a list containing the review tibble and the numeric score.
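get_pitchfork_data <- function(url) {
  html <- read_html(url)
  list(
    # One row per <p> element, with surrounding whitespace trimmed
    review = html %>%
      html_nodes("p") %>%
      html_text() %>%
      trimws() %>%
      enframe("paragraph_no", "text"),
    # Numeric score taken from the element whose class is exactly "score"
    score = html %>%
      html_nodes("[class='score']") %>%
      html_text() %>%
      as.numeric())
}

Test 1: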
get_pitchfork_data("http://pitchfork.com/reviews/albums/grimes-miss-anthropocene")
#$review
## A tibble: 14 x 2
# paragraph_no text
# <int> <chr>
# 1 1 Best new music
# 2 2 Grimes’ first project as a bona fide pop star is more morose th…
# 3 3 In 2011, Grimes was eager to say in an interview that she had “…
# 4 4 Miss Anthropocene is Grimes’ fifth album and her first as that …
# 5 5 The result is a record that’s more morose than her previous wor…
# 6 6 In November 2018, Grimes released “We Appreciate Power,” a coll…
# 7 7 When Grimes veers away from high concept toward examining intim…
# 8 8 Miss Anthropocene thrills when it reveals a refined, linear evo…
# 9 9 So much about the actual music of Miss Anthropocene succeeds th…
#10 10 And that’s the obstacle, the slimy mouthfeel, standing in the w…
#11 11 Correction: An earlier version of this review erroneously state…
#12 12 Listen to our Best New Music playlist on Spotify and Apple Musi…
#13 13 Buy: Rough Trade
#14 14 (Pitchfork may earn a commission from purchases made through af…
#
#$score
#[1] 8.2

Test 2:
get_pitchfork_data("https://pitchfork.com/reviews/albums/radiohead-ok-computer-oknotok-1997-2017/")
#$review
## A tibble: 12 x 2
# paragraph_no text
# <int> <chr>
# 1 1 Best new reissue
# 2 2 Twenty years on, Radiohead revisit their 1997 masterpiece with …
# 3 3 As they regrouped to figure out what their third album might be…
# 4 4 It’s still funny to think, two decades later, that Thom Yorke’s…
# 5 5 It’s unclear what happened to that album. OK Computer obviously…
# 6 6 OKNOTOK is something a little more interesting than a remaster …
# 7 7 But “Lift’s” reputation for positivity might be a little confus…
# 8 8 The most fun to be had with OKNOTOK is in these line-blurring m…
# 9 9 This fondness for camp and schlock has always been latent in Ra…
#10 10 The ghost of Bond followed them once they decamped from their s…
#11 11 Radiohead have been at least as brilliant at packaging and posi…
#12 12 Now that they have arrived at an autumnal, valedictory stage in…
#
#$score
#[1] 10

https://stackoverflow.com/questions/60518679