Hi, I can see that many people have successfully used the code below to scrape TripAdvisor, but it doesn't work in my case.
library(rvest)
library(xml2)
library(dplyr)
url <- "http://www.tripadvisor.com/ShowUserReviews-g189400-d206779-r838449448-Royal_Olympic-Athens_Attica.html"
reviews <- url %>%
  read_html() %>%
  html_elements("#REVIEWS .innerBubble")
id <- reviews %>%
  html_element(".quote a") %>%
  html_attr("id")
quote <- reviews %>%
  html_element(".quote span") %>%
  html_text()
rating <- reviews %>%
  html_element(".rating .rating_s_fill") %>%
  html_attr("alt") %>%
  gsub(" of 5 stars", "", .) %>%
  as.integer()
date <- reviews %>%
  html_element(".rating .ratingDate") %>%
  html_attr("title") %>%
  strptime("%b %d, %Y") %>%
  as.POSIXct()
review <- reviews %>%
  html_element(".entry .partial_entry") %>%
  html_text()
data.frame(id, quote, rating, date, review, stringsAsFactors = FALSE) %>% View()

Do you know why it doesn't work?
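One quick way to narrow down why code like this returns nothing is to count how many nodes each CSS selector matches in the HTML that `read_html()` actually received. The sketch below does that with a hypothetical helper, `count_matches()`. The two inline pages are made-up stand-ins, not real TripAdvisor responses: `live_page` assumes, for illustration only, that the live request returns a page without the review markup, while `saved_page` contains the old `innerBubble` structure the question's selectors expect.

```r
library(rvest)

# Hypothetical helper: number of nodes each CSS selector matches in a parsed page.
# A count of 0 means the selector finds nothing in the HTML you actually have.
count_matches <- function(page, selectors) {
  vapply(selectors, function(s) length(html_elements(page, s)), integer(1))
}

selectors <- c("#REVIEWS .innerBubble", ".quote span", ".entry .partial_entry")

# Assumed stand-in for a live response that carries no review markup
live_page <- minimal_html("<div id='app'></div>")

# Assumed stand-in for a page whose markup matches the question's selectors
saved_page <- minimal_html(paste0(
  "<div id='REVIEWS'><div class='innerBubble'>",
  "<div class='quote'><a id='rn1'><span>Great stay</span></a></div>",
  "<div class='entry'><p class='partial_entry'>Lovely hotel.</p></div>",
  "</div></div>"
))

live_counts <- count_matches(live_page, selectors)
saved_counts <- count_matches(saved_page, selectors)
print(live_counts)
print(saved_counts)
```

Running the same check on `read_html(url)` and on a copy of the page saved from the browser shows whether the selectors are wrong or the fetched HTML simply lacks the reviews.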
Posted on 2022-05-16 13:30:38
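After many attempts, what worked for me is the code below, which reads a copy of the page saved with Firefox instead of the live URL. I haven't finished extracting the rest of the information yet; I'll keep you posted. Many thanks, @danlooo.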
# Read the page saved with Firefox (pass the file path directly; piping url into
# read_html() would make the path the `encoding` argument)
reviews3 <- read_html("home/tripad/file_saved_using_firefox.html")
reviews4 <- reviews3 %>%
  html_elements("#REVIEWS .innerBubble")
review <- reviews3 %>%
  html_elements(".entry .partial_entry") %>%
  html_text()
data.frame(review, stringsAsFactors = FALSE) %>% View()

https://stackoverflow.com/questions/72257424
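Building on the local-file approach above, here is a sketch of how the remaining fields might be extracted. It selects one node per review first, then uses `html_element()` (singular) inside each one, so every column gets exactly one entry per review (NA where a field is missing) and `data.frame()` cannot fail on unequal lengths. The file written to `tempfile()` is a made-up stand-in for the page saved from Firefox, and the selectors are the ones from the question; whether they match TripAdvisor's current markup is an assumption.

```r
library(rvest)

# Made-up stand-in for the page saved from Firefox; use the real file path in practice
saved_file <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body><div id='REVIEWS'>",
  "<div class='innerBubble'>",
  "  <div class='quote'><a id='rn1'><span>Great stay</span></a></div>",
  "  <div class='entry'><p class='partial_entry'>Lovely hotel.</p></div>",
  "</div>",
  "<div class='innerBubble'>",
  "  <div class='entry'><p class='partial_entry'>No title here.</p></div>",
  "</div>",
  "</div></body></html>"
), saved_file)

# The file path is the first argument of read_html()
page <- read_html(saved_file)

# One node per review
bubbles <- html_elements(page, "#REVIEWS .innerBubble")

# html_element() returns one result per review, NA when the field is absent
reviews_df <- data.frame(
  id     = html_attr(html_element(bubbles, ".quote a"), "id"),
  quote  = html_text(html_element(bubbles, ".quote span")),
  review = html_text(html_element(bubbles, ".entry .partial_entry")),
  stringsAsFactors = FALSE
)
print(reviews_df)
```

The rating and date columns from the question could be added the same way, by applying their selectors to `bubbles` rather than to the whole page.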