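I am trying to scrape roughly 54 "Agent listings" and 11 "Other listings" spread over two pages of Zillow search results, using the URL included below, but my code only returns the first 20 "Agent listings" from the first page of results. How can I modify my code to get all results for both "Agent listings" and "Other listings" across all pages?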
```r
library(rvest)
library(dplyr)

res_all <- NULL
for (page_result in 1:40) {
  # Note: page_result is never used below, so every iteration requests the same URL
  zillow_url = paste0("https://www.zillow.com/providence-ri/duplex/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22Providence%2C%20RI%22%2C%22mapBounds%22%3A%7B%22west%22%3A-71.48892251635742%2C%22east%22%3A-71.36017648364258%2C%22south%22%3A41.77131876826507%2C%22north%22%3A41.862664689400106%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A26637%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%2C%22sf%22%3A%7B%22value%22%3Afalse%7D%2C%22tow%22%3A%7B%22value%22%3Afalse%7D%2C%22con%22%3A%7B%22value%22%3Afalse%7D%2C%22apco%22%3A%7B%22value%22%3Afalse%7D%2C%22land%22%3A%7B%22value%22%3Afalse%7D%2C%22apa%22%3A%7B%22value%22%3Afalse%7D%2C%22manu%22%3A%7B%22value%22%3Afalse%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A13%7D")
  zpg = read_html(zillow_url)
  # One column per field; tibble() errors if a field is missing on some cards
  zillow_pg <- tibble(
    addr    = zpg %>% html_nodes(".list-card-addr") %>% html_text(),
    price   = zpg %>% html_nodes(".list-card-price") %>% html_text(),
    details = zpg %>% html_nodes(".list-card-details") %>% html_text(),
    heading = zpg %>% html_nodes(".list-card-info a") %>% html_text(),
    type    = zpg %>% html_nodes(".list-card-statusText") %>% html_text())
  res_all <- distinct(bind_rows(res_all, zillow_pg))
}
```

Answer posted 2021-11-22 17:23:21
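Independently of how the page is rendered, the loop in the question never uses `page_result`, so all 40 iterations download the same first page and `distinct()` collapses the duplicates. Below is a minimal sketch of a helper that writes a page number into the URL-encoded `searchQueryState`, assuming (not confirmed here) that Zillow reads the page from a `currentPage` key inside the `pagination` object. `zillow_page_url()` is a hypothetical name, and the base URL is shortened for readability. Because the listing cards are filled in by JavaScript, per-page URLs alone may still leave `read_html()` without every listing.

```r
# Hypothetical helper: fill the empty "pagination":{} object of Zillow's
# URL-encoded searchQueryState with a page number.
# ASSUMPTION: Zillow reads the page from {"pagination":{"currentPage":N}}.
zillow_page_url <- function(base_url, page) {
  empty_pagination <- "%22pagination%22%3A%7B%7D"                # "pagination":{}
  paged <- paste0("%22pagination%22%3A%7B%22currentPage%22%3A",  # "pagination":{"currentPage":
                  page, "%7D")                                   # N}
  sub(empty_pagination, paged, base_url, fixed = TRUE)
}

# Shortened example URL; substitute the full URL from the question.
base_url <- paste0("https://www.zillow.com/providence-ri/duplex/?searchQueryState=",
                   "%7B%22pagination%22%3A%7B%7D%2C%22isListVisible%22%3Atrue%7D")
urls <- vapply(1:2, function(p) zillow_page_url(base_url, p), character(1))
urls[2]
```

Inside the loop, `read_html(zillow_page_url(zillow_url, page_result))` would then request a different page on each iteration; stopping when a page returns no cards avoids running all 40 iterations.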
You need RSelenium, because the page is loaded dynamically.

Below is a partial answer that extracts the prices.

Start the browser:
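```r
library(rvest)
library(dplyr)
library(RSelenium)

# Start a Selenium server and a Firefox client
driver = rsDriver(browser = c("firefox"))
remDr <- driver[["client"]]
# url: the Zillow search URL from the question
remDr$navigate(url)
```

Now load all the listings: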
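```r
# Click the results heading in the list pane
remDr$findElement(using = 'xpath', value = '//*[@id="grid-search-results"]/div[1]/h1')$clickElement()
webElem <- remDr$findElement("css", "body")
# Scroll to the end of the page so the remaining cards load, then back to the top
webElem$sendKeysToElement(list(key = "end"))
webElem$sendKeysToElement(list(key = "home"))
```

If you don't get the prices for all items, repeat the last two steps.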
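Repeating the two key presses by hand can be automated. A minimal sketch, assuming `remDr` is the RSelenium client created above: it presses End and Home until the number of `.list-card-price` nodes stops growing or `max_rounds` is reached. `scroll_until_stable()` is a hypothetical helper, and the one-second pause is a guess at how long Zillow needs to render new cards.

```r
library(rvest)

# Hypothetical helper: press End, then Home, until the number of nodes
# matching `css` stops growing or `max_rounds` is reached.
# Returns the final node count. `remDr` is an RSelenium client.
scroll_until_stable <- function(remDr, css = ".list-card-price",
                                max_rounds = 10, pause = 1) {
  count_nodes <- function() {
    remDr$getPageSource()[[1]] %>%
      read_html() %>%
      html_nodes(css) %>%
      length()
  }
  previous <- -1
  current <- count_nodes()
  rounds <- 0
  while (current > previous && rounds < max_rounds) {
    body <- remDr$findElement("css", "body")
    body$sendKeysToElement(list(key = "end"))
    Sys.sleep(pause)   # ASSUMPTION: long enough for new cards to render
    body$sendKeysToElement(list(key = "home"))
    Sys.sleep(pause)
    previous <- current
    current <- count_nodes()
    rounds <- rounds + 1
  }
  current
}
```

Calling `scroll_until_stable(remDr)` after clicking the heading returns the final card count, which can be compared with the listing count shown on the page before extracting anything.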
```r
# Parse the rendered page source and extract the prices
remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_nodes(".list-card-price") %>%
  html_text()
```
```
 [1] "$399,999"   "$449,900"   "$399,000"   "$469,900"   "$310,000"   "$319,900"   "$404,900"   "$320,000"   "$529,000"   "$750,000"   "$335,000"   "$299,000"
[13] "$349,900"   "$314,900"   "$369,999"   "$359,000"   "$149,900"   "$309,900"   "$377,000"   "$360,000"   "$699,900"   "$410,000"   "$634,900"   "$310,000"
[25] "$695,000"   "$395,000"   "$339,900"   "$399,900"   "$350,000"   "$369,900"   "$639,000"   "$3,995,000" "$799,000"   "$699,000"   "$349,000"   "$448,000"
```

Now go to page 2 and get the remaining listings:
```r
# Click the link to page 2 of the results
remDr$findElement(using = 'xpath', value = '//*[@id="grid-search-results"]/div[3]/nav/ul/li[5]/a')$clickElement()
remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_nodes(".list-card-price") %>%
  html_text()
```
```
 [1] "$575,000"   "$299,000"   "$369,900"   "$345,500"   "$799,000"   "$380,000"   "$300,000"   "$1,295,000" "$575,000"   "$575,000"   "$599,900"   "$799,000"
[13] "$474,900"   "$399,900"
```

Now get the prices from the "Other listings" section:
```r
# Switch to the "Other listings" tab
remDr$findElement(using = 'xpath', value = '//*[@id="grid-search-results"]/div[1]/div/div[1]/div/button[2]')$clickElement()
remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_nodes(".list-card-price") %>%
  html_text()
```
```
 [1] "$315,000"      "$350,000"      "$439,000"      "$350,000"      "$395,000"      "$315,000"      "Est. $396,600" "Est. $681,300" "$234,000"
[10] "$449,900"      "$249,900"      "Est. $310,300"
```

Source: https://stackoverflow.com/questions/69983956
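The page-2 click can be turned into a loop that parses each page and keeps clicking until the link disappears, so the code does not depend on knowing how many pages there are. A minimal sketch: `collect_all_pages()` is hypothetical, `next_xpath` must point at the "next page" link (the `li[5]` XPath in the answer happened to reach page 2 and may not be the next-page link on other searches), and the loop also stops when a click leaves the parsed results unchanged, in case the last page keeps a disabled link.

```r
library(dplyr)

# Hypothetical helper: parse the current results page, click the next-page
# link, and repeat until the link is missing, the page stops changing, or
# max_pages is reached. `parse_page` turns a page source string into a data frame.
collect_all_pages <- function(remDr, next_xpath, parse_page,
                              max_pages = 20, pause = 2) {
  pages <- list()
  for (i in seq_len(max_pages)) {
    page <- parse_page(remDr$getPageSource()[[1]])
    # Stop if the click did not load anything new (e.g. a disabled link)
    if (length(pages) > 0 && identical(page, pages[[length(pages)]])) break
    pages[[length(pages) + 1]] <- page
    next_link <- tryCatch(
      remDr$findElement(using = "xpath", value = next_xpath),
      error = function(e) NULL   # findElement() errors when nothing matches
    )
    if (is.null(next_link)) break
    next_link$clickElement()
    Sys.sleep(pause)             # give the next page time to render
  }
  bind_rows(pages)
}
```

For example, `parse_page` could wrap the `read_html()` / `html_nodes(".list-card-price")` pipeline in a `tibble()`, and the same call can be repeated after clicking the "Other listings" button.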
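Both sections come back as character vectors, and the "Other listings" prices mix plain prices with values prefixed "Est.". A minimal sketch that stacks the two sections into one tibble with a `section` column, flags the "Est." values, and converts the text to numbers by deleting every non-digit character (which assumes whole-dollar prices, as in the output above). `combine_sections()` is a hypothetical helper.

```r
library(dplyr)

# Hypothetical helper: stack agent and other listing prices into one table.
combine_sections <- function(agent_prices, other_prices) {
  bind_rows(
    tibble(section = "agent", price_text = agent_prices),
    tibble(section = "other", price_text = other_prices)
  ) %>%
    mutate(
      estimate = startsWith(price_text, "Est."),          # flag "Est. $..." values
      price = as.numeric(gsub("[^0-9]", "", price_text))  # "$1,295,000" -> 1295000
    )
}

# A few values copied from the output above
combine_sections(c("$399,999", "$1,295,000"), c("$315,000", "Est. $396,600"))
```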