
Scraping multiple pages of Zillow

Stack Overflow user
Asked on 2021-11-16 04:59:21
Answers: 1 · Views: 74 · Follows: 0 · Votes: 0

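Using the search URL included below, I am trying to scrape about 54 "agent listings" and 11 "other listings" spread across two pages of Zillow results, but my code only returns the first 20 "agent listings" from the first page of search results. How can I modify my code to get all of the "agent listings" and "other listings" from every page?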

```r
library(rvest)   # read_html(), html_nodes(), html_text()
library(dplyr)   # tibble(), bind_rows(), distinct(), %>%

res_all <- NULL

for (page_result in 1:40) {
  # Note: page_result is never used below, so every iteration requests the same URL
  # (its "pagination" object is empty) and distinct() leaves only the first page's rows
  zillow_url <- paste0("https://www.zillow.com/providence-ri/duplex/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22Providence%2C%20RI%22%2C%22mapBounds%22%3A%7B%22west%22%3A-71.48892251635742%2C%22east%22%3A-71.36017648364258%2C%22south%22%3A41.77131876826507%2C%22north%22%3A41.862664689400106%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A26637%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%2C%22sf%22%3A%7B%22value%22%3Afalse%7D%2C%22tow%22%3A%7B%22value%22%3Afalse%7D%2C%22con%22%3A%7B%22value%22%3Afalse%7D%2C%22apco%22%3A%7B%22value%22%3Afalse%7D%2C%22land%22%3A%7B%22value%22%3Afalse%7D%2C%22apa%22%3A%7B%22value%22%3Afalse%7D%2C%22manu%22%3A%7B%22value%22%3Afalse%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A13%7D")

  zpg <- read_html(zillow_url)

  # One column per field, each extracted from the whole page
  zillow_pg <- tibble(
    addr    = zpg %>% html_nodes(".list-card-addr") %>% html_text(),
    price   = zpg %>% html_nodes(".list-card-price") %>% html_text(),
    details = zpg %>% html_nodes(".list-card-details") %>% html_text(),
    heading = zpg %>% html_nodes(".list-card-info a") %>% html_text(),
    type    = zpg %>% html_nodes(".list-card-statusText") %>% html_text()
  )

  # Append this page's rows and drop exact duplicates
  res_all <- distinct(bind_rows(res_all, zillow_pg))
}
```
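In the loop above, `page_result` is never inserted into `zillow_url`, so all 40 iterations request the same first page. A minimal sketch of building a page-specific URL follows. It assumes (this is not confirmed by the question or the answer) that Zillow reads the page number from a `"currentPage"` field inside the `"pagination"` object of the URL-encoded `searchQueryState` JSON; `zillow_page_url()` and `query_state_json` are hypothetical names. Even with the right URL, a plain `read_html()` request may still miss listings that are loaded dynamically, which is what the answer below addresses.

```r
# Hypothetical helper: build the search URL for one page of results.
# Assumption: Zillow takes the page number from {"pagination":{"currentPage":N}}
# inside the URL-encoded searchQueryState JSON.
zillow_page_url <- function(base_url, query_state_json, page) {
  # Fill the empty "pagination":{} object with the requested page number
  state <- sub('"pagination":{}',
               sprintf('"pagination":{"currentPage":%d}', as.integer(page)),
               query_state_json, fixed = TRUE)
  # reserved = TRUE percent-encodes { } " : , and spaces, like the question's URL
  paste0(base_url, "?searchQueryState=", URLencode(state, reserved = TRUE))
}

# Usage (not run): recover the JSON from the question's URL, then loop over pages
# query_state_json <- URLdecode(sub(".*searchQueryState=", "", zillow_url))
# for (page_result in 1:2) {
#   page_url <- zillow_page_url("https://www.zillow.com/providence-ri/duplex/",
#                               query_state_json, page_result)
#   zpg <- read_html(page_url)
# }
```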

1 answer

Stack Overflow user

Answered on 2021-11-22 17:23:21

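You need RSelenium, because the listings on this page are loaded dynamically: `read_html()` only sees what is in the initial HTML response, not the listing cards the browser loads afterwards as you scroll.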
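The difference between the static HTML and the rendered page can be reproduced offline. The sketch below uses a made-up page (not Zillow's markup) whose second price card is added by JavaScript; `read_html()` parses the markup but never runs scripts, so only the card present in the HTML is found. A browser driven by RSelenium runs the script before you read its page source.

```r
library(rvest)

# Made-up page: one price card in the markup, a second one added by a script
page <- '<html><body>
  <div class="list-card-price">$399,999</div>
  <script>
    var d = document.createElement("div");
    d.className = "list-card-price";
    d.textContent = "$449,900";
    document.body.appendChild(d);
  </script>
</body></html>'

# read_html() does not execute JavaScript, so only the static card is found
static_prices <- read_html(page) %>% html_nodes(".list-card-price") %>% html_text()
print(static_prices)
# [1] "$399,999"
```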

Here is a partial answer that extracts the prices.

Start the browser:

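```r
library(rvest)
library(dplyr)
library(RSelenium)

# Start a Selenium server and a Firefox session
driver <- rsDriver(browser = "firefox")
remDr <- driver[["client"]]
# Open the search page; zillow_url is the search URL from the question
remDr$navigate(zillow_url)
```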

Now load all the listings:

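```r
# Click the search-results heading to give the page focus
remDr$findElement(using = "xpath", value = '//*[@id="grid-search-results"]/div[1]/h1')$clickElement()
webElem <- remDr$findElement("css", "body")
# Scroll to the end of the page so the remaining listing cards are loaded...
webElem$sendKeysToElement(list(key = "end"))
Sys.sleep(2)  # wait for the new cards to render (2 s is an arbitrary choice)
# ...then back to the top
webElem$sendKeysToElement(list(key = "home"))
```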

If you still don't get the prices of all items, repeat the scrolling steps (the `end` and `home` key presses).
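Instead of repeating the scrolling by hand, you could loop until the number of prices stops growing. This is a minimal sketch: `scroll_until_stable()` is a hypothetical helper, and its stopping rule (stop when a scroll adds no new cards, or after `max_rounds` tries) is an assumption about how the lazy loading behaves, not something the answer tested. The scroll and count actions are passed in as functions, so the loop can be checked without a browser; the commented lines show how they might be wired to the `remDr` and `webElem` objects created above.

```r
# Hypothetical helper: call scroll() until count_items() stops increasing,
# or until max_rounds scrolls have been made. Returns the last count.
scroll_until_stable <- function(scroll, count_items, max_rounds = 10, pause = 0) {
  previous <- -1
  current <- count_items()
  rounds <- 0
  while (current > previous && rounds < max_rounds) {
    scroll()
    if (pause > 0) Sys.sleep(pause)  # give lazy-loaded cards time to render
    previous <- current
    current <- count_items()
    rounds <- rounds + 1
  }
  current
}

# Possible wiring with the RSelenium session (not run here):
# scroll_until_stable(
#   scroll = function() {
#     webElem$sendKeysToElement(list(key = "end"))
#     webElem$sendKeysToElement(list(key = "home"))
#   },
#   count_items = function() {
#     remDr$getPageSource()[[1]] %>% read_html() %>%
#       html_nodes(".list-card-price") %>% length()
#   },
#   pause = 2
# )
```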

```r
# Parse the rendered page source and extract the listing prices
remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_nodes(".list-card-price") %>%
  html_text()
#>  [1] "$399,999"   "$449,900"   "$399,000"   "$469,900"   "$310,000"   "$319,900"   "$404,900"   "$320,000"   "$529,000"   "$750,000"   "$335,000"   "$299,000"
#> [13] "$349,900"   "$314,900"   "$369,999"   "$359,000"   "$149,900"   "$309,900"   "$377,000"   "$360,000"   "$699,900"   "$410,000"   "$634,900"   "$310,000"
#> [25] "$695,000"   "$395,000"   "$339,900"   "$399,900"   "$350,000"   "$369,900"   "$639,000"   "$3,995,000" "$799,000"   "$699,000"   "$349,000"   "$448,000"
```
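The same page source also contains the other fields from the question. Extracting each field separately with `html_nodes()`, as the question's `tibble()` call does, can misalign the columns, or make `tibble()` fail, when some card lacks one of the fields. A common alternative is to select the listing cards first and then take one node per card with `html_node()`, which yields `NA` for a missing field. In this sketch, `parse_listing_cards()` is a hypothetical name and the card container selector `article.list-card` is an assumption about Zillow's 2021 markup (check it in the browser's inspector).

```r
library(rvest)

# Hypothetical helper: one row per listing card in a rendered results page.
# Assumption: each listing is wrapped in an element matching card_css.
parse_listing_cards <- function(page_source, card_css = "article.list-card") {
  cards <- html_nodes(read_html(page_source), card_css)
  # html_node() (singular) returns exactly one result per card, so a field
  # that a card lacks becomes NA instead of shifting the other columns
  field <- function(css) html_text(html_node(cards, css), trim = TRUE)
  data.frame(
    addr    = field(".list-card-addr"),
    price   = field(".list-card-price"),
    details = field(".list-card-details"),
    type    = field(".list-card-statusText"),
    stringsAsFactors = FALSE
  )
}

# Usage with the RSelenium session (not run):
# listings <- parse_listing_cards(remDr$getPageSource()[[1]])
```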

Now go to page 2 and get the remaining listings:

```r
# Click the pagination link for page 2 (this XPath matched Zillow's layout when the answer was written)
remDr$findElement(using = "xpath", value = '//*[@id="grid-search-results"]/div[3]/nav/ul/li[5]/a')$clickElement()
# If some prices are missing, scroll down and back up again as on page 1
remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_nodes(".list-card-price") %>%
  html_text()
#>  [1] "$575,000"   "$299,000"   "$369,900"   "$345,500"   "$799,000"   "$380,000"   "$300,000"   "$1,295,000" "$575,000"   "$575,000"   "$599,900"   "$799,000"
#> [13] "$474,900"   "$399,900"
```
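To end up with one table, as the question's `res_all` intended, you can stack the vectors from each step together with a label saying where they came from. A minimal base-R sketch; the helper name `combine_results()` and the labels are hypothetical. It deliberately keeps repeated prices: identical prices can belong to different listings (page 2 above shows "$575,000" three times), so deduplicating on price alone would drop real results.

```r
# Hypothetical helper: stack named character vectors of prices into one
# data frame, recording which page or tab each value came from
combine_results <- function(...) {
  parts <- list(...)
  rows <- Map(function(src, prices) {
    data.frame(source = rep(src, length(prices)), price = prices,
               stringsAsFactors = FALSE)
  }, names(parts), parts)
  out <- do.call(rbind, rows)
  rownames(out) <- NULL  # drop the "source.1"-style row names added by rbind()
  out
}

# Usage (not run; page1_prices etc. are hypothetical variables holding the
# results of the html_text() pipelines):
# all_prices <- combine_results(agent_page1 = page1_prices,
#                               agent_page2 = page2_prices,
#                               other = other_prices)
```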

Now get the prices from the "Other listings" section:

```r
# Switch to the "Other listings" tab (the second button above the results)
remDr$findElement(using = "xpath", value = '//*[@id="grid-search-results"]/div[1]/div/div[1]/div/button[2]')$clickElement()
remDr$getPageSource()[[1]] %>%
  read_html() %>%
  html_nodes(".list-card-price") %>%
  html_text()
#>  [1] "$315,000"      "$350,000"      "$439,000"      "$350,000"      "$395,000"      "$315,000"      "Est. $396,600" "Est. $681,300" "$234,000"
#> [10] "$449,900"      "$249,900"      "Est. $310,300"
```
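The prices come back as text, and some entries in the "Other listings" section carry an "Est." prefix (presumably an estimated value rather than an asking price). A minimal base-R sketch for turning them into numbers while keeping that distinction; `parse_prices()` is a hypothetical helper, and it assumes whole-dollar amounts like the ones shown above (it keeps digits only, so a format such as "$1.2M" would be misread).

```r
# Hypothetical helper: convert scraped price strings to numbers.
# Assumes whole-dollar values such as "$399,999" or "Est. $396,600".
parse_prices <- function(x) {
  data.frame(
    price_text = x,
    estimated  = startsWith(x, "Est."),            # flag the "Est." entries
    price      = as.numeric(gsub("[^0-9]", "", x)), # keep digits only
    stringsAsFactors = FALSE
  )
}

parse_prices(c("$399,999", "Est. $396,600", "$3,995,000"))
```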
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/69983956
