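I want to extract data from this web page, http://old.emmsa.com.pe/emmsa_spv/rpEstadistica/rptVolPreciosDiarios.php. It uses JavaScript, and so far I have not found a way to extract the volume and price data at daily frequency.

I have tried many of the alternatives proposed on this site, but none of them worked for me, because the table is obtained in two steps.
I tried to adapt the code shown here, https://www.r-bloggers.com/2020/04/an-adventure-in-downloading-books/, but I could not download the data.
My version is: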
library(Rcrawler)
install_browser() # One time only
br <- run_browser()
page <- LinkExtractor(url = "http://old.emmsa.com.pe/emmsa_spv/rpEstadistica/rptVolPreciosDiarios.php",
                      Browser = br, ExternalLInks = TRUE)
el <- page$InternalLinks
sprlnks <- el[grep("emmsa", el, fixed = TRUE)]
for (sprlnk in sprlnks) {
  spr_page <- LinkExtractor(sprlnk)
  il <- spr_page$InternalLinks
  ttl <- spr_page$Info$Title
  ttl <- trimws(strsplit(ttl, "|", fixed = TRUE)[[1]][1])
  chapter_link <- il[grep("chapter", il, fixed = TRUE)][1]
  chp_splits <- strsplit(chapter_link, "/", fixed = TRUE)
  n <- length(chp_splits[[1]])
  suff <- chp_splits[[1]][n]
  suff <- gsub(".{2}$", "", suff)
  pref <- chp_splits[[1]][n-1]
  final_url <- paste0("http://old.emmsa.com.pe/emmsa_spv/rpEstadistica/rptVolPreciosDiarios.php", pref, "/",
                      suff, ".php")
  print(final_url)
  download.file(final_url, paste0(ttl, ".php"), mode = "wb")
  Sys.sleep(5)
}
stop_browser(br)

I get a file, "Empresa Municipal de Mercados S.A.php", that is repeated many times, with the output in question appearing at line 294.
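Ultimately, what I would like is help writing a script that lets me download the daily price and volume data from the "emmsa" website.

Posted on 2022-06-16 03:14:18

You can make the same POST request the page makes and parse the table out of the response.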
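library(httr)
library(rvest)
library(janitor)
library(dplyr)

# The page submits its query as a URL-encoded form
headers <- c("Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8")
# vid_tipo=1 asks for the price table; vfecha is the date as dd/mm/yyyy
data <- "vid_tipo=1&vprod=&vvari=&vfecha=15/06/2022"
r <- httr::POST(
  url = "http://old.emmsa.com.pe/emmsa_spv/app/reportes/ajax/rpt07_gettable.php",
  httr::add_headers(.headers = headers),
  body = data
)
# Parse the returned HTML, promote the first row to column names,
# drop rows without a product and convert the price columns to numeric
t <- content(r) %>%
  html_element(".timecard") %>%
  html_table() %>%
  row_to_names(1) %>%
  clean_names() %>%
  dplyr::filter(producto != "") %>%
  mutate_at(vars(matches("precio")), as.numeric)

Volume option (different HTML):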
library(httr)
library(rvest)
library(janitor)
library(dplyr)

headers <- c("Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8")
# vid_tipo=2 asks for the volume table
data <- "vid_tipo=2&vprod=&vvari=&vfecha=17/06/2022"
r <- httr::POST(
  url = "http://old.emmsa.com.pe/emmsa_spv/app/reportes/ajax/rpt07_gettable.php",
  httr::add_headers(.headers = headers),
  body = data
)
# The volume response carries its table under a different selector
t <- content(r) %>%
  html_element("#tbReport") %>%
  html_table() %>%
  clean_names()

https://stackoverflow.com/questions/72637767
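Since the question asks for daily data, the two requests above could be wrapped in a loop over dates. What follows is only a rough sketch, not a tested scraper: the helper names `emmsa_body`, `fetch_emmsa_day` and `fetch_emmsa_range` are made up for illustration, and it assumes the endpoint keeps accepting the same fields, that `vfecha` is always `dd/mm/yyyy`, and that the `.timecard` / `#tbReport` selectors hold for every day. It needs httr, rvest, janitor and dplyr installed. Every column is kept as character so that days bind together cleanly; convert the numeric columns afterwards.

```r
# Endpoint taken from the answer above.
emmsa_url <- "http://old.emmsa.com.pe/emmsa_spv/app/reportes/ajax/rpt07_gettable.php"

# Hypothetical helper: build the form body for one day.
# tipo = 1 for prices, tipo = 2 for volumes (as in the two snippets above).
emmsa_body <- function(date, tipo = 1) {
  paste0("vid_tipo=", tipo, "&vprod=&vvari=&vfecha=",
         format(as.Date(date), "%d/%m/%Y"))
}

# Hypothetical helper: fetch and parse the table for one day.
# Returns NULL when the expected table is not in the response.
fetch_emmsa_day <- function(date, tipo = 1) {
  selector <- if (tipo == 1) ".timecard" else "#tbReport"
  r <- httr::POST(
    url = emmsa_url,
    httr::add_headers("Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8"),
    body = emmsa_body(date, tipo)
  )
  httr::stop_for_status(r)
  node <- rvest::html_element(httr::content(r), selector)
  if (inherits(node, "xml_missing")) return(NULL)
  tbl <- rvest::html_table(node)
  # Only the price table has its header in the first data row
  if (tipo == 1) tbl <- janitor::row_to_names(tbl, 1)
  tbl <- janitor::clean_names(tbl)
  # Keep everything as character so column types never clash between days
  tbl[] <- lapply(tbl, as.character)
  tbl$fecha <- rep(as.Date(date), nrow(tbl))
  tbl
}

# Hypothetical helper: one request per day, with a pause between requests.
fetch_emmsa_range <- function(from, to, tipo = 1, pause = 5) {
  days <- as.character(seq(as.Date(from), as.Date(to), by = "day"))
  tables <- lapply(days, function(d) {
    Sys.sleep(pause)
    fetch_emmsa_day(d, tipo)
  })
  dplyr::bind_rows(tables)  # NULL entries (days without a table) are dropped
}

# Usage (performs network requests):
# precios   <- fetch_emmsa_range("2022-06-13", "2022-06-17", tipo = 1)
# volumenes <- fetch_emmsa_range("2022-06-13", "2022-06-17", tipo = 2)
```

The pause between requests mirrors the `Sys.sleep(5)` from the question, to avoid hammering the server.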