I am trying to download an xlsx file with the following code:
library(rvest)
file <- "tesouro.csv"
site <- read_html("https://www.tesourotransparente.gov.br/publicacoes/boletim-resultado-do-tesouro-nacional-rtn/")
link <- site %>% html_nodes(xpath="//a[contains(text(), 'serie_historica_jun22.xlsx')]") %>% html_attr("href")
download.file(
url = link,
mode = "w", destfile = file
)

But the downloaded xlsx is empty, and the spreadsheet contains this HTML code instead:
<html>
<head>
<script>
var isMobile = /iPhone|iPad|iPod|Android/i.test(navigator.userAgent)
if(isMobile) {
window.location = "https://cdn.tesouro.gov.br/sistemas-internos///apex//producao//sistemas//thot//arquivos//publicacoes/44179_1398982/anexos/16848_586003///serie_historica_jun22.xlsx?v=7366"
}
</script>
</head>
<frameset COLS="*" border=0>
<frame SRC="https://cdn.tesouro.gov.br/sistemas-internos///apex//producao//sistemas//thot//arquivos//publicacoes/44179_1398982/anexos/16848_586003///serie_historica_jun22.xlsx?v=7366" frameborder=0>
</frameset>
</html>

Posted on 2022-08-04 18:35:38
You need to follow the link in the frame. Also, the file is an xlsx, so you need to save it as one rather than as a csv:
library(rvest)
site <- read_html(paste0("https://www.tesourotransparente.gov.br/publicacoes/",
"boletim-resultado-do-tesouro-nacional-rtn/"))
site %>%
html_nodes(xpath="//a[contains(text(), 'serie_historica_jun22.xlsx')]") %>%
html_attr("href") %>%
read_html() %>%
html_nodes(xpath = "//frameset/frame") %>%
html_attr("src") %>%
httr::GET() %>%
httr::content("raw") %>%
writeBin("tesouro.xlsx")

Now we have tesouro.xlsx
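
If you would rather keep using download.file(), the same idea can be sketched as below. This is only a rough sketch, not the one true way to do it: frame_src(), looks_like_xlsx() and download_from_wrapper() are helper names made up for illustration, and the wrapper string is a trimmed copy of the HTML shown in the question. It relies on two things: download.file() should be called with mode = "wb" for binary files (text mode can corrupt them on Windows), and an xlsx file is a zip archive, so it starts with the bytes "PK".

```r
library(rvest)

# Hypothetical helper: pull the frame's src URL out of the parsed wrapper page.
frame_src <- function(wrapper_html) {
  srcs <- wrapper_html %>%
    html_nodes(xpath = "//frameset/frame") %>%
    html_attr("src")
  srcs[1]
}

# Hypothetical helper: an xlsx is a zip archive, so its first two bytes are "PK".
looks_like_xlsx <- function(path) {
  magic <- readBin(path, what = "raw", n = 2)
  identical(magic, as.raw(c(0x50, 0x4B)))
}

# Hypothetical helper: follow the wrapper page to the real file and save it.
download_from_wrapper <- function(wrapper_url, destfile) {
  src <- frame_src(read_html(wrapper_url))
  # mode = "wb" writes the bytes unchanged, which a binary file needs
  download.file(url = src, destfile = destfile, mode = "wb")
  # Fail loudly if we saved an HTML page again instead of a spreadsheet
  stopifnot(looks_like_xlsx(destfile))
  invisible(destfile)
}

# Offline check of frame_src() against a trimmed copy of the wrapper above.
wrapper <- read_html(paste0(
  '<html><head></head><frameset COLS="*" border=0>',
  '<frame SRC="https://cdn.tesouro.gov.br/sistemas-internos///apex//producao//',
  'sistemas//thot//arquivos//publicacoes/44179_1398982/anexos/16848_586003///',
  'serie_historica_jun22.xlsx?v=7366" frameborder=0>',
  '</frameset></html>'
))
print(frame_src(wrapper))
```

With the link scraped in the question, the call would be download_from_wrapper(link, "tesouro.xlsx"). The looks_like_xlsx() check is there because the original symptom (an HTML page saved under a spreadsheet name) is otherwise easy to miss until the file is opened.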

https://stackoverflow.com/questions/73240692