Q: rvest: downloading an xlsx file

Stack Overflow user

Asked 2022-08-04 18:24:25

Answers: 1 · Views: 24 · Followers: 0 · Votes: 0

I am trying to download an xlsx file with the following code:

library(rvest)

file <- "tesouro.csv"

site <- read_html("https://www.tesourotransparente.gov.br/publicacoes/boletim-resultado-do-tesouro-nacional-rtn/")

link <- site %>% html_nodes(xpath="//a[contains(text(), 'serie_historica_jun22.xlsx')]") %>% html_attr("href")

download.file(
  url = link,
  mode = "w", destfile = file
)

However, the downloaded file is not a usable spreadsheet. Instead of workbook data, it contains this HTML:

    <html>
    <head>
        <script>
        var isMobile = /iPhone|iPad|iPod|Android/i.test(navigator.userAgent)
        if(isMobile) {
            window.location = "https://cdn.tesouro.gov.br/sistemas-internos///apex//producao//sistemas//thot//arquivos//publicacoes/44179_1398982/anexos/16848_586003///serie_historica_jun22.xlsx?v=7366"
        }
    </script>
    </head>
    <frameset COLS="*" border=0>
        <frame SRC="https://cdn.tesouro.gov.br/sistemas-internos///apex//producao//sistemas//thot//arquivos//publicacoes/44179_1398982/anexos/16848_586003///serie_historica_jun22.xlsx?v=7366" frameborder=0>
    </frameset>
    </html>
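A quick way to confirm this symptom is to look at the first bytes of the downloaded file. A regular, unencrypted .xlsx file is a ZIP archive, so it should start with the ZIP signature `PK\x03\x04`, whereas an HTML page starts with `<`. The sketch below is only illustrative: `looks_like_xlsx` is a hypothetical helper name, and the temporary file stands in for the bad download.

```r
# Hypothetical helper: TRUE if the file begins with the ZIP signature
# "PK\x03\x04", which a regular (unencrypted) .xlsx file is expected to have.
looks_like_xlsx <- function(path) {
  header <- readBin(path, what = "raw", n = 4)
  identical(header, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# A small HTML file standing in for the bad download (assumption for the demo)
html_file <- tempfile(fileext = ".xlsx")
writeLines("<html><head></head></html>", html_file)

looks_like_xlsx(html_file)  # FALSE: the file starts with "<htm", not "PK"
```

If the check returns `FALSE`, opening the file in a text editor usually shows what the server actually sent.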

rvest

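1 Answer

Stack Overflow user

Accepted answer

Answered 2022-08-04 18:35:38

The link on the page points to an HTML wrapper, not to the file itself, so you need to follow the `src` of the frame inside that wrapper. Also, the file is an xlsx, so save it with an .xlsx extension rather than as a csv: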

library(rvest)

site <- read_html(paste0("https://www.tesourotransparente.gov.br/publicacoes/",
                         "boletim-resultado-do-tesouro-nacional-rtn/"))

site %>% 
  html_nodes(xpath="//a[contains(text(), 'serie_historica_jun22.xlsx')]") %>% 
  html_attr("href") %>%
  read_html() %>%
  html_nodes(xpath = "//frameset/frame") %>%
  html_attr("src") %>%
  httr::GET() %>%
  httr::content("raw") %>%
  writeBin("tesouro.xlsx")

Now we have the file

tesouro.xlsx
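To see the frame-following step in isolation, here is a minimal offline sketch. It parses a trimmed copy of the wrapper page from the question and pulls out the frame's `src`. The `example.com` URL is a placeholder, not the real file location, and the variable names are illustrative.

```r
library(rvest)

# Trimmed wrapper page modelled on the HTML in the question.
# The URL is a placeholder (assumption), not the real CDN address.
wrapper <- '<html><head></head>
<frameset COLS="*" border=0>
  <frame SRC="https://example.com/files/report.xlsx?v=1" frameborder=0>
</frameset></html>'

# The HTML parser lower-cases attribute names, so "src" matches SRC=...
frame_src <- read_html(wrapper) %>%
  html_nodes(xpath = "//frameset/frame") %>%
  html_attr("src")

frame_src
```

With the real frame URL, `download.file(frame_src, "tesouro.xlsx", mode = "wb")` should work as an alternative to the `httr` calls. The binary mode `"wb"` matters on Windows, where text mode `"w"` can corrupt binary files such as xlsx.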

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/73240692
