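I've read that "building" a data frame in a loop is much slower than initializing a large data frame, filling it, and then trimming it down. I'd like to know whether there is a more efficient way to do this, either by pre-initializing the data frame or by using an apply-type function.
I have a vector of 500 stock symbols for which I want to return 3 columns of information: "tradedate", "ticker", and "exerniv30d". I want to do this over a date range (from startDate to endDate). In the loop, I simply row-bind the result and move on to the next date.
The code is quite slow (it ends up as a tibble of roughly 151,767 rows), and I'm not sure whether that is because of the way I'm calling the data or because of the structure of the loop. When I tried to fetch the data for all 500 symbols at once I got a size error, so I had to split the 500 symbols into two separate requests.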
library(dplyr)
library(tibble)
library(zoo)
library(Quandl)
library(rvest)  # provides html_nodes() and html_table() used below
# Getting vector of tickers --------------------------------------------------------
url <- "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
SP500 <- url %>%
  xml2::read_html() %>%
  html_nodes(xpath='//*[@id="mw-content-text"]/div/table[1]') %>%
  html_table()
SP500 <- SP500[[1]]
#cleaning vector to remove/replace troublesome values
SP500 <- SP500 %>% arrange(Symbol) %>%
  filter(Symbol != "CBOE")
SP500 <- SP500$Symbol
SP500 <- replace(SP500,match("BF.B",SP500),"BF_B")
SP500 <- replace(SP500,match("BRK.B",SP500),"BRK_B")
tickers <- c(SP500,"SPY")
# Initializing variables for loop-------------------------------------------------
startDate = as.Date("2020-01-02")
endDate = as.Date("2021-03-16")
cols_to_fetch <- c("tradedate","ticker","exerniv30d")
holderframe <- tibble()
while(startDate < endDate) {
  # one request per half of the ticker vector, for a single trade date
  d1 <- Quandl.datatable('ORATS/VOL', ticker = tickers[1:300], tradedate=startDate, qopts.columns=cols_to_fetch,
                         paginate = TRUE)
  d2 <- Quandl.datatable('ORATS/VOL', ticker = tickers[301:length(tickers)], tradedate=startDate, qopts.columns=cols_to_fetch,
                         paginate = TRUE)
  d <- bind_rows(d2,d1)
  holderframe <- bind_rows(holderframe,d)
  startDate <- startDate + 1
}
Posted on 2021-03-30 05:56:41
If I were you, I would parallelize this:
library(dplyr)
library(tibble)
library(zoo)
library(Quandl)
library(rvest)
library(furrr)
# Getting vector of tickers --------------------------------------------------------
url <- "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
SP500 <- url %>%
  xml2::read_html() %>%
  html_nodes(xpath='//*[@id="mw-content-text"]/div/table[1]') %>%
  html_table()
SP500 <- SP500[[1]]
#cleaning vector to remove/replace troublesome values
SP500 <- SP500 %>% arrange(Symbol) %>%
  filter(Symbol != "CBOE")
SP500 <- SP500$Symbol
SP500 <- replace(SP500,match("BF.B",SP500),"BF_B")
SP500 <- replace(SP500,match("BRK.B",SP500),"BRK_B")
tickers <- c(SP500,"SPY")
# Initializing variables for loop-------------------------------------------------
startDate = as.Date("2020-01-02")
endDate = as.Date("2021-03-16")
start_dates <- seq(startDate, endDate, by = 'day')
cols_to_fetch <- c("tradedate","ticker","exerniv30d")
N_CORES <- 4  # placeholder value: set this to the number of cores you want to use
plan(multisession, workers = N_CORES)
get_data <- function(startDate) {
  d1 <- Quandl.datatable('ORATS/VOL', ticker = tickers[1:300], tradedate=startDate, qopts.columns=cols_to_fetch,
                         paginate = TRUE)
  d2 <- Quandl.datatable('ORATS/VOL', ticker = tickers[301:length(tickers)], tradedate=startDate, qopts.columns=cols_to_fetch,
                         paginate = TRUE)
  bind_rows(d1, d2)
}
## map over the function in parallel
df <- start_dates %>%
  future_map_dfr(
    get_data
  )
https://stackoverflow.com/questions/66861464
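On the first part of the question (growing a data frame inside a loop), a common alternative is to collect one tibble per date in a list and bind them once at the end. The following is only a minimal sketch of that pattern: `fetch_day` is a hypothetical stand-in for the Quandl requests so that it runs without an API key, and the tickers and dates are illustrative.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the Quandl.datatable() calls (not a real API):
# returns one row per ticker for the given trade date.
fetch_day <- function(trade_date, tickers) {
  tibble(
    tradedate  = trade_date,
    ticker     = tickers,
    exerniv30d = NA_real_
  )
}

tickers <- c("AAPL", "MSFT", "SPY")  # illustrative subset
dates   <- seq(as.Date("2020-01-02"), as.Date("2020-01-06"), by = "day")

# Collect one tibble per date in a list, then bind once at the end,
# instead of calling bind_rows() on a growing tibble in every iteration.
pieces      <- lapply(dates, fetch_day, tickers = tickers)
holderframe <- bind_rows(pieces)
```

Binding once avoids re-copying the accumulated rows on every iteration. If the time is dominated by the network requests rather than by the binding, this change alone may not help much.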
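The hard-coded `tickers[1:300]` / `tickers[301:length(tickers)]` split could also be generalized into batches of a fixed maximum size. This is a sketch under assumptions: the batch size of 300 simply mirrors the split used in the post (the actual request limit is not stated there), and the ticker vector is made up for illustration.

```r
# Illustrative ticker vector; in the post this would be the roughly 500-element `tickers`.
tickers <- sprintf("T%03d", 1:501)

# Split into batches of at most `batch_size` symbols.
# 300 mirrors the split used in the post; it is an assumption, not a documented limit.
batch_size <- 300
batches <- split(tickers, ceiling(seq_along(tickers) / batch_size))

lengths(batches)

# Hypothetical use inside get_data(): one request per batch, bound once.
# get_data <- function(startDate) {
#   dplyr::bind_rows(lapply(batches, function(b)
#     Quandl.datatable('ORATS/VOL', ticker = b, tradedate = startDate,
#                      qopts.columns = cols_to_fetch, paginate = TRUE)))
# }
```

With this approach the number of requests per date follows the length of the ticker vector, so nothing needs to be edited by hand if the list of symbols changes.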