我有一个我写的函数,我叫tidy_stat_tbl()
它从我的包TidyDensity中获取一个data.framework/ti球,并返回用户选择他们选择的向量的统计信息。
我现在正在1k的复制上运行rbenchmark,我怀疑这需要2.5-3个小时,这很好,我明天早上就会看到结果。
为了加速比较,我运行了以下命令:
library(TidyDensity)
library(tidyverse)
library(tictoc)
x <- mtcars$mpg
te <- tidy_bootstrap(x) %>%
bootstrap_unnest_tbl()
tic()
s <- tidy_stat_tbl(.data = te, .x = y, .fns = IQR, .return_type = "vector")
toc()
0.58 sec elapsed
tic()
l <- tidy_stat_tbl(.data = te, .x = y, .fns = IQR, .return_type = "list")
toc()
0.56 sec elapsed
tic()
t <- tidy_stat_tbl(.data = te, .x = y, .fns = IQR, .return_type = "tibble")
toc()
6.36 sec elapsed当.return_type设置为"vector"时,运行sapply,当设置为"list" lapply时运行。
需要很长时间才能完成的代码(我希望加快)如下:
if (return_type == "tibble") {
ret <- purrr::map(
df_tbl, ~ func(.x) %>%
purrr::imap(.f = ~ cbind(.x, name = .y)) %>%
purrr::map_df(dplyr::as_tibble) %>%
dplyr::select(2, 1)
) %>%
purrr::imap(.f = ~ cbind(.x, sim_number = .y)) %>%
purrr::map_df(dplyr::as_tibble) %>%
dplyr::select(sim_number, name, .x) %>%
dplyr::mutate(.x = as.numeric(.x)) %>%
dplyr::mutate(sim_number = factor(sim_number)) %>%
dplyr::rename(value = .x)
cn <- c("sim_number","name",func_chr)
names(ret) <- cn
}te结构
str(te)
tibble [50,000 × 2] (S3: tbl_df/tbl/data.frame)
$ sim_number: Factor w/ 2000 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ y : num [1:50000] 17.8 32.4 17.8 19.2 15.8 15.5 15 21.4 10.4 16.4 ...
- attr(*, ".num_sims")= num 2000
- attr(*, "distribution_family_type")= chr "continuous"
- attr(*, "tibble_type")= chr "tidy_bootstrap"
- attr(*, "dist_with_params")= chr "Empirical"如果可能的话,我很想加快purrr的速度,只是不知道从哪里开始,或者如何真正开始。
下面是函数的一些示例输出:
head(s, 3)
sim_number_1 sim_number_2 sim_number_3
5.0 7.8 7.6
head(l, 3)
$sim_number_1
[1] 5
$sim_number_2
[1] 7.8
$sim_number_3
[1] 7.6
head(t, 3)
# A tibble: 3 × 3
sim_number name IQR
<fct> <dbl> <dbl>
1 1 1 5
2 2 1 7.8
3 3 1 7.6发布于 2022-09-30 16:30:45
这个代码帮助提高了50%的速度。
if (return_type == "tibble") {
ret <- purrr::map(
df_tbl, ~ func(.x) %>%
purrr::imap(.f = ~ cbind(.x, name = .y)) %>%
purrr::map_df(dplyr::as_tibble)
#dplyr::select(2, 1)
) %>%
purrr::imap(.f = ~ cbind(.x, sim_number = .y)) %>%
purrr::map_df(dplyr::as_tibble) %>%
dplyr::select(sim_number, name, .x) %>%
dplyr::mutate(.x = as.numeric(.x)) %>%
dplyr::mutate(sim_number = factor(sim_number)) %>%
dplyr::rename(value = .x)
cn <- c("sim_number","name",func_chr)
names(ret) <- cn
}https://stackoverflow.com/questions/73900709
复制相似问题