I am trying to write a function that summarizes several vectors. The prompt is:
Write a function data_summary which takes three inputs:\
`dataset`: A data frame\
`vars`: A character vector whose elements are names of columns from dataset which the user wants summaries for\
`group.name`: A length one character vector which gives the name of the column from dataset which contains the factor which will be used as a grouping variable\
`var.names`: A character vector of the same length as vars which gives the names that the user would like used as the entries under “Variable” in the resulting output. This should be set equal to vars by default, so the default behavior is to use the column names from dataset.
The output of the function should be a data frame with the following structure:
Column names of the data frame will be:\
`Variable`\
`Missing`\
The `first` level of the factor group.name\
The `second` level of the factor group.name\
…\
The `kth` level of the factor group.name\
`p-value`

I have already set up the skeleton of the code:

data_summary <- function(dataset, vars, group.name, var.names) {
}

An example is provided with it:
#data_summary<-function(dataset, vars,group.name, var.name){}
#example
#data_summary(titanic4, c("survived", "female", "age", "sibsp", "parch", "fare", "cabin"), "pclass")
#data_summary(titanic4, c("survived", "female", "age", "sibsp", "parch", "fare", "cabin"), "pclass", c("Survival rate", "% Female", "Age", "# siblings/spouses aboard", "# children/parents aboard", "Fare ($)", "Cabin"))

But apart from showing which arguments go into the function, it really doesn't help me much.
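Posted on 2019-11-01 03:31:35

You can use the dplyr package to do this. I also don't know which functions you want to summarize your data frame with, so I used all of the statistics that the base `summary` function returns.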
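For reference, a small sketch of what base R's `summary()` reports for a numeric vector, which is where the six statistic names used in this answer come from. The vector `x` is made up purely for illustration.

```r
# Hypothetical numeric vector, used only to show what summary() reports.
x <- c(1, 2, 3, 4, 10)

s <- summary(x)

# The six statistics that summary() labels for a numeric vector without NAs.
stat_names <- names(s)
print(stat_names)
# "Min."    "1st Qu." "Median"  "Mean"    "3rd Qu." "Max."
```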
My data:
> NewSKUMatrix
# A tibble: 268,918 x 4
   LagerID FilialID CSBID Price
     <int>    <int> <int> <dbl>
 1     233     2578  1005  38.3
 2     333     2543    NA  61.0
 3     334     2543    NA  15.0
 4     335     2543    NA  11.0
 5     337     2301    NA  71.0
 6     338     2031    NA  37.0
 7     338     2044    NA  35.0
 8     338     2054    NA  36.0
 9     338     2060    NA  37.0
10     338     2063    NA  36.0
# ... with 268,908 more rows

The function:
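library(dplyr)  # provides %>%, group_by_at(), summarise_at(), rename_at()

# data:      a data frame
# variables: names of the grouping columns
# values:    names of the columns to summarise
# names:     optional replacement names for the grouping columns
data_summary <- function(data,
                         variables,
                         values,
                         names = NULL) {
  # Fall back to the original column names when none are supplied
  if (is.null(x = names)) {
    names <- variables
  }
  data %>%
    group_by_at(.vars = variables) %>%
    summarise_at(
      .vars = values,
      .funs = list(
        Min. = min,
        `1st Qu.` = ~ quantile(x = ., probs = 0.25),
        Median = median,
        Mean = mean,
        `3rd Qu.` = ~ quantile(x = ., probs = 0.75),
        Max. = max
      )
    ) %>%
    rename_at(.vars = variables,
              .funs = ~ names)
}

Output: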
data_summary(NewSKUMatrix,
             c('LagerID'),
             c('Price'),
             c('SKU'))
# A tibble: 32,454 x 7
     SKU  Min. `1st Qu.` Median  Mean `3rd Qu.`  Max.
   <int> <dbl>     <dbl>  <dbl> <dbl>     <dbl> <dbl>
 1    17  39.0      39.0   39.0  39.0      39.0  39.0
 2    18 120.      120.   120.  121.      120.  140.
 3    21 289.      289.   289.  289.      289.  289.
 4    24  37.0      37.0   37.0  45.2      45.2  70.0
 5    25  14.0      14.0   14.0  14.0      14.0  14.0
 6    55  30.9      30.9   30.9  30.9      30.9  30.9
 7   117  26.9      26.9   26.9  26.9      26.9  26.9
 8   118  24.8      24.9   24.9  25.1      25.1  25.7
 9   119  24.8      24.8   24.9  25.1      25.3  25.7
10   158 104.      108.   108.  107.      108.  108.
# ... with 32,444 more rows

https://stackoverflow.com/questions/58648351
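If the goal is the exact layout described in the prompt (`Variable`, `Missing`, one column per level of `group.name`, `p-value`), a base R sketch might look like the following. The prompt does not say which statistic goes in the level columns or which test produces the p-value, so several things here are assumptions: every column in `vars` is numeric or logical, each level column holds the group mean, `Missing` counts `NA` values, and the p-value comes from `oneway.test()` (which by default does not assume equal variances). The `toy` data frame is made up for the usage example.

```r
# Sketch only. Assumed (not given in the prompt): level columns hold group
# means, Missing counts NAs, and the p-value comes from oneway.test().
data_summary <- function(dataset, vars, group.name, var.names = vars) {
  stopifnot(length(vars) == length(var.names))
  group <- factor(dataset[[group.name]])

  rows <- lapply(seq_along(vars), function(i) {
    x <- as.numeric(dataset[[vars[i]]])
    # Mean of x within each level of the grouping factor, ignoring NAs;
    # vapply() names the result after the levels.
    means <- vapply(levels(group),
                    function(lv) mean(x[group == lv], na.rm = TRUE),
                    numeric(1))
    # Needs at least two non-missing observations in every group.
    p <- oneway.test(x ~ group)$p.value
    data.frame(
      Variable = var.names[i],
      Missing = sum(is.na(x)),
      as.list(means),          # expands to one column per factor level
      `p-value` = p,
      check.names = FALSE,     # keep names such as "1" and "p-value" as-is
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Made-up data, only to exercise the function.
toy <- data.frame(
  pclass = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3)),
  age    = c(30, 41, NA, 25, 33, 29, 19, 22, 40),
  fare   = c(80, 95, 70, 30, 28, 35, 8, 7, 9)
)
res <- data_summary(toy, c("age", "fare"), "pclass", c("Age", "Fare ($)"))
print(res)
```

Setting `var.names = vars` in the signature gives the default behavior the prompt asks for. Swapping the mean for another statistic, or `oneway.test()` for a different test such as `kruskal.test()`, only changes the two lines that compute `means` and `p`.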