My df:
> df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"), sold = rnorm(5, 100))
> df
food sold
1 fruit banana 99.47171
2 fruit apple 99.40878
3 fruit grape 99.28727
4 bread 99.15934
5 meat 100.53438
Now I want to replace every value in food that contains "fruit" with "fruit", then group by food and summarise sold as the sum of sold.
> df %>%
+ mutate(food = replace(food, str_detect(food, "fruit"), "fruit")) %>%
+ group_by(food) %>%
+ summarise(sold = sum(sold))
Source: local data frame [3 x 2]
food sold
(fctr) (dbl)
1 bread 99.15934
2 meat 100.53438
3 NA 298.16776
Why doesn't this command work? It gives me NA instead of fruit.
Posted on 2017-05-04 09:30:32
This works for me. I think the food column in your data is a factor.
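To make the factor explanation concrete, here is a minimal base R sketch (the toy vector `food` and the result names are made up for illustration, not taken from the question). Assigning a string that is not one of a factor's existing levels produces NA with a warning, which seems to be what happens inside `replace()` in the question; on a character vector the same call does what was intended.

```r
# Hypothetical toy data. Before R 4.0.0, data.frame() turned strings into
# factors by default, so a column like this would have been a factor.
food <- factor(c("fruit banana", "fruit apple", "bread"))

# "fruit" is not an existing level, so the assignment inside replace()
# yields NA (R also emits an "invalid factor level" warning, silenced here).
replaced <- suppressWarnings(replace(food, grepl("fruit", food), "fruit"))
print(replaced)

# On a character vector the same replacement works as intended.
replaced_chr <- replace(as.character(food), grepl("fruit", food), "fruit")
print(replaced_chr)
```

The NA values are then treated as one group by `group_by()`, which would explain the single NA row holding the combined fruit total.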
Use stringsAsFactors = FALSE when creating the data, as below, or run options(stringsAsFactors = FALSE) in your R environment to avoid the problem:
library(dplyr)
library(stringr)

df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"), sold = rnorm(5, 100), stringsAsFactors = FALSE)
df %>%
  mutate(food = replace(food, str_detect(food, "fruit"), "fruit")) %>%
  group_by(food) %>%
  summarise(sold = sum(sold))
Output:
# A tibble: 3 × 2
food sold
<chr> <dbl>
1 bread 99.67661
2 fruit 300.28520
3 meat 99.88566
Posted on 2017-05-04 09:33:49
We can do this in base R without converting to character class, by assigning "fruit" to every level that contains "fruit" and then using aggregate to get the sum.
levels(df$food)[grepl("fruit", levels(df$food))] <- "fruit"
aggregate(sold~food, df, sum)
# food sold
#1 bread 99.41637
#2 fruit 300.41033
#3 meat 100.84746
Data
set.seed(24)
df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape",
"bread", "meat"), sold = rnorm(5, 100))
Posted on 2017-05-04 12:45:19
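Although the question is tagged dplyr and stringr, I would like to suggest an alternative solution using data.table, because data.table handles factors in a convenient and straightforward way: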
library(data.table)
setDT(df)[food %like% "^fruit", food := "fruit"][, .(sold = sum(sold)), by = food]
# food sold
#1: fruit 300.41033
#2: bread 99.41637
#3: meat 100.84746
Data
set.seed(24)
df <- data.frame(food = c("fruit banana", "fruit apple", "fruit grape", "bread", "meat"),
sold = rnorm(5, 100))
https://stackoverflow.com/questions/43778696