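Taking mtcars as an example:
mtcars <- subset(mtcars, select = c("cyl", "disp"))
How can I add two extra columns, one showing whether each value is below or above the median and another showing which quartile the value falls in? I want this done separately within each cyl group.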
This is the exact result I'm after:
                cyl   disp  median_split  quartile_split
Toyota Corolla    4   71.1  below_median  1st_quartile
Honda Civic       4   75.7  below_median  1st_quartile
Fiat 128          4   78.7  below_median  1st_quartile
Fiat X1-9         4   79    below_median  2nd_quartile
Lotus Europa      4   95.1  below_median  2nd_quartile
Datsun 710        4  108    median        median
Toyota Corona     4  120.1  above_median  3rd_quartile
Porsche 914-2     4  120.3  above_median  3rd_quartile
Volvo 142E        4  121    above_median  4th_quartile
Merc 230          4  140.8  above_median  4th_quartile
Merc 240D         4  146.7  above_median  4th_quartile
Ferrari Dino      6  145    below_median  1st_quartile
Mazda RX4         6  160    etc.          etc.
I'd be grateful for any help. Thanks.
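A minimal sketch of one way to produce the target layout above, assuming dplyr is installed. It keeps the car names in a `model` column (dplyr drops row names), labels a value equal to its group median as "median" in both columns as the target table does, and passes `include.lowest = TRUE` to `cut()` so each group's minimum lands in the first quartile instead of becoming NA. The names `cars_tbl`, `quartile_labels`, `med` and `result` are illustrative only.

```r
library(dplyr)

# Keep the car names as a column, since dplyr verbs drop row names.
cars_tbl <- data.frame(
  model = rownames(datasets::mtcars),
  cyl   = datasets::mtcars$cyl,
  disp  = datasets::mtcars$disp
)

quartile_labels <- c("1st_quartile", "2nd_quartile", "3rd_quartile", "4th_quartile")

result <- cars_tbl %>%
  group_by(cyl) %>%
  mutate(
    med = median(disp),                 # median within this cyl group
    median_split = case_when(
      disp < med ~ "below_median",
      disp > med ~ "above_median",
      TRUE       ~ "median"             # value equals the group median
    ),
    # include.lowest = TRUE closes the first interval at the group minimum,
    # so the smallest value gets a label instead of NA.
    quartile_split = if_else(
      disp == med,
      "median",
      as.character(cut(disp, breaks = quantile(disp),
                       labels = quartile_labels, include.lowest = TRUE))
    )
  ) %>%
  select(-med) %>%
  ungroup() %>%
  arrange(cyl, disp)

print(result, n = 32)
```

With the built-in data the cyl = 4 rows match the target table. One caveat: `cut()` stops with "'breaks' are not unique" if a group has repeated quartile values. That does not happen for disp in mtcars, but it can with other data.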
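EDIT following akrun's answer below
In the quartile_split column, akrun's answer leaves the lowest value in each cyl group as NA. I think I can fix that by adding:
mtcars$quartile_split[is.na(mtcars$quartile_split)] <- "1_quartile" # not a very elegant solution
So the full code is: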
library(dplyr)
mtcars <- subset(mtcars, select = c("cyl", "disp"))
# akrun's answer
mtcars <- mtcars %>%
  group_by(cyl) %>%
  mutate(median_split = c("above_median", "below_median")[1 +
           (disp <= median(disp))],
         quartile_split = cut(disp, breaks = quantile(disp),
                              labels = paste0(1:4, "_quartile")))
# addition: fill the NA that cut() gives each group's minimum value
mtcars$quartile_split[is.na(mtcars$quartile_split)] <- "1_quartile" # not a very elegant solution
However, on closer inspection I also found something that doesn't look right. If you look only at the cyl = 6 group, you see this:
cyl   disp   median_split  quartile_split
  6   145    below_median  1_quartile
  6   160    below_median  1_quartile
  6   160    below_median  1_quartile
  6   167.6  below_median  2_quartile
  6   167.6  below_median  2_quartile
  6   225    above_median  4_quartile
  6   258    above_median  4_quartile
The median disp for this group is 163.8, so the two cars with disp = 167.6 should be classified as "above_median", not "below_median".
I hope this can be sorted out. Thanks again.
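A quick base R check of the numbers behind this edit, using a fresh copy of the built-in data (`datasets::mtcars`) so the modified mtcars from the code above does not interfere. With the standard data the cyl = 6 median comes out as 167.6, not 163.8: the seven sorted values are 145, 160, 160, 167.6, 167.6, 225, 258, so the two 167.6 cars sit exactly on the median, and the `disp <= median(disp)` test used above puts them in below_median. The cyl = 6 quartile breaks are 145, 160, 167.6, 196.3 and 258, and no car falls between 167.6 and 196.3, which is why that group has no 3_quartile rows. The variable names are illustrative.

```r
# Fresh copy of the built-in data, unaffected by earlier reassignments.
disp <- datasets::mtcars$disp
cyl  <- datasets::mtcars$cyl

# Median of disp within each cyl group (named numeric vector: "4", "6", "8").
group_medians <- sapply(split(disp, cyl), median)
print(group_medians)

# Quartile breaks (R's default type 7) within each cyl group.
group_quartiles <- lapply(split(disp, cyl), quantile)
print(group_quartiles[["6"]])

# The cyl == 6 values in order: with 7 values, the 4th one is the median.
print(sort(disp[cyl == 6]))
```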
Posted on 2019-07-24 21:24:34
One option is to group by 'cyl' and use cut to create the categories from the quantiles of the 'disp' column:
library(dplyr)
mtcars %>%
  group_by(cyl) %>%
  mutate(median_split = c("above_median", "below_median")[1 +
           (disp <= median(disp))],
         quartile_split = cut(disp, breaks = quantile(disp),
                              labels = paste0(1:4, "_quartile")))
Posted on 2019-07-24 21:47:22
With base R and cut:
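mtcars <- subset(mtcars, select = c("cyl", "disp"))
# Median split over the whole disp column (not per cyl group)
mtcars$median_split <- ifelse(mtcars$disp <= median(mtcars$disp), "below_median", "above_median")
# Prepending 0 to the breaks gives the minimum its own (0, min] interval,
# which is labelled "1_quartile" together with the true first quartile
mtcars$quantile_split <- cut(mtcars$disp, breaks = c(0, quantile(mtcars$disp)),
                             labels = c("1_quartile", paste0(1:4, "_quartile")))
Be careful when using cut: make sure the breaks include the minimum (otherwise it returns NA) and that the minimum is labelled as being in the first quartile.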
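The answer above computes both splits over the whole disp column rather than within each cyl group. A minimal base R sketch of a per-group version, assuming the question's column names (`median_split`, `quartile_split`): `ave()` applies a function within each cyl group and returns a vector aligned with the original rows, and `include.lowest = TRUE` replaces the trick of prepending 0 to the breaks. The names `dat`, `grp_median` and `grp_quartile` are illustrative.

```r
# Fresh copy of the built-in data with the two columns of interest.
dat <- subset(datasets::mtcars, select = c("cyl", "disp"))

# Group median repeated on every row of its cyl group.
grp_median <- ave(dat$disp, dat$cyl, FUN = median)
dat$median_split <- ifelse(dat$disp <= grp_median, "below_median", "above_median")

# Quartile number (1-4) within each cyl group. labels = FALSE makes cut()
# return integer codes, and include.lowest = TRUE keeps the minimum from
# becoming NA.
grp_quartile <- ave(dat$disp, dat$cyl, FUN = function(x) {
  cut(x, breaks = quantile(x), labels = FALSE, include.lowest = TRUE)
})
dat$quartile_split <- paste0(grp_quartile, "_quartile")

print(dat[order(dat$cyl, dat$disp), ])
```

Like the answers above, this uses `<=`, so a value exactly at the group median counts as below_median; a separate "median" label needs an explicit equality check.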
https://stackoverflow.com/questions/57191413