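I have a database of animals that have been tested. The animals are grouped into herds, and each herd can be tested several times. I want to create a new column that tells me whether a herd is being tested for the first or the second time.
Here is an example of my database: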
df <- data.frame(
animal = c("Animal1", "Animal2", "Animal3", "Animal1", "Animal2", "Animal3", "Animal4", "Animal5", "Animal6", "Animal4", "Animal5", "Animal6"),
herd = c("Herd1","Herd1","Herd1", "Herd1","Herd1","Herd1","Herd2","Herd2", "Herd2","Herd2","Herd2","Herd2"),
date = c("2017-01-01", "2017-01-01", "2017-01-01", "2018-07-01" , "2018-07-01", "2018-07-01", "2017-05-01", "2017-05-01", "2017-05-01", "2019-07-01", "2019-07-01", "2019-07-01"))
So I would like it to look like this:
animal herd date testing
1 Animal1 Herd1 2017-01-01 1
2 Animal2 Herd1 2017-01-01 1
3 Animal3 Herd1 2017-01-01 1
4 Animal1 Herd1 2018-07-01 2
5 Animal2 Herd1 2018-07-01 2
6 Animal3 Herd1 2018-07-01 2
7 Animal4 Herd2 2017-05-01 1
8 Animal5 Herd2 2017-05-01 1
9 Animal6 Herd2 2017-05-01 1
10 Animal4 Herd2 2019-07-01 2
11 Animal5 Herd2 2019-07-01 2
12 Animal6 Herd2 2019-07-01 2
I tried the following, but it is not quite what I want, and the whole database ends up very messy.
df <- df %>%
group_by(herd) %>%
mutate(testing = rank(date))
> df
# A tibble: 12 x 4
# Groups: herd [2]
animal herd date testing
<fct> <fct> <fct> <dbl>
1 Animal1 Herd1 2017-01-01 2
2 Animal2 Herd1 2017-01-01 2
3 Animal3 Herd1 2017-01-01 2
4 Animal1 Herd1 2018-07-01 5
5 Animal2 Herd1 2018-07-01 5
6 Animal3 Herd1 2018-07-01 5
7 Animal4 Herd2 2017-05-01 2
8 Animal5 Herd2 2017-05-01 2
9 Animal6 Herd2 2017-05-01 2
10 Animal4 Herd2 2019-07-01 5
11 Animal5 Herd2 2019-07-01 5
12 Animal6 Herd2 2019-07-01 5
Thanks for your help!
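Answered 2022-09-19 10:08:20
You can use dplyr::dense_rank. By default rank() gives tied values the average of their positions, which is why the three rows of the first test get 2 and the three rows of the second test get 5. dense_rank numbers the distinct dates 1, 2, ... with no gaps.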
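To make the difference concrete, here is a minimal sketch comparing the ranking functions on a toy vector. The vector `dates` and the variable names are made up for illustration and simply mimic the three-rows-per-test layout of Herd1.

```r
library(dplyr)

# Hypothetical toy vector: three rows per test date, as in Herd1
dates <- c("2017-01-01", "2017-01-01", "2017-01-01",
           "2018-07-01", "2018-07-01", "2018-07-01")

avg_ranks   <- rank(dates)        # base R default: ties get the mean of their positions
min_ranks   <- min_rank(dates)    # ties get the lowest position, which leaves gaps
dense_ranks <- dense_rank(dates)  # ties share a rank and no gaps are left

print(avg_ranks)    # 2 2 2 5 5 5
print(min_ranks)    # 1 1 1 4 4 4
print(dense_ranks)  # 1 1 1 2 2 2
```

Applied per herd to the data from the question: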
library(dplyr)

df %>%
  group_by(herd) %>%
  mutate(testing = dense_rank(date))

Output:
animal herd date testing
<chr> <chr> <chr> <int>
1 Animal1 Herd1 2017-01-01 1
2 Animal2 Herd1 2017-01-01 1
3 Animal3 Herd1 2017-01-01 1
4 Animal1 Herd1 2018-07-01 2
5 Animal2 Herd1 2018-07-01 2
6 Animal3 Herd1 2018-07-01 2
7 Animal4 Herd2 2017-05-01 1
8 Animal5 Herd2 2017-05-01 1
9 Animal6 Herd2 2017-05-01 1
10 Animal4 Herd2 2019-07-01 2
11 Animal5 Herd2 2019-07-01 2
12 Animal6 Herd2 2019-07-01 2
Answered 2022-09-19 10:52:02
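A one-liner using ave with seq_along. It numbers each animal's rows within its herd in order of appearance, so it assumes the rows are already sorted by date.
# Passing seq_along(date) instead of date keeps the result an integer column
transform(df, testing = ave(seq_along(date), herd, animal, FUN = seq_along))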
# animal herd date testing
# 1 Animal1 Herd1 2017-01-01 1
# 2 Animal2 Herd1 2017-01-01 1
# 3 Animal3 Herd1 2017-01-01 1
# 4 Animal1 Herd1 2018-07-01 2
# 5 Animal2 Herd1 2018-07-01 2
# 6 Animal3 Herd1 2018-07-01 2
# 7 Animal4 Herd2 2017-05-01 1
# 8 Animal5 Herd2 2017-05-01 1
# 9 Animal6 Herd2 2017-05-01 1
# 10 Animal4 Herd2 2019-07-01 2
# 11 Animal5 Herd2 2019-07-01 2
# 12 Animal6 Herd2 2019-07-01 2
Answered 2022-09-19 10:53:32
You can use data.table like this:
library("data.table")
setDT(df)
df[,testing := .N, by = list(animal,herd)][,testing := seq_along(testing), by = list(animal,herd)]
Or, alternatively:
library("data.table")
setDT(df)
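# NA_integer_ (rather than plain logical NA) so the column type matches the integer result of seq_along()
df[,testing := NA_integer_][,testing := seq_along(testing), by = list(animal,herd)]
In both cases you use the data.table package, its dt[i, j, by] syntax, and chaining of the form dt[first][second].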
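Both variants count rows per animal, so they match the herd-level test number only when every animal is present at every test. As a possible variation (not part of the answer above), data.table's frank() with ties.method = "dense" ranks the dates within each herd directly. The data below is hypothetical: Animal3 only joins Herd1 at the second test.

```r
library(data.table)

# Hypothetical data: Animal3 is first tested at the herd's second test
dt <- data.table(
  animal = c("Animal1", "Animal2", "Animal1", "Animal2", "Animal3"),
  herd   = "Herd1",
  date   = c("2017-01-01", "2017-01-01", "2018-07-01", "2018-07-01", "2018-07-01")
)

# Per-animal counter, as in the answer above: Animal3 gets 1
dt[, per_animal := seq_len(.N), by = .(animal, herd)]

# Dense rank of the dates within each herd: Animal3 gets 2
dt[, testing := frank(date, ties.method = "dense"), by = herd]

print(dt)
```

Here `per_animal` is 1, 1, 2, 2, 1 while `testing` is 1, 1, 2, 2, 2, so the two approaches only disagree for the animal that missed the first test.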
https://stackoverflow.com/questions/73771739