I have the following dataset, which is already sorted by transaction:
dataset <- data.frame(id = c(1,2,3,4,2,4,6,7,3,2),
                      transaction = c(1,2,3,4,5,6,7,8,9,10),
                      amount = c(200,100,50,100,50,300,100,50,100,50))
As you can see, each customer has an id and the amount spent in each transaction.
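My question is: how can I tell whether the customer in each transaction is new or recurrent? A customer is new on their first transaction and recurrent on every transaction after that. For this data, the expected result is: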
recurrence_status <- c("new","new","new","new","recurrent",
                       "recurrent","new","new","recurrent","recurrent")
So far I have tried the following:
for (i in 1:(length(dataset$transaction)-1)){
  for(j in 2:length(dataset$transaction)){
    j <- j + 1
    comp <- dataset[j:length(dataset$id)]
    ifelse((is.element(dataset[i,1]),comp),"recurrent","new")
  }
}
But it gives me an error because of the parentheses. I know loops should be avoided in R wherever possible. Any help is much appreciated.
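For reference, a possible corrected version of the loop (a sketch, not from either answer). In the attempt above, the parenthesis after dataset[i,1] closes is.element() too early, so R is left parsing (x, comp), which is a syntax error. Beyond that, dataset[j:length(dataset$id)] selects columns rather than rows, reassigning j inside for() does not change the loop sequence, the ifelse() result is never stored, and comparing against later rows answers "will this customer come back?" rather than "has this customer been seen before?". The sketch below looks only backwards at earlier rows and assumes, as the question states, that rows are sorted by transaction:

```r
dataset <- data.frame(id = c(1,2,3,4,2,4,6,7,3,2),
                      transaction = c(1,2,3,4,5,6,7,8,9,10),
                      amount = c(200,100,50,100,50,300,100,50,100,50))

# Preallocate the result: every row starts out as "new"
recurrence_status <- rep("new", nrow(dataset))

# Row 1 can never be recurrent, so start at row 2 and look only at earlier rows
for (i in seq_len(nrow(dataset))[-1]) {
  earlier_ids <- dataset$id[seq_len(i - 1)]
  if (is.element(dataset$id[i], earlier_ids)) {
    recurrence_status[i] <- "recurrent"
  }
}

recurrence_status
```

This rescans all earlier rows for every row, so it is quadratic in the number of transactions; the vectorised answers avoid that.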
Regards,
Posted 2020-02-12 03:01:49
In base R, this can be done with duplicated():
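How the one-liner works: duplicated() returns TRUE for every element whose value already appeared earlier in the vector, so on data sorted by transaction it flags exactly the recurrent rows. Adding 1 turns FALSE/TRUE into the indices 1/2, which pick from c("new", "recurrent"). A step-by-step sketch follows; the intermediate names is_repeat, idx and status are illustrative and not part of the answer:

```r
dataset <- data.frame(id = c(1,2,3,4,2,4,6,7,3,2),
                      transaction = c(1,2,3,4,5,6,7,8,9,10),
                      amount = c(200,100,50,100,50,300,100,50,100,50))

# TRUE wherever the id already occurred in an earlier row
is_repeat <- duplicated(dataset$id)
# values: FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE

# Logical + 1 coerces to numeric: FALSE -> 1, TRUE -> 2
idx <- is_repeat + 1
# values: 1 1 1 1 2 2 1 1 2 2

# Equivalent formulation with ifelse(), which some readers find more explicit
status <- ifelse(is_repeat, "recurrent", "new")
```

If the rows were not already sorted by transaction, they would need to be ordered first (for example dataset[order(dataset$transaction), ]), because duplicated() treats the first occurrence in vector order as the original.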
dataset$recurrence_status <- c("new", "recurrent")[duplicated(dataset$id) + 1]
dataset$recurrence_status
#[1] "new" "new" "new" "new" "recurrent" "recurrent" "new" "new" "recurrent"
#[10] "recurrent"

Posted 2020-02-12 03:10:19
Using dplyr:
library(dplyr)

dataset %>%
  group_by(id) %>%
  mutate(recurrence_status = factor(+(row_number() > 1),
                                    levels = c(0, 1),
                                    labels = c("new", "recurrent")))
      id transaction amount recurrence_status
   <dbl>       <dbl>  <dbl> <fct>
 1     1           1    200 new
 2     2           2    100 new
 3     3           3     50 new
 4     4           4    100 new
 5     2           5     50 recurrent
 6     4           6    300 recurrent
 7     6           7    100 new
 8     7           8     50 new
 9     3           9    100 recurrent
10     2          10     50 recurrent

https://stackoverflow.com/questions/60175703
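For readers who want the dplyr logic without the dependency, a base R analogue of group_by(id) + row_number() (a sketch, not part of either answer) uses ave() to number each customer's transactions in order; anything numbered above 1 is recurrent. Like the answers, it assumes the rows are sorted by transaction, and the name visit_number is illustrative:

```r
dataset <- data.frame(id = c(1,2,3,4,2,4,6,7,3,2),
                      transaction = c(1,2,3,4,5,6,7,8,9,10),
                      amount = c(200,100,50,100,50,300,100,50,100,50))

# Running count of each id's appearances: 1 on the first, 2 on the second, ...
# ave() applies seq_along() within each id group and returns results in the original row order
visit_number <- ave(dataset$transaction, dataset$id, FUN = seq_along)

# Same 0/1 -> factor mapping as the dplyr answer
dataset$recurrence_status <- factor(as.integer(visit_number > 1),
                                    levels = c(0, 1),
                                    labels = c("new", "recurrent"))
dataset
```

Unlike duplicated(), visit_number also records how many times a customer has returned (customer 2 reaches 3 on transaction 10), which may be useful if more than two categories are ever needed.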