I am trying to create a dataset that shows how students progress through courses. I have a dataset like this:
  InvoiceDate StudentName Course
  <date>      <fct>       <fct>
1 2020-07-26  Tom         Level 1
2 2020-11-05  Tom         Level 2
3 2021-11-05  Tom         Level 3
4 2018-10-15  Mary        Level 1
5 2020-08-06  Mary        Level 2
6 2021-10-10  Mary        Level 2

What I want to know is which course a student takes after completing a given level, and when a student did not take any follow-up course. The dataset I want to create looks like this:
  FullName StartCourseDate StartCourse FollowUpCourse FollowUpCourseDate
1 Tom      2020-07-26      Level 1     Level 2        2020-11-05
2 Tom      2020-11-05      Level 2     Level 3        2021-11-05
2 Tom      2021-11-05      Level 3     Stop           Stop
3 Mary     2018-10-15      Level 1     Level 2        2020-08-06
4 Mary     2020-08-06      Level 2     Level 2        2021-10-10
4 Mary     2021-10-10      Level 2     Stop           Stop

I have tried different things (tidyverse / dplyr), but I cannot get the rows in the right order. Hopefully someone can help :)
Answered 2022-10-04 23:12:56
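First arrange by date within each student (make sure the column is of class Date). Then add the next row's values with lead, and finally rename and relocate the columns. The date is converted to character before lead is applied so that the last row of each student can hold "Stop".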
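As a minimal sketch of what lead() does inside groups, here it is on a hypothetical toy data frame (the names toy, id, step and next_step are made up for illustration and are not part of the question's data):

```r
library(dplyr)

# Hypothetical toy data: two groups, already in the desired order
toy <- data.frame(
  id = c("a", "a", "b"),
  step = c("first", "second", "first"),
  stringsAsFactors = FALSE
)

toy_next <- toy %>%
  group_by(id) %>%
  # lead() shifts values up by one row within each group;
  # the last row of a group has no successor, so it gets the default
  mutate(next_step = lead(step, default = "Stop")) %>%
  ungroup()

toy_next$next_step
```

The question prints its columns as <fct>, while this sketch uses character columns. Depending on the dplyr version, a factor column may need as.character() before a default such as "Stop", which is not one of its levels, can be used.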
library(dplyr)

df %>%
  group_by(StudentName) %>%
  # sort each student's rows chronologically
  arrange(InvoiceDate, .by_group = TRUE) %>%
  # the next row within the group is the follow-up; the last row gets "Stop"
  mutate(FollowUpCourse = lead(Course, default = "Stop"),
         FollowUpCourseDate = lead(as.character(InvoiceDate), default = "Stop")) %>%
  rename(StartCourseDate = InvoiceDate, FullName = StudentName, StartCourse = Course) %>%
  relocate(FullName, .before = StartCourseDate) %>%
  ungroup()
# A tibble: 6 × 5
  FullName StartCourseDate StartCourse FollowUpCourse FollowUpCourseDate
  <chr>    <date>          <chr>       <chr>          <chr>
1 Mary     2018-10-15      Level 1     Level 2        2020-08-06
2 Mary     2020-08-06      Level 2     Level 2        2021-10-10
3 Mary     2021-10-10      Level 2     Stop           Stop
4 Tom      2020-07-26      Level 1     Level 2        2020-11-05
5 Tom      2020-11-05      Level 2     Level 3        2021-11-05
6 Tom      2021-11-05      Level 3     Stop           Stop

Data
df <- structure(list(
  InvoiceDate = structure(c(18469, 18571, 18936, 17819, 18480, 18910), class = "Date"),
  StudentName = c("Tom", "Tom", "Tom", "Mary", "Mary", "Mary"),
  Course = c("Level 1", "Level 2", "Level 3", "Level 1", "Level 2", "Level 2")),
  row.names = c(NA, -6L), class = "data.frame")

Answered 2022-10-04 23:45:31
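You can get your result with mutate and lead; using across lets you condense the mutate call. Note that this version is not grouped by student: it relies on the Course != "Level 3" check to end Tom's sequence before Mary's rows begin, so it is specific to data shaped like the example.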
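As a minimal sketch of how across() with .names creates one new column per input column, here it is on a hypothetical toy data frame (toy, x, y and the "Next" prefix are made up for illustration):

```r
library(dplyr)

# Hypothetical toy data with two columns to shift
toy <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

toy_lead <- toy |>
  mutate(across(
    c(x, y),
    # convert to character so the default "Stop" has a compatible type
    ~ lead(as.character(.x), default = "Stop"),
    # {.col} is replaced by each input column's name: Nextx, Nexty
    .names = "Next{.col}"
  ))

names(toy_lead)
```

The original x and y columns are kept here because .keep is left at its default; the answer's code sets .keep = "unused" to drop the columns it consumed.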
library(dplyr)

df |>
  mutate(Name = StudentName, StartCourseDate = InvoiceDate, StartCourse = Course,
         across(c(Course, InvoiceDate),
                ~ ifelse(Course != "Level 3", lead(as.character(.x), default = "Stop"), "Stop"),
                .names = "FollowUp{.col}"),
         .keep = "unused")

  Name StartCourseDate StartCourse FollowUpCourse FollowUpInvoiceDate
1  Tom      2020-07-26     Level 1        Level 2          2020-11-05
2  Tom      2020-11-05     Level 2        Level 3          2021-11-05
3  Tom      2021-11-05     Level 3           Stop                Stop
4 Mary      2018-10-15     Level 1        Level 2          2020-08-06
5 Mary      2020-08-06     Level 2        Level 2          2021-10-10
6 Mary      2021-10-10     Level 2           Stop                Stop

https://stackoverflow.com/questions/73954107