上下文
我正在帮助一位在教育领域工作的朋友,她在大流行期间为学生提供的工作项目之一就是一对一地辅导Zoom。他们的员工会把考勤记录在谷歌的单子上,但有时他们不可避免地会忘记记录出勤情况。为了帮助她解决这个问题,我试图找出一些R代码,当员工忘记了考勤时,这些代码就会被发现。我已经搜索并尝试过if_else命令、pivot_longer、for-循环等等,但我仍然是R的(功能)初学者,所以我读到的很多东西都超出了我的范围。(FYI是我所知道的唯一的编程语言,所以如果需要的话,解决方案必须是我将来可以实现和排除故障的语言)。尽管如此,对于没有包括可复制代码(因为我甚至不知道从哪里开始),事先表示歉意。但是对于任何有能力帮忙的人,以下是公开查看的Google工作表中的示例数据。
问题
我需要查找和/或计数在之后发生的NA值,即辅导start_date,而不是,而不是计算将来发生的NAs。您通常可以判断员工何时忘记出席,因为:
start date之前出现过为了帮助形象化这一点,我用黄色突出显示了大多数(也许全部)符合这一标准的NAs。
由于这不是我的数据/我的项目,所以我不能仅仅因为这是一种糟糕的实践而改变它(例如,一周中的一天和时间在同一个单元格中)。但是,任何直接帮助我解决问题的解决方案都是非常感谢的。
最好的
詹姆斯
用于附加上下文的编辑
为了帮助监控出席情况,我正在创建一个柔性仪表板,目标是包括一个valueBox (或类似的东西),它填充了不应该包含NAs的辅导课程的数量。如果程序管理员通过仪表板知道有NAs,她可以尽快跟踪员工并让他们输入正确的代码(X、1、2等)。她现在面临的问题是,几天或几个星期过去了,员工可能会忘记会议是发生了还是有人迟到了。谢谢!
dput(fake_tutoring_data)
structure(list(start_date = structure(c(18893, 18897, 18898,
18900, 18900, 18900, 18901, 18904, 18907, 18911, 18911, 18912,
18913, 18919, 18919, 18925, 18925, 18933, 18933, 18934, 18935,
18939, 18939, 18939, 18946, 18964, 18968, 18968), class = "Date"),
day_time = c("TH/7pm", "MON/5:30 PM", "TUE/6:15PM", "TH/9am",
"TH/6:30 PM", "TH/7pm", "F/5:15pm", "MON/4:30PM", "TH/6 pm",
"MON/ 5:00 PM", "MON/6:00 PM", "TUE/6:30 PM", "WED/6pm",
"TUE/11:00 AM", "TUE/2pm", "MON 4:45 PM", "TUE/6:00 PM",
"TUE/6:00 PM", "TUE/6:15PM", "WED/5:00 PM", "TH/6PM", "MON/5:30PM",
"MON/5:30 PM", "MON/6:00 PM", "MON/6:00 PM", "F/12pm", "Tue/ 4:30pm",
"Tue/5:00 pm"), `2021-09-20` = c("1", NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), `2021-09-27` = c(NA, "1", "1",
"1", "1", "1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `2021-10-04` = c("X",
"1", "1", "1", "1", "X", NA, "1", "1", NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
`2021-10-11` = c("X", "1", "1", "1", "1", "1", "1", "1",
"4", "1", "1", "1", "1", NA, "4", NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), `2021-10-18` = c("X", "1", "1",
"1", "X", "1", "1", "1", "4", "3", "X", "1", "2", NA, "1",
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `2021-10-25` = c("1",
"1", "1", "1", "1", "4/5", "1", "1", "2", "X", "1", "4",
"1", "1", "X", "1", "1", NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), `2021-11-01` = c("1", "1", "1", "1", "1", NA,
"1", "4", "1", "X", "1", "1", NA, "1", "1", "4", "1", "1",
"1", "1", "1", NA, NA, NA, NA, NA, NA, NA), `2021-11-08` = c("1",
"1", "3", NA, "1", NA, "1", "X", NA, "5", NA, "4", "2", "4",
"1", "1", "1", "1", "1", "1", "1", "1", NA, NA, NA, NA, NA,
NA), `2021-11-15` = c("1", "1", NA, "1", "1", "1", "1", "3",
"2", "3", NA, "2", "4", "2", NA, "2", "1", "1", NA, NA, "1",
"2", "1", "1", "1", NA, NA, NA), `2021-11-22` = c(NA, "1",
"1", NA, NA, NA, "1", "X", NA, NA, NA, NA, NA, "5", "X",
"4", "4", "1", "4", NA, "1", "1", "2", "X", NA, NA, NA, NA
), `2021-11-29` = c("1", "1", NA, "1", "1", "X", "1", NA,
"2", NA, NA, "1", NA, "1", "2", "X", "1", "1", NA, "1", "4",
"1", "1", "1", "1", "4", NA, NA), `2021-12-06` = c("1", NA,
"1", NA, NA, NA, "5", NA, "1", NA, NA, "1", "1", NA, NA,
"X", "3", "1", NA, "2", "5", NA, "1", "1", "1", "4", "1",
"1"), `2021-12-13` = c(NA, "1", NA, NA, "1", NA, NA, NA,
NA, NA, NA, "5", "5", NA, NA, NA, NA, "1", "X", "2", NA,
NA, "5", "1", "4", NA, NA, "1"), `2021-12-20` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2021-12-27` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-01-03` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-01-10` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-01-17` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-01-24` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-01-31` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-02-07` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-02-14` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-02-21` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-02-28` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-03-07` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-03-14` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-03-21` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-03-28` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-04-04` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-04-11` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-04-18` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-04-25` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-05-02` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-05-09` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-05-16` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-05-23` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-05-30` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), `2022-06-06` = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_)), row.names = c(NA,
-28L), class = c("tbl_df", "tbl", "data.frame"))发布于 2021-12-31 03:16:23
在这种情况下,旋转数据是可行的。我会这么做的。首先,将fake_tutoring_data从宽格式转换为长格式:将每一行替换为每个日期列对应的一行,同时保留start_date和day_time标识符。我使用一个正则表达式来标识日期列;它们的格式为YYYY。(我还添加了一个id列,它只是行号;这在以后会派上用场。)
library(tidyverse)
library(lubridate)
long_tutoring_data = fake_tutoring_data %>%
mutate(id = row_number()) %>%
pivot_longer(cols = matches("^[0-9]{4}-[0-9]{2}-[0-9]{2}$"),
names_to = "attendance_date",
values_to = "attendance") %>%
mutate(attendance_date = ymd(attendance_date))接下来,使用三个条件查找缺失的考勤记录:如果记录是
start_date上或之后,以及missing_attendance = long_tutoring_data %>%
filter(attendance_date < Sys.Date(),
attendance_date >= start_date,
is.na(attendance))我还没有检查这是否与您在电子表格中标记的NAs完全对应,但是一些抽查表明它非常接近。
你建议的另一个标准--一个缺失的记录是一个按时间顺序排列在两个没有丢失的记录之间的记录--是比较棘手的,但却是可行的。我添加了两个列,previous_attendance和next_attendance,它们包含这个id的上一个(或下一个)非空考勤记录。然后,我们可以找到至少有一个之前和之后的不丢失记录的缺失记录。
missing_attendance = long_tutoring_data %>%
group_by(id) %>%
arrange(id, attendance_date) %>%
mutate(previous_attendance = attendance,
next_attendance = attendance) %>%
fill(previous_attendance, .direction = "down") %>%
fill(next_attendance, .direction = "up") %>%
ungroup() %>%
filter(is.na(attendance),
!is.na(previous_attendance),
!is.na(next_attendance))这并不能像第一种方法那样识别丢失的记录;您将是判断哪种方法更准确的最佳方法。
https://stackoverflow.com/questions/70538059
复制相似问题