首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >在R中计算特定日期条件下的缺失值

在R中计算特定日期条件下的缺失值
EN

Stack Overflow用户
提问于 2021-12-31 00:53:29
回答 1查看 108关注 0票数 1

上下文

我正在帮助一位在教育领域工作的朋友,她在大流行期间为学生提供的工作项目之一就是一对一地辅导Zoom。他们的员工会把考勤记录在谷歌的单子上,但有时他们不可避免地会忘记记录出勤情况。为了帮助她解决这个问题,我试图找出一些R代码,当员工忘记了考勤时,这些代码就会被发现。我已经搜索并尝试过if_else命令、pivot_longer、for-循环等等,但我仍然是R的(功能)初学者,所以我读到的很多东西都超出了我的范围。(FYI是我所知道的唯一的编程语言,所以如果需要的话,解决方案必须是我将来可以实现和排除故障的语言)。尽管如此,对于没有包括可复制代码(因为我甚至不知道从哪里开始),事先表示歉意。但是对于任何有能力帮忙的人,以下是公开查看的Google工作表中的示例数据

问题

我需要查找和/或计数在之后发生的NA值,即辅导start_date,而不是,而不是计算将来发生的NAs。您通常可以判断员工何时忘记出席,因为:

  1. 在雇员记得要出席的几个星期之间
  2. 这些NAs发生在过去,但从来没有在辅导start date之前出现过

为了帮助形象化这一点,我用黄色突出显示了大多数(也许全部)符合这一标准的NAs。

由于这不是我的数据/我的项目,所以我不能仅仅因为这是一种糟糕的实践而改变它(例如,一周中的一天和时间在同一个单元格中)。但是,任何直接帮助我解决问题的解决方案都是非常感谢的。

最好的

詹姆斯

用于附加上下文的编辑

为了帮助监控出席情况,我正在创建一个柔性仪表板,目标是包括一个valueBox (或类似的东西),它填充了不应该包含NAs的辅导课程的数量。如果程序管理员通过仪表板知道有NAs,她可以尽快跟踪员工并让他们输入正确的代码(X、1、2等)。她现在面临的问题是,几天或几个星期过去了,员工可能会忘记会议是发生了还是有人迟到了。谢谢!

dput(fake_tutoring_data)

代码语言:javascript
复制
structure(list(start_date = structure(c(18893, 18897, 18898, 
18900, 18900, 18900, 18901, 18904, 18907, 18911, 18911, 18912, 
18913, 18919, 18919, 18925, 18925, 18933, 18933, 18934, 18935, 
18939, 18939, 18939, 18946, 18964, 18968, 18968), class = "Date"), 
    day_time = c("TH/7pm", "MON/5:30 PM", "TUE/6:15PM", "TH/9am", 
    "TH/6:30 PM", "TH/7pm", "F/5:15pm", "MON/4:30PM", "TH/6 pm", 
    "MON/ 5:00 PM", "MON/6:00 PM", "TUE/6:30 PM", "WED/6pm", 
    "TUE/11:00 AM", "TUE/2pm", "MON 4:45 PM", "TUE/6:00 PM", 
    "TUE/6:00 PM", "TUE/6:15PM", "WED/5:00 PM", "TH/6PM", "MON/5:30PM", 
    "MON/5:30 PM", "MON/6:00 PM", "MON/6:00 PM", "F/12pm", "Tue/ 4:30pm", 
    "Tue/5:00 pm"), `2021-09-20` = c("1", NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), `2021-09-27` = c(NA, "1", "1", 
    "1", "1", "1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `2021-10-04` = c("X", 
    "1", "1", "1", "1", "X", NA, "1", "1", NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), 
    `2021-10-11` = c("X", "1", "1", "1", "1", "1", "1", "1", 
    "4", "1", "1", "1", "1", NA, "4", NA, NA, NA, NA, NA, NA, 
    NA, NA, NA, NA, NA, NA, NA), `2021-10-18` = c("X", "1", "1", 
    "1", "X", "1", "1", "1", "4", "3", "X", "1", "2", NA, "1", 
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), `2021-10-25` = c("1", 
    "1", "1", "1", "1", "4/5", "1", "1", "2", "X", "1", "4", 
    "1", "1", "X", "1", "1", NA, NA, NA, NA, NA, NA, NA, NA, 
    NA, NA, NA), `2021-11-01` = c("1", "1", "1", "1", "1", NA, 
    "1", "4", "1", "X", "1", "1", NA, "1", "1", "4", "1", "1", 
    "1", "1", "1", NA, NA, NA, NA, NA, NA, NA), `2021-11-08` = c("1", 
    "1", "3", NA, "1", NA, "1", "X", NA, "5", NA, "4", "2", "4", 
    "1", "1", "1", "1", "1", "1", "1", "1", NA, NA, NA, NA, NA, 
    NA), `2021-11-15` = c("1", "1", NA, "1", "1", "1", "1", "3", 
    "2", "3", NA, "2", "4", "2", NA, "2", "1", "1", NA, NA, "1", 
    "2", "1", "1", "1", NA, NA, NA), `2021-11-22` = c(NA, "1", 
    "1", NA, NA, NA, "1", "X", NA, NA, NA, NA, NA, "5", "X", 
    "4", "4", "1", "4", NA, "1", "1", "2", "X", NA, NA, NA, NA
    ), `2021-11-29` = c("1", "1", NA, "1", "1", "X", "1", NA, 
    "2", NA, NA, "1", NA, "1", "2", "X", "1", "1", NA, "1", "4", 
    "1", "1", "1", "1", "4", NA, NA), `2021-12-06` = c("1", NA, 
    "1", NA, NA, NA, "5", NA, "1", NA, NA, "1", "1", NA, NA, 
    "X", "3", "1", NA, "2", "5", NA, "1", "1", "1", "4", "1", 
    "1"), `2021-12-13` = c(NA, "1", NA, NA, "1", NA, NA, NA, 
    NA, NA, NA, "5", "5", NA, NA, NA, NA, "1", "X", "2", NA, 
    NA, "5", "1", "4", NA, NA, "1"), `2021-12-20` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2021-12-27` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-01-03` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-01-10` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-01-17` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-01-24` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-01-31` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-02-07` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-02-14` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-02-21` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-02-28` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-03-07` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-03-14` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-03-21` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-03-28` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-04-04` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-04-11` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-04-18` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-04-25` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-05-02` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-05-09` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-05-16` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-05-23` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-05-30` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_), `2022-06-06` = c(NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_)), row.names = c(NA, 
-28L), class = c("tbl_df", "tbl", "data.frame"))
EN

回答 1

Stack Overflow用户

发布于 2021-12-31 03:16:23

在这种情况下,旋转数据是可行的。我会这么做的。首先,将fake_tutoring_data从宽格式转换为长格式:将每一行替换为每个日期列对应的一行,同时保留start_dateday_time标识符。我使用一个正则表达式来标识日期列;它们的格式为YYYY。(我还添加了一个id列,它只是行号;这在以后会派上用场。)

代码语言:javascript
复制
library(tidyverse)
library(lubridate)

long_tutoring_data = fake_tutoring_data %>%
  mutate(id = row_number()) %>%
  pivot_longer(cols = matches("^[0-9]{4}-[0-9]{2}-[0-9]{2}$"),
               names_to = "attendance_date",
               values_to = "attendance") %>%
  mutate(attendance_date = ymd(attendance_date))

接下来,使用三个条件查找缺失的考勤记录:如果记录是

  1. 在今天之前,
  2. 在相应的start_date上或之后,以及
  3. 失踪了。
代码语言:javascript
复制
missing_attendance = long_tutoring_data %>%
  filter(attendance_date < Sys.Date(),
         attendance_date >= start_date,
         is.na(attendance))

我还没有检查这是否与您在电子表格中标记的NAs完全对应,但是一些抽查表明它非常接近。

你建议的另一个标准--一个缺失的记录是一个按时间顺序排列在两个没有丢失的记录之间的记录--是比较棘手的,但却是可行的。我添加了两个列,previous_attendancenext_attendance,它们包含这个id的上一个(或下一个)非空考勤记录。然后,我们可以找到至少有一个之前和之后的不丢失记录的缺失记录。

代码语言:javascript
复制
missing_attendance = long_tutoring_data %>%
  group_by(id) %>%
  arrange(id, attendance_date) %>%
  mutate(previous_attendance = attendance,
         next_attendance = attendance) %>%
  fill(previous_attendance, .direction = "down") %>%
  fill(next_attendance, .direction = "up") %>%
  ungroup() %>%
  filter(is.na(attendance),
         !is.na(previous_attendance),
         !is.na(next_attendance))

这并不能像第一种方法那样识别丢失的记录;您将是判断哪种方法更准确的最佳方法。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/70538059

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档