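I'm working with a prescription drug claims dataset. When a claim is cancelled, the data system does not simply delete the observation. Instead, it creates a new observation with the same prescription number, but with the days supplied shown as a negative number.
For example: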
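  DaysSupply RxNumber DateSupplied
1        -10        1         2018
2         10        1         2018
I would like to remove pairs of rows from the dataset if 1) they share the same prescription number (RxNumber), 2) they have the same prescription date (DateSupplied), and 3) their DaysSupply values are a matching positive and negative (e.g. +10 and -10). In this case, the prescription number is a patient-specific key.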
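One complication is that multiple medication fills can be redeemed from a single prescription number, so I want to de-duplicate only the pairs that meet the conditions above, not all rows that share the same prescription number.
I'm not sure what approach I should take. I have considered using a very long if statement or a deduplicate command, but I'm not sure how to tell R to de-duplicate only the pairs that meet the conditions above.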
v1 <- c(-10,10,10,-8,8,-6,6,5,4)
v2 <- c(1,1,1,2,2,3,4,9,9)
v3 <- c(2018, 2018, 2018, 2018, 2017, 2016, 2016, 2015, 2014)
df <- data.frame("DaysSupply" = v1, "RxNumber" = v2, "DateSupplied" = v3)
  DaysSupply RxNumber DateSupplied
1        -10        1         2018
2         10        1         2018
3         10        1         2018
4         -8        2         2018
5          8        2         2017
6         -6        3         2016
7          6        4         2016
8          5        9         2015
9          4        9         2014
My desired output is:
  DaysSupply RxNumber DateSupplied
3         10        1         2018
4         -8        2         2018
5          8        2         2017
6         -6        3         2016
7          6        4         2016
8          5        9         2015
9          4        9         2014
Any ideas?
Posted on 2019-05-03 02:07:02
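A dplyr solution using the sample data.
I added a few lines at the end to tidy things up and make the output look the same as yours. I'm sure someone could trim a line or two and make the de-duplication a bit cleaner, but this does what you need.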
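library(dplyr)  # provides %>% and desc()
df %>%
  # The absolute value lets a fill and its reversal fall into the same group
  dplyr::mutate(AbsDaysSupply = abs(DaysSupply)) %>%
  dplyr::group_by(RxNumber, DateSupplied, AbsDaysSupply) %>%
  # Sort so that negative (reversal) rows come first within each group
  dplyr::arrange(RxNumber, DateSupplied, AbsDaysSupply, DaysSupply) %>%
  dplyr::mutate(sum = cumsum(DaysSupply)) %>%
  # In groups with more than one row, drop rows whose running total is still <= 0
  dplyr::filter(!(sum <= 0 & dplyr::n() > 1)) %>%
  dplyr::ungroup() %>%
  # Remove the helper columns and order the rows like the desired output
  dplyr::select(-AbsDaysSupply, -sum) %>%
  dplyr::arrange(desc(DateSupplied), RxNumber)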
# A tibble: 7 x 3
  DaysSupply RxNumber DateSupplied
       <dbl>    <dbl>        <dbl>
1         10        1         2018
2         -8        2         2018
3          8        2         2017
4         -6        3         2016
5          6        4         2016
6          5        9         2015
7          4        9         2014
Posted on 2019-05-03 02:40:46
library(tidyverse)
v1 <- c(-10,10,10,-8,8,-6,6,5,4)
v2 <- c(1,1,1,2,2,3,4,9,9)
v3 <- c(2018, 2018, 2018, 2018, 2017, 2016, 2016, 2015, 2014)
df <- data.frame("DaysSupply" = v1, "RxNumber" = v2, "DateSupplied" = v3)
df %>%
  # Create an absolute column for matching
  mutate(DaysSupplyAbs = abs(DaysSupply)) %>%
  # Order to make matches adjacent, but with the positive first
  arrange(RxNumber, DaysSupplyAbs, -DaysSupply) %>%
  # Limit matches to Year and RxNumber
  group_by(RxNumber, DateSupplied) %>%
  # Get the next (lead) and prior (lag) DaysSupply values
  mutate(DaysSupplyLead = lead(DaysSupply),
         DaysSupplyLag = lag(DaysSupply)) %>%
  # Identify the reversed and reversal
  mutate(reversed = if_else(is.na(DaysSupplyLead), FALSE, DaysSupply == -DaysSupplyLead)) %>%
  mutate(reversal = if_else(is.na(lag(reversed)), FALSE, lag(reversed))) %>%
  ungroup() %>%
  # Filter out the reversals and the reversed
  filter(!(reversed | reversal)) %>%
  select(DaysSupply, RxNumber, DateSupplied, reversed, reversal)
Result:
#   DaysSupply RxNumber DateSupplied reversed reversal
#        <dbl>    <dbl>        <dbl> <lgl>    <lgl>
# 1         10        1         2018 FALSE    FALSE
# 2          8        2         2017 FALSE    FALSE
# 3         -8        2         2018 FALSE    FALSE
# 4         -6        3         2016 FALSE    FALSE
# 5          6        4         2016 FALSE    FALSE
# 6          4        9         2014 FALSE    FALSE
# 7          5        9         2015 FALSE    FALSE
https://stackoverflow.com/questions/55956679
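The pairing rule from the question (same RxNumber, same DateSupplied, equal and opposite DaysSupply) can also be sketched by counting fills and reversals explicitly, which may be easier to reason about when one group holds several fills and several reversals. This is only a minimal sketch, not the method of either answer above: the function name drop_reversal_pairs and the helper columns AbsDays, n_pairs, IsReversal and seq_in_sign are hypothetical names chosen for illustration, and it assumes dplyr 1.0.0 or later for the .add argument of group_by().

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  DaysSupply   = c(-10, 10, 10, -8, 8, -6, 6, 5, 4),
  RxNumber     = c(1, 1, 1, 2, 2, 3, 4, 9, 9),
  DateSupplied = c(2018, 2018, 2018, 2018, 2017, 2016, 2016, 2015, 2014)
)

# Hypothetical helper: cancel the k-th fill against the k-th reversal
# within each (RxNumber, DateSupplied, abs(DaysSupply)) group.
drop_reversal_pairs <- function(claims) {
  claims %>%
    group_by(RxNumber, DateSupplied, AbsDays = abs(DaysSupply)) %>%
    # Number of fill/reversal pairs that cancel out in this group
    mutate(n_pairs = min(sum(DaysSupply > 0), sum(DaysSupply < 0))) %>%
    # Number the fills and the reversals separately, in order of appearance
    group_by(IsReversal = DaysSupply < 0, .add = TRUE) %>%
    mutate(seq_in_sign = row_number()) %>%
    ungroup() %>%
    # The first n_pairs rows of each sign are the cancelled pairs; keep the rest
    filter(seq_in_sign > n_pairs) %>%
    select(DaysSupply, RxNumber, DateSupplied)
}

result <- drop_reversal_pairs(df)
print(result)
```

On the sample data this should keep the seven rows of the desired output, in their original order. The design choice is to remove only as many rows of each sign as there are matching rows of the opposite sign, so an unmatched fill or an unmatched reversal stays in the data.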