
Skipping na_interpolation on dplyr group/variable pairs that are entirely NA in R

Stack Overflow user
Asked 2020-05-21 23:53:07
2 answers · 108 views · 0 followers · Score: 2

I have a data frame that looks like this:

   Country Year acnt_class     wages
3      AZE 2010         NA        NA
4      AZE 2011  0.4206776        NA
5      AZE 2012         NA        NA
6      AZE 2013         NA        NA
7      AZE 2014  0.7735889 0.4273174
8      AZE 2015         NA        NA
9      AZE 2016         NA        NA
10     AZE 2017  0.5108674 0.4335978
11     AZE 2018         NA        NA
15     BDI 2010         NA        NA
16     BDI 2011  0.3140646        NA
17     BDI 2012         NA        NA
18     BDI 2013         NA        NA
19     BDI 2014  0.1224175        NA
20     BDI 2015         NA        NA
21     BDI 2016         NA        NA
22     BDI 2017         NA        NA
23     BDI 2018         NA        NA
27     BEL 2010         NA        NA
28     BEL 2011  0.9576057        NA
29     BEL 2012         NA        NA
30     BEL 2013         NA        NA
31     BEL 2014  1.0083120 0.9623492
32     BEL 2015         NA        NA
33     BEL 2016         NA        NA
34     BEL 2017  1.0036910 0.9499486
35     BEL 2018         NA        NA
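For anyone who wants to reproduce the problem, the table above can be rebuilt as an R data frame. This is a sketch typed in from the printed values, so the numbers carry only the precision shown. The original row names (3 to 35) are dropped. Running `dput(DF)` on the real data would give an exact copy.

```r
# Sample data from the question, typed in from the printed table
DF <- data.frame(
  Country    = rep(c("AZE", "BDI", "BEL"), each = 9),
  Year       = rep(2010:2018, times = 3),
  acnt_class = c(NA, 0.4206776, NA, NA, 0.7735889, NA, NA, 0.5108674, NA,   # AZE
                 NA, 0.3140646, NA, NA, 0.1224175, NA, NA, NA, NA,          # BDI
                 NA, 0.9576057, NA, NA, 1.0083120, NA, NA, 1.0036910, NA),  # BEL
  wages      = c(NA, NA, NA, NA, 0.4273174, NA, NA, 0.4335978, NA,          # AZE
                 rep(NA, 9),                                                # BDI: all NA
                 NA, NA, NA, NA, 0.9623492, NA, NA, 0.9499486, NA)          # BEL
)

str(DF)
```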

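I'm trying to run the following code to fill in the missing values (NAs) by group in the columns "acnt_class" and "wages" using Stineman ("stine") interpolation: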

library(dplyr)
library(imputeTS)  # na_interpolation()

DF <- DF %>% 
  group_by(Country) %>% 
  mutate_at(.vars = c("acnt_class", "wages"), 
            .funs = ~na_interpolation(., option = "stine")) 

This works when I run it on columns that have at least two observations per group. Here, however, I get this error:

Error in na_interpolation(., option = "stine") : 
  Input data needs at least 2 non-NA data point for applying na_interpolation

这是由于组"BDI“具有用于可变”工资“的完整的NAs。

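Ideally, I'm looking for a modified function that "skips" group/variable pairs that are entirely NA (or have only one observation) and leaves them as they are. Any solutions? Thanks!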


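2 Answers

Stack Overflow user

Accepted answer

Posted 2020-05-22 01:25:57

Found a solution:

Interpolation only (NAs before the first and after the last observation are left as they are):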

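library(imputeTS)  # na_interpolation()
library(dplyr)
library(zoo)       # na.approx()

DF <- DF %>% 
  group_by(Country) %>% 
  mutate_at(vars(acnt_class, wages),
            # leave columns with fewer than 2 observations in the group untouched;
            # otherwise interpolate, then restore the NAs outside the observed range
            ~ if (sum(!is.na(.)) < 2) . else
              replace(na_interpolation(., option = "stine"),
                      is.na(na.approx(., na.rm = FALSE)), NA))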
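The `replace()` / `na.approx()` combination in this answer is what stops `na_interpolation()` from filling values before the first and after the last observation. The minimal sketch below applies the same three steps to a single vector, AZE's `acnt_class` values from the question, so each step can be inspected on its own. The names `filled`, `outside` and `result` are illustrative.

```r
library(imputeTS)  # na_interpolation()
library(zoo)       # na.approx()

# AZE acnt_class, 2010-2018, from the question
x <- c(NA, 0.4206776, NA, NA, 0.7735889, NA, NA, 0.5108674, NA)

# Step 1: Stineman interpolation; na_interpolation() can also fill the
# leading (2010) and trailing (2018) NAs, which is not wanted here
filled <- na_interpolation(x, option = "stine")

# Step 2: with na.rm = FALSE, na.approx() only fills NAs between observations,
# so the positions it leaves as NA mark the leading/trailing gaps
outside <- is.na(na.approx(x, na.rm = FALSE))

# Step 3: set those positions back to NA
result <- replace(filled, outside, NA)

print(rbind(x, filled, result))
```

Only the 2010 and 2018 entries end up NA in `result`. The linear `na.approx()` is used only to locate the gaps, so the interpolated values themselves still come from the "stine" method.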
Score: 2

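Stack Overflow user

Posted 2020-07-26 06:26:06

The answer provided by TiberiusGracchus2020 works well. In case it helps anyone, I've turned the snippet into a heavily commented function to make it clearer what happens at each stage.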

library(imputeTS)  # na_interpolation()
library(zoo)       # na.approx()

# Modify the imputeTS::na_interpolation function so that it
#   (1) doesn't break on all-NA vectors (or vectors with a single observation)
#   (2) won't impute leading and trailing NAs

na_interpolation2 <- function(x, option = "linear") {

  total_not_missing <- sum(!is.na(x))

  # check there is sufficient data for na_interpolation;
  # if not, return the input unchanged
  if (total_not_missing < 2) {
    return(x)
  }

  # replace() takes an input vector, a TRUE/FALSE index vector and a replacement value
  replace(
    # input vector is the interpolated data;
    # this will also impute leading/trailing NAs, which we don't want
    imputeTS::na_interpolation(x, option = option),

    # TRUE/FALSE vector: na.approx() leaves leading/trailing NAs as NA
    is.na(zoo::na.approx(x, na.rm = FALSE)),

    # set the TRUE positions back to NA
    NA
  )
}

# example data
data1 <- c(NA, NA, NA, NA, NA) 
data2 <- c(NA, NA, 1, NA, 3, NA)

na_interpolation(data1)
# Error in na_interpolation(data1) : Input data needs at 
# least 2 non-NA data point for applying na_interpolation

na_interpolation(data2)
# [1] 1 1 1 2 3 3

na_interpolation2(data1)
# [1] NA NA NA NA NA

na_interpolation2(data2)
# [1] NA NA  1  2  3 NA
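As a usage sketch, `na_interpolation2()` can be dropped into the grouped pipeline from the question. This assumes dplyr 1.0 or later for `across()`; the question's `mutate_at()` form should behave the same way. The block restates the function compactly with the same logic, so it runs on its own. `DF_filled` is an illustrative name.

```r
library(dplyr)
library(imputeTS)  # na_interpolation()
library(zoo)       # na.approx()

# Same logic as na_interpolation2() above, without the comments
na_interpolation2 <- function(x, option = "linear") {
  if (sum(!is.na(x)) < 2) return(x)
  replace(imputeTS::na_interpolation(x, option = option),
          is.na(zoo::na.approx(x, na.rm = FALSE)),
          NA)
}

# Sample data from the question (values as printed)
DF <- data.frame(
  Country    = rep(c("AZE", "BDI", "BEL"), each = 9),
  Year       = rep(2010:2018, times = 3),
  acnt_class = c(NA, 0.4206776, NA, NA, 0.7735889, NA, NA, 0.5108674, NA,
                 NA, 0.3140646, NA, NA, 0.1224175, NA, NA, NA, NA,
                 NA, 0.9576057, NA, NA, 1.0083120, NA, NA, 1.0036910, NA),
  wages      = c(NA, NA, NA, NA, 0.4273174, NA, NA, 0.4335978, NA,
                 rep(NA, 9),
                 NA, NA, NA, NA, 0.9623492, NA, NA, 0.9499486, NA)
)

# Interpolate each column within each Country; BDI's all-NA wages are skipped
DF_filled <- DF %>%
  group_by(Country) %>%
  mutate(across(c(acnt_class, wages), ~ na_interpolation2(.x, option = "stine"))) %>%
  ungroup()

print(DF_filled, n = 27)
```

BDI's `wages` column comes back unchanged (all NA). In every other group/variable pair, only the gaps between the first and last observation are filled.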
Score: 2
Page content originally from Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/61938492
