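I am building a dataset from very messy raw files and use testthat to make sure nothing breaks when I add new data or correct the cleaning rules. I would like to add a test that checks whether the data contain any NA values and, if so, reports which columns they are in.
Doing this by hand, with one test per column, is trivial. But that solution is tedious to maintain and error-prone, because I don't want to update the test-NA file every time I add or remove a column from the dataset.
Here is sample code for what I have so far: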
df <- tidyr::tribble(
~A, ~B, ~C,
1, 2, 3,
NA, 2, 3,
1, 2, NA
)
# checks all variables, doesn't report which have NA values
testthat::test_that("NA Values", {
testthat::expect_true(sum(is.na(df)) == 0)
})
# Checks each column, but is a pain to maintain
testthat::test_that("Variable specific checks", {
testthat::expect_true(sum(is.na(df$A)) == 0)
testthat::expect_true(sum(is.na(df$B)) == 0)
testthat::expect_true(sum(is.na(df$C)) == 0)
})
Posted on 2020-02-18 05:37:05
Solution 1: quick and (not so) dirty
df <- tidyr::tribble(
~A, ~B, ~C,
1, 2, 3,
NA, 2, 3,
1, 2, NA
)
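# Flags every column that contains at least one NA and lists them on failure
testthat::test_that("Variable specific checks", {
  # TRUE for each column that contains at least one NA
  res <- apply(df, 2, function(x) sum(is.na(x)) > 0)
  # Passes only if no column is flagged; the label names the flagged columns
  testthat::expect_true(!any(res), label = paste(paste(which(res), collapse = ", "), "contain(s) NA(s)"))
})
It should return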
Error: Test failed: 'Variable specific checks'
* 1, 3 contain(s) NA(s) isn't true.
Solution 2: modify the expect_() function to suit your needs
# Variant of expect_true() whose failure message is built from the label
expect_true2 <- function(object, info = NULL, label = NULL) {
  act <- testthat::quasi_label(rlang::enquo(object), label, arg = "object")
  testthat::expect(
    identical(as.vector(act$val), TRUE),
    sprintf("Column %s contain(s) NA(s).", act$lab),
    info = info
  )
  invisible(act$val)
}
testthat::test_that("Variable specific checks", {
  res <- apply(df, 2, function(x) sum(is.na(x)) > 0)
  expect_true2(!any(res), label = paste(which(res), collapse = ","))
})
It should return
Error: Test failed: 'Variable specific checks'
* Column 1,3 contain(s) NA(s).
https://stackoverflow.com/questions/60269017
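Both solutions report column positions (1, 3) rather than names. If names are more useful, one option is a small helper that returns the names of the columns containing NAs. This is only a sketch in base R: `na_columns` is a hypothetical name, not a testthat function, and the example data frame is a plain `data.frame` copy of the one in the question.

```r
# Hypothetical helper: names of the columns that contain at least one NA.
na_columns <- function(data) {
  # is.na() on a data frame gives a logical matrix; colSums() counts NAs per column
  names(data)[colSums(is.na(data)) > 0]
}

# Same values as the example data in the question
df <- data.frame(
  A = c(1, NA, 1),
  B = c(2, 2, 2),
  C = c(3, 3, NA)
)

bad <- na_columns(df)
print(bad)
# [1] "A" "C"

# In a test file, comparing the result against an empty character vector
# fails whenever a column has NAs, and the failure output should show the
# offending names. Left commented out because it fails for this df by design:
# testthat::test_that("no NA values", {
#   testthat::expect_identical(na_columns(df), character(0))
# })
```

Because the helper works from `names(data)`, nothing has to be updated when columns are added or removed.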