I have a dataset similar to the following:
Age Monday Tuesday Wednesday
6-9 a b a
6-9 b b c
6-9 c a
9-10 c c b
9-10 c a b
Using R, I want a binary variable that indicates whether every value in a row is "a" (1 if the row is entirely a, 0 otherwise), like this:
Age Monday Tuesday Wednesday Entire a
6-9 a a 1
6-9 b b c 0
6-9 c a 0
9-10 c c b 0
9-10 a a a 1
Note: my data also contains missing values within rows. The columns I am interested in are factors. I tried the following code, but it does not work:
L <- dataframe %>%
select(Age,Monday:Wednesday) %>%
mutate (Entire a = ifelse(c(Monday:Wednesday)=="a",1,0,na.rm=TRUE))
Posted on 2020-05-17 06:52:38
I would use a dplyr solution:
library(dplyr)
my.data <- data.frame(
age = c("6-9", "6-9", "6-9", "9-10", "9-10", "9-10"),
Monday = c("a", "b", NA, "c", "a", "a"),
Tuesday = c("a", "b", "a", "c", "a", NA),
Wednesday = c("a", "c", "a", "c", "a", NA)
)
my.data %>%
mutate(
`Entire a` = apply(.[, 2:4], 1, function(x) all(x == "a", na.rm = TRUE) %>% as.numeric)
)
# age Monday Tuesday Wednesday Entire a
# 1 6-9 a a a 1
# 2 6-9 b b c 0
# 3 6-9 <NA> a a 1
# 4 9-10 c c c 0
# 5 9-10 a a a 1
# 6 9-10 a <NA> <NA> 1
The na.rm argument of all() controls whether missing values are ignored.
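To make the effect of na.rm concrete, here is a small base R sketch. The vectors are made up for illustration and are not taken from the question's data. It also shows an edge case worth knowing about: with na.rm = TRUE, a row made up only of NA would count as "entirely a".

```r
# Hypothetical row values, for illustration only
x <- c("a", NA, "a")

# Without na.rm the NA propagates: the result is NA, not TRUE
res_keep <- all(x == "a")

# With na.rm = TRUE the NA is dropped before testing: the result is TRUE
res_drop <- all(x == "a", na.rm = TRUE)

# Edge case: every value is NA, so nothing is left to test,
# and all() of an empty logical vector is TRUE
res_all_na <- all(c(NA, NA) == "a", na.rm = TRUE)

# A single FALSE decides the result even when an NA is present
res_false <- all(c("b", NA) == "a")
```

If rows that are entirely NA should not be flagged, an extra check such as any(!is.na(x)) could be combined with the test above.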
Posted on 2020-05-17 06:47:24
We can use == to create a logical matrix, then use rowSums and convert the result to binary
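Before the answer's own code, the idea can be seen step by step on a toy data frame. This is only an illustrative sketch: the name toy and its values are made up and are not part of the question's dataset.

```r
# Toy data, made up for illustration (not the question's dataset)
toy <- data.frame(Monday  = c("a", "b", "a"),
                  Tuesday = c("a", "a", "a"),
                  stringsAsFactors = FALSE)

is_a  <- toy == "a"            # logical matrix, one cell per value
n_a   <- rowSums(is_a)         # number of "a" values in each row
all_a <- +(n_a == ncol(toy))   # unary + turns TRUE/FALSE into 1/0

all_a  # values 1 0 1: only row 2 contains something other than "a"
```

Comparing the row count with ncol(toy) rather than a hard-coded number keeps the sketch valid if columns are added.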
colnm <- names(dataframe)[-1]
dataframe$Entire_a <- +(rowSums(replace(dataframe[colnm],
dataframe[colnm] == '', 'a') == 'a') == length(colnm))
dataframe$Entire_a
#[1] 1 0 0 0 1
Alternatively, another option is to paste the columns together and then use grepl
+(grepl("^a+$", do.call(paste, c(dataframe[colnm], sep=""))))
#[1] 1 0 0 0 1
If the missing values are NA rather than empty strings (''), use
+(rowSums(replace(dataframe[colnm], is.na(dataframe[colnm]), 'a') == 'a') == 3)
Data
dataframe <- structure(list(Age = c("6-9", "6-9", "6-9", "9-10", "9-10"),
Monday = c("a", "b", "", "c", "a"), Tuesday = c("", "b",
"c", "c", "a"), Wednesday = c("a", "c", "a", "b", "a")),
row.names = c(NA,
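-5L), class = "data.frame")
Posted on 2020-05-17 10:54:39
We can use pmap_int from purrr to perform this row-wise operation.
If the empty values ('') have not already been converted to NA, convert them first.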
library(dplyr)
library(purrr)
dataframe %>%
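# na_if() on a whole data frame errors in dplyr 1.1.0 and later, so convert column by column
mutate(across(Monday:Wednesday, ~ na_if(.x, ''))) %>%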
mutate(Entire_a = pmap_int(select(., Monday:Wednesday),
~+all(c(...) == 'a', na.rm = TRUE)))
# Age Monday Tuesday Wednesday Entire_a
#1 6-9 a <NA> a 1
#2 6-9 b b c 0
#3 6-9 <NA> c a 0
#4 9-10 c c b 0
#5 9-10 a a a 1
https://stackoverflow.com/questions/61844547