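I have a large dataset to summarize. The data are health records: each individual had many organs/tissues examined, and the diagnoses were entered as free-text narrative. I have a few key diagnostic terms that I want to find, and then I want to know which organs are associated with each diagnosis.
Example (all entries converted to strings):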
dataframe1
Organ          Diagnosis
lungs          interstitial pneumonia
liver          hepatic congestion ; diffuse
cerebrum       traumatic disruption and hemorrhage
adrenal gland  focal hemorrhage

dataframe2
Keywords
congestion
hemorrhage
trauma
pneumonia

I want to search dataframe1$Diagnosis for strings that match dataframe2$Keywords and, for each match, return the organ entered in the corresponding row of dataframe1$Organ.
Data structure:
dataframe1 <- structure(list(Organ = c("lungs", "liver", "cerebrum", "adrenal gland"
), Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse",
"traumatic disruption and hemorrhage", "focal hemorrhage")), .Names = c("Organ",
"Diagnosis"), class = "data.frame", row.names = c(NA, -4L))
dataframe2 <- data.frame(Keywords=c("congestion","hemorrhage","trauma","pneumonia"),stringsAsFactors=FALSE)

Posted on 2016-07-13 11:28:15
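We can use grep. For each keyword, it finds the rows of Diagnosis that match and returns the corresponding organs as a single comma-separated string:
sapply(dataframe2$Keywords, function(x)
  toString(trimws(dataframe1[,1][grep(x, dataframe1[,2])])))

Posted on 2016-07-13 12:03:21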
I think it could be valuable to return the matches as a stacked list, for example:
stack(
sapply(dataframe2$Keywords,
function(x) dataframe1$Organ[grepl(x, dataframe1$Diagnosis)])
)
#         values        ind
#1         liver congestion
#2      cerebrum hemorrhage
#3 adrenal gland hemorrhage
#4      cerebrum     trauma
#5         lungs  pneumonia

https://stackoverflow.com/questions/38341585
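As a possible variation on the stacked approach above, here is a minimal sketch that uses lapply instead of sapply, so the intermediate result is always a list regardless of how many rows each keyword matches. The literal matching (fixed = TRUE) and the case-insensitive comparison via tolower are assumptions added for illustration, not part of the original answers; drop them if the keywords are meant to be regular expressions.

```r
# Sample data copied from the question.
dataframe1 <- data.frame(
  Organ = c("lungs", "liver", "cerebrum", "adrenal gland"),
  Diagnosis = c("interstitial pneumonia", "hepatic congestion ; diffuse",
                "traumatic disruption and hemorrhage", "focal hemorrhage"),
  stringsAsFactors = FALSE
)
dataframe2 <- data.frame(
  Keywords = c("congestion", "hemorrhage", "trauma", "pneumonia"),
  stringsAsFactors = FALSE
)

# lapply() always returns a list, so the result keeps the same shape
# even if every keyword happens to match the same number of rows.
matches <- lapply(dataframe2$Keywords, function(k) {
  # Assumption: keywords are literal substrings, compared case-insensitively.
  hit <- grepl(tolower(k), tolower(dataframe1$Diagnosis), fixed = TRUE)
  dataframe1$Organ[hit]
})
names(matches) <- dataframe2$Keywords

# One row per (organ, keyword) pair; keywords with no match add no rows.
result <- stack(matches)
print(result)
```

Because the match is a substring test, the keyword "trauma" also hits "traumatic", which is the behavior shown in the output above.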