In the context of learning an ontology from text, suppose I have two concepts and I am interested in the relation between them:
# Define the term vectors first so that `class` can combine them
animal.class <- c("animal", "animals")
dog.class <- c("dog", "dogs")
class <- c(animal.class, dog.class)
individual <- "Snoopy"
sentence1 <- "Snoopy is an animal."
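sentence2 <- "Snoopy is a dog."
How can I extract the linguistic context and the semantic relation with R, so that I end up with data frames like the ones below, without having to define the context/relation ("is a(n)") beforehand?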
data.frame(CLASS1="animal",CLASS2="Snoopy",context="CLASS2 is an CLASS1")
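data.frame(CLASS1="dog",CLASS2="Snoopy",context="CLASS2 is a CLASS1")
This kind of thing is easy to extract with other tools, such as finite-state transducers, but I would like to stay in R, and so far I cannot find anything comparable in R.
I have thought of some solutions using Perl regular expressions together with the tm and stringr packages... would they be sufficient?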
Answered 2015-04-07 03:56:14
I'm not quite sure what you are looking for. Here is what I think you are trying to do:
sentences <- c(
"Snoopy is an animal.",
"Snoopy is a dog.",
"Snoopy likes chocolate!",
"Goofy is a dog"
)
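if (!require("pacman")) install.packages("pacman")
pacman::p_load(qdapRegex, dplyr, tidyr)
# Grab up to one word on either side of "is a"/"is an"; sentences without a match give NA
(out <- rm_default(sentences, pattern = S("@around_", 1, "is a(n*)", 1), extract=TRUE) %>%
unlist %>%
# Mark the first run of whitespace (right after the subject) as a split point
sub("\\s+", "<SPLIT>", .) %>%
data_frame(new = .) %>%
na.omit %>%
# Split at the marker and at the last space to get three columns
separate(new, c("CLASS2", "context", "CLASS1"), sep = "(<SPLIT>)|( (?=[^ ]+$))") %>%
mutate(context = sprintf("CLASS 2 %s CLASS 1", context)) %>%
# Reorder the columns to CLASS2, CLASS1, context
select(c(1, 3, 2)))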
## CLASS2 CLASS1 context
## 1 Snoopy animal CLASS 2 is an CLASS 1
## 2 Snoopy dog CLASS 2 is a CLASS 1
## 3 Goofy dog CLASS 2 is a CLASS 1
Then, to extract specific instances of the various classes, add a filter at the end of the pipeline:
out %>%
filter(grepl("[Ss]noopy", CLASS2))
## CLASS2 CLASS1 context
## 1 Snoopy animal CLASS 2 is an CLASS 1
## 2 Snoopy dog CLASS 2 is a CLASS 1
out %>%
filter(grepl("[Dd]og", CLASS1))
## CLASS2 CLASS1 context
## 1 Snoopy dog CLASS 2 is a CLASS 1
## 2 Goofy dog CLASS 2 is a CLASS 1
https://stackoverflow.com/questions/29464521
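The pipeline above still hard-codes the relation "is a(n*)", whereas the question asks for the relation to be discovered rather than supplied. One possible way to do that, sketched here in base R only, is to anchor on the terms that are already known (the individuals and the class words) and keep whatever text sits between them as the context. The helper name extract_relations and its arguments are hypothetical and not part of the original answer; the sketch also assumes the terms contain no regex metacharacters and that the individual comes before the class term in the sentence.

```r
# Hypothetical helper: find a known individual and a known class term in
# each sentence and keep the words between them as the relation/context.
extract_relations <- function(sentences, individuals, class_terms) {
  # Build alternations such as "Snoopy|Goofy"; longer terms go first so
  # that "dogs" is tried before "dog". Terms are assumed to be plain words.
  alt <- function(x) paste(x[order(-nchar(x))], collapse = "|")
  pattern <- sprintf("\\b(%s)\\b\\s+(.*?)\\s+\\b(%s)\\b",
                     alt(individuals), alt(class_terms))
  # regexec() reports the full match plus one entry per capture group
  hits <- regmatches(sentences, regexec(pattern, sentences, perl = TRUE))
  # Sentences without a match yield character(0); keep complete matches only
  hits <- hits[lengths(hits) == 4]
  data.frame(
    CLASS1  = vapply(hits, function(h) h[4], character(1)),
    CLASS2  = vapply(hits, function(h) h[2], character(1)),
    context = sprintf("CLASS2 %s CLASS1",
                      vapply(hits, function(h) h[3], character(1))),
    stringsAsFactors = FALSE
  )
}

sentences <- c(
  "Snoopy is an animal.",
  "Snoopy is a dog.",
  "Snoopy likes chocolate!",
  "Goofy is a dog"
)
res <- extract_relations(
  sentences,
  individuals = c("Snoopy", "Goofy"),
  class_terms = c("animal", "animals", "dog", "dogs")
)
print(res)
```

With these four sentences the result should have three rows, since "Snoopy likes chocolate!" contains no known class term. This only covers the simple "individual ... class" word order; anything more varied would probably need part-of-speech tagging or a parser rather than a single regex.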