The first column of my data frame (FOOTNOTES) contains footnotes.
I created lists of strings to identify their type, for example:
law <- c("Directive", "Commission Decision", "TFEU",
"TEU", "OJ L", "OJ C", "Case C-", "CJEU", "Council Decision",
"Official Journal", "(EU)", "(EEC)", "legal basis",
"Commission Regulation", "Article", "Regulation", "(EC)",
"Legislative framework", "Treaty", "Resolution", "Convention",
"Judgement of", "Ordinance", "Decision", "Paris Agreement",
"Law", "Art.", "legislation", "Charter of", "AGRILEG", "REACH")
Using str_detect to look for each word separately works. However, I would like to check whether any element of the list is present, so that a new column (called LAW) shows TRUE.
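Using a regular expression (with "|" between the strings) did not work. I get no error message, but when I checked manually I found TRUE practically everywhere, even in rows where none of the strings from the list appear in the footnote.
I also tried creating a separate new column for each word in the list, but after that I could not export the data frame to Excel. My idea was to filter the columns LAW through LAW12 so as to merge the results into one column, but I could not find a way to do that either.
I think the first idea would be quicker, but I have no clue how to implement it.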
DATABASE_V6$LAW <- str_detect(DATABASE_V6$FOOTNOTES,"[Directive|Decision|TFEU|OJ L]")
DATABASE_V6$LAW <- str_detect(DATABASE_V6$FOOTNOTES, "OJ L")
DATABASE_V6$LAW1 <- str_detect(DATABASE_V6$FOOTNOTES, "Regulation")
DATABASE_V6$LAW2 <- str_detect(DATABASE_V6$FOOTNOTES, "Directive")
DATABASE_V6$LAW3 <- str_detect(DATABASE_V6$FOOTNOTES, "TFEU")
DATABASE_V6$LAW4 <- str_detect(DATABASE_V6$FOOTNOTES, "TEU")
DATABASE_V6$LAW5 <- str_detect(DATABASE_V6$FOOTNOTES, "Legal basis")
DATABASE_V6$LAW6 <- str_detect(DATABASE_V6$FOOTNOTES, "Official Journal")
DATABASE_V6$LAW7 <- str_detect(DATABASE_V6$FOOTNOTES, "Case C-")
DATABASE_V6$LAW8 <- str_detect(DATABASE_V6$FOOTNOTES, "Decision")
DATABASE_V6$LAW9 <- str_detect(DATABASE_V6$FOOTNOTES, "Resolution")
DATABASE_V6$LAW10 <- str_detect(DATABASE_V6$FOOTNOTES, "Article")
DATABASE_V6$LAW11 <- str_detect(DATABASE_V6$FOOTNOTES, "Treaty")
DATABASE_V6$LAW12 <- str_detect(DATABASE_V6$FOOTNOTES, "Convention")
When checking whether any word from the list appears in the FOOTNOTES column, I expect around 2,000 TRUE values out of 14,000 rows.
Posted on 2019-09-12 02:48:36
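You already have most of the solution. As Joran and r2evans explained above, you don't need the square brackets. You can also use paste(law, collapse = "|") to format the list of strings for the regular expression in one step.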
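To see why the bracketed pattern flags almost every row: square brackets define a character class, so "[Directive|Decision|TFEU|OJ L]" matches any single character from that set (for example "e", "i" or "n") rather than any of the whole words. A small sketch with made-up footnotes (the sample strings are illustrative, not from the original data):

```r
library(stringr)

# Made-up footnotes for illustration only.
footnotes <- c("nothing to see here", "blah blah OJ L blah", "xyz")

# Character class: TRUE whenever the text contains any one of the listed characters.
bracketed <- str_detect(footnotes, "[Directive|Decision|TFEU|OJ L]")

# Alternation: TRUE only when one of the whole strings occurs.
alternation <- str_detect(footnotes, "Directive|Decision|TFEU|OJ L")

print(data.frame(footnotes, bracketed, alternation))
```

Only "xyz" escapes the bracketed version, because it happens to contain none of the listed characters. Dropping the brackets gives the alternation, and the full example below builds that pattern from the whole vector.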
law <- c("Directive", "Commission Decision", "TFEU",
"TEU", "OJ L", "OJ C", "Case C-", "CJEU", "Council Decision",
"Official Journal", "(EU)", "(EEC)", "legal basis",
"Commission Regulation", "Article", "Regulation", "(EC)",
"Legislative framework", "Treaty", "Resolution", "Convention",
"Judgement of", "Ordinance", "Decision", "Paris Agreement",
"Law", "Art.", "legislation", "Charter of", "AGRILEG", "REACH")
law_formatted <- paste0(law, collapse = "|")
tst <- data.frame(footnote = c("footnote footnote OJ L footnote footnote",
"blah blah (EU) blah",
"nothing to see here",
"words words. words words words Art."))
tst$law <- stringr::str_detect(tst$footnote, law_formatted)
https://stackoverflow.com/questions/57878128
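One caveat with the joined pattern: several entries in law contain regex metacharacters. In "(EU)" the parentheses form a group, so the pattern also matches a bare "EU", and in "Art." the dot matches any character. A minimal sketch of one possible workaround, assuming literal matching is what is wanted: test each term with stringr::fixed() and combine the results with a logical OR. The helper name detect_any_literal and the sample footnotes are hypothetical, not part of the original answer.

```r
library(stringr)

# Shortened term list for illustration; the full `law` vector works the same way.
law <- c("OJ L", "(EU)", "Art.", "Directive")

# Made-up footnotes for illustration only.
footnotes <- c("see OJ L 123",
               "funded by the EU budget",
               "a work of Arts",
               "nothing here")

# Hypothetical helper: TRUE where at least one term occurs literally in the text.
detect_any_literal <- function(text, terms) {
  hits <- lapply(terms, function(term) str_detect(text, fixed(term)))
  Reduce(`|`, hits)
}

# Joined regex: "(EU)" behaves as a group and "." as a wildcard.
regex_result <- str_detect(footnotes, paste(law, collapse = "|"))

# Literal matching: parentheses and dots must appear in the text as written.
literal_result <- detect_any_literal(footnotes, law)

print(data.frame(footnotes, regex_result, literal_result))
```

Whether the looser regex behaviour is acceptable depends on the data. If a bare "EU" should count as a legal reference, the joined pattern may be fine as it is.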