Variable substring matching between two columns

Stack Overflow user
Asked on 2017-03-04 09:12:07
Answers: 1 · Views: 58 · Followers: 0 · Votes: 0

I have a dataset with 20,000 rows which, in its simplest form, looks like this:

    v1                   v2
1   Case 1 (A v. B)      A v. B 
2   Case 2 (A v. C)      A v. B 
3   Case 2 (A v. C)      C v. B 
4   Case 4 (X v. Z)      X v. Z 
5   Case 5 (B v. A)      A v. B 
6   Case 6 (X v. A)      X v. A 
7   Case 6 (X v. A)      A v. X 
...

...except that there are many variants of v1 and v2 (in practice around 150 or so, which is still too many to list).

I would like to return a third column, v3, containing a logical indicator of whether any substring of v1 matches the string in v2.

    v1                   v2           v3
1   Case 1 (A v. B)      A v. B       TRUE
2   Case 2 (A v. C)      A v. B       FALSE
3   Case 2 (A v. C)      C v. B       FALSE
4   Case 4 (X v. Z)      X v. Z       TRUE
5   Case 5 (B v. A)      A v. B       FALSE
6   Case 6 (X v. A)      X v. A       TRUE
7   Case 6 (X v. A)      A v. X       FALSE

I have been experimenting with something like this, which I thought was right:

library(stringr)
x$v3 <- with(x, str_detect(v1, v2))

If anyone could point me to the right solution or a workaround, I would be very grateful.

A MWE showing that my str_detect() approach does not work:

x <- structure(list(v1 = c("Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation", 
                          "Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia  v  Russian Federation"
), v2 = c("Georgia v Russian Federation", " Ethiopia v South Africa Liberia v South Africa", 
             " Cameroon v United Kingdom", " New Zealand v France", " Australia v France", 
             " Nicaragua v United States of America", " Nicaragua v Honduras", 
             " Nauru v Anustralia", " Nnew Zealand v France", " Islamic Republic of Iran v United States of America", 
             " Bosnia and Herzegovina v Serbia and Montenegro", " Spain v Cananda", 
             " Libyan Arab Jamahiriya v United States of America", " Libyan Arab Jamahiriya v United Kingdom", 
             " Democratic Republic of the Congo v Burundi", " Germany v United States of America", 
             " Democratic Republic of the Congo v Belgium", " Liechtenstein v Germany", 
             " Democratic Republic of the Congo v Ugandan", " Democratic Republic of the Congo v Rwandan", 
             " Nicaragua v Colombia", " Djibouti v France", " Georgia v Russian Federation", 
             " Croatia v Serbia", " Mexico v United States of American", " Democratic Republic of the Congo v Rwanda", 
             " Spain v  Canada", " Australia v  France", " New Zealand v France", 
             " New Zealand v France")), .Names = c("v1", "v2"
             ), row.names = c(NA, 30L), class = "data.frame")

1 Answer

Stack Overflow user

Accepted answer

Posted on 2017-03-04 10:12:06

grepl can be used to test whether a single value of v2 occurs as a substring of v1.

You need to apply it to each row separately, so a quick solution could be: apply(data.frame(v1,v2),MARGIN=1, FUN=function(x) {grepl(x[2],x[1])})
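
As a rough illustration of the row-wise idea, here is a minimal sketch on the toy data from the question. It uses mapply rather than apply, and passes fixed = TRUE so that characters such as "." in "A v. B" are matched literally rather than as regex wildcards. Both choices are assumptions of this sketch, not part of the answer above, and the data frame name toy is made up for the example.

```r
# Toy data copied from the question's first table.
toy <- data.frame(
  v1 = c("Case 1 (A v. B)", "Case 2 (A v. C)", "Case 2 (A v. C)",
         "Case 4 (X v. Z)", "Case 5 (B v. A)", "Case 6 (X v. A)",
         "Case 6 (X v. A)"),
  v2 = c("A v. B", "A v. B", "C v. B", "X v. Z", "A v. B", "X v. A", "A v. X"),
  stringsAsFactors = FALSE
)

# Pair each v2 value with the v1 value on the same row and test whether
# v2 occurs literally (fixed = TRUE) inside v1.
toy$v3 <- mapply(
  function(pattern, text) grepl(pattern, text, fixed = TRUE),
  toy$v2, toy$v1,
  USE.NAMES = FALSE
)

print(toy$v3)
# TRUE FALSE FALSE TRUE FALSE TRUE FALSE
```

This reproduces the v3 column requested in the question. On the real data, differences in spacing would still cause literal matching to miss some rows.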

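If you want to ignore differences in the number of spaces (as in row 1), you can use gsub to turn the v2 value (x[2]) into a suitable regex: each " " is replaced by " *", which matches any number of spaces.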

In that case, this apply call will work:

apply(x,MARGIN=1, FUN=function(x) {grepl(gsub(" "," *",x[2]),x[1])})
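
A possible alternative to building a regex is to normalize whitespace in both columns first and then match literally. The sketch below is only one way to do that. The helper squish_ws and the data frame cases are hypothetical names introduced for this example, and the v1 string is a shortened version of the one in the MWE.

```r
# Hypothetical helper: trim both ends and collapse runs of whitespace
# into a single space.
squish_ws <- function(s) gsub("\\s+", " ", trimws(s))

# Shortened stand-in for three rows of the MWE (note the double spaces in v1
# and the leading spaces in some v2 values).
cases <- data.frame(
  v1 = rep("Racial Discrimination Georgia  v  Russian Federation", 3),
  v2 = c("Georgia v Russian Federation",
         " Cameroon v United Kingdom",
         " Georgia v Russian Federation"),
  stringsAsFactors = FALSE
)

# Compare the normalized strings row by row, matching v2 literally inside v1.
cases$v3 <- mapply(
  function(pattern, text) grepl(pattern, text, fixed = TRUE),
  squish_ws(cases$v2), squish_ws(cases$v1),
  USE.NAMES = FALSE
)

print(cases$v3)
# TRUE FALSE TRUE
```

Unlike the " *" pattern, which also matches when a space is missing entirely, this version requires at least one whitespace character wherever v2 has one, and it avoids treating other characters in v2 as regex syntax. Neither approach handles the misspelled names in v2 (for example "Cananda"); those would need approximate matching.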

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/42594218
