I have a dataset of 20,000 rows which, in its simplest form, looks like this:
v1 v2
1 Case 1 (A v. B) A v. B
2 Case 2 (A v. C) A v. B
3 Case 2 (A v. C) C v. B
4 Case 4 (X v. Z) X v. Z
5 Case 5 (B v. A) A v. B
6 Case 6 (X v. A) X v. A
7 Case 6 (X v. A) A v. X
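...except that there are many variants of v1 and v2 (around 150 in practice, which is still too many to list).
I would like to return a third column, v3, containing a logical indicator of whether any substring of v1 matches the string in v2.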
v1 v2 v3
1 Case 1 (A v. B) A v. B TRUE
2 Case 2 (A v. C) A v. B FALSE
3 Case 2 (A v. C) C v. B FALSE
4 Case 4 (X v. Z) X v. Z TRUE
5 Case 5 (B v. A) A v. B FALSE
6 Case 6 (X v. A) X v. A TRUE
7 Case 6 (X v. A) A v. X FALSE
I have been experimenting with something like this, which I believed was correct:
library(stringr)
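x$v3 <- with(x, str_detect(v1, v2))
If anyone could point me to the correct solution or a workaround, I would be very grateful.
The following MWE shows that my str_detect() approach does not work: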
x <- structure(list(v1 = c("Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation",
"Application of the International Convention on the Elimination of All Forms of Racial Discrimination Georgia v Russian Federation"
), v2 = c("Georgia v Russian Federation", " Ethiopia v South Africa Liberia v South Africa",
" Cameroon v United Kingdom", " New Zealand v France", " Australia v France",
" Nicaragua v United States of America", " Nicaragua v Honduras",
" Nauru v Anustralia", " Nnew Zealand v France", " Islamic Republic of Iran v United States of America",
" Bosnia and Herzegovina v Serbia and Montenegro", " Spain v Cananda",
" Libyan Arab Jamahiriya v United States of America", " Libyan Arab Jamahiriya v United Kingdom",
" Democratic Republic of the Congo v Burundi", " Germany v United States of America",
" Democratic Republic of the Congo v Belgium", " Liechtenstein v Germany",
" Democratic Republic of the Congo v Ugandan", " Democratic Republic of the Congo v Rwandan",
" Nicaragua v Colombia", " Djibouti v France", " Georgia v Russian Federation",
" Croatia v Serbia", " Mexico v United States of American", " Democratic Republic of the Congo v Rwanda",
" Spain v Canada", " Australia v France", " New Zealand v France",
" New Zealand v France")), .Names = c("v1", "v2"
), row.names = c(NA, 30L), class = "data.frame")
Answered 2017-03-04 10:12:06
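grepl can be used to compare a single value of v2 against the possible substrings of v1.
Because grepl takes only one pattern at a time, you need to apply it to each row separately, so a quick solution could be:
apply(data.frame(v1,v2),MARGIN=1, FUN=function(x) {grepl(x[2],x[1])})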
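The row-wise idea can also be sketched with mapply, which pairs each v2 value with the v1 value in the same row. This is only an illustration on toy data modelled on the question's simplified table, not the answer's original code. It assumes that a literal substring test is what is wanted, hence fixed = TRUE, so that characters such as "." and "(" are not read as regex metacharacters.

```r
# Toy data modelled on the question's simplified example (an assumption)
x <- data.frame(
  v1 = c("Case 1 (A v. B)", "Case 2 (A v. C)", "Case 5 (B v. A)", "Case 6 (X v. A)"),
  v2 = c("A v. B", "A v. B", "A v. B", "X v. A"),
  stringsAsFactors = FALSE
)

# Row-wise check: is v2[i] a literal substring of v1[i]?
# fixed = TRUE treats the pattern as plain text rather than as a regex.
x$v3 <- mapply(
  function(pattern, text) grepl(pattern, text, fixed = TRUE),
  x$v2, x$v1,
  USE.NAMES = FALSE
)

print(x)
```

Unlike apply, mapply works on the column vectors directly, so the data frame is not first coerced to a character matrix.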
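If you want to ignore differences in the number of spaces (as in row #1), you can use gsub to turn the value in x[2] into a suitable regex: each " " is replaced with " *", which matches zero or more spaces.
In that case, this apply call will work:
apply(x,MARGIN=1, FUN=function(x) {grepl(gsub(" "," *",x[2]),x[1])})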
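An alternative to building a regex is to normalise the whitespace on both sides first and then do a literal match. The sketch below is a swapped-in technique, not the answer's method. The helper name squish is hypothetical, and the two-row data frame uses shortened strings with deliberately irregular spacing made up for illustration.

```r
# Hypothetical helper: trim both ends and collapse runs of whitespace to one space
squish <- function(s) gsub("\\s+", " ", trimws(s))

# Shortened, illustrative rows (spacing made irregular on purpose)
x <- data.frame(
  v1 = c("Racial Discrimination Georgia v Russian Federation",
         "Racial Discrimination Georgia v Russian Federation"),
  v2 = c("  Georgia v  Russian Federation", " New Zealand v France"),
  stringsAsFactors = FALSE
)

# Compare the normalised strings literally, one row at a time
x$v3 <- mapply(
  function(pattern, text) grepl(pattern, text, fixed = TRUE),
  squish(x$v2), squish(x$v1),
  USE.NAMES = FALSE
)

print(x$v3)
```

Because the match is literal, names containing regex metacharacters need no escaping, whereas the " *" regex approach would still interpret them.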
https://stackoverflow.com/questions/42594218