Q: Finding matches for multiple words using stringdist

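Stack Overflow user

Asked 2021-11-03 13:13:36

Answers: 1 · Views: 41 · Followers: 0 · Votes: 1

I have the test data below. I am trying to find (approximate) matches for a vector of words. I am using stringdist because the actual database is large: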

library(stringdist)
test_data <- structure(list(Province = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3), Year = c(2000, 
2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002, 2000, 2000, 2000, 
2001, 2001, 2001, 2002, 2002, 2002, 2000, 2000, 2000, 2001, 2001, 
2001, 2002, 2002, 2002), Municipality = c("Some", "Anything", 
"Nothing", "Someth.", "Anything", "Not", "Something", "Anything", 
"None", "Some", "Anything", "Nothing", "Someth.", "Anything", 
"Not", "Something", "Anything", "None", "Some", "Anything", "Nothing", 
"Someth.", "Anything", "Not", "Something", "Anything", "None"
), `Other Values` = c(0.41, 0.42, 0.34, 0.47, 0.0600000000000001, 
0.8, 0.14, 0.15, 0.01, 0.41, 0.42, 0.34, 0.47, 0.0600000000000001, 
0.8, 0.14, 0.15, 0.01, 0.41, 0.42, 0.34, 0.47, 0.0600000000000001, 
0.8, 0.14, 0.15, 0.01)), row.names = c(NA, -27L), class = c("tbl_df", 
"tbl", "data.frame"))

# A tibble: 27 x 4
   Province  Year Municipality `Other Values`
      <dbl> <dbl> <chr>                 <dbl>
 1        1  2000 Some                 0.41  
 2        1  2000 Anything             0.42  
 3        1  2000 Nothing              0.34  
 4        1  2001 Someth.              0.47  
 5        1  2001 Anything             0.0600
 6        1  2001 Not                  0.8   
 7        1  2002 Something            0.14  
 8        1  2002 Anything             0.15  
 9        1  2002 None                 0.01  
10        2  2000 Some                 0.41  
# ... with 17 more rows

I tried running:

test_match_out <- amatch(c("Anything","Something"),test_data[,3],maxDist=2)

EDIT:

Following zx8754's comment, I tried:

test_match_out <- amatch(c("Anything","Something"),test_data[[3]],maxDist=2)

and:

test_match_out <- amatch(c("Anything","Something"),test_data$Municipality,maxDist=2)

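My impression was that the first line (amatch) would give me something like a vector of indices where there is a match. Instead it only gives me a vector with two NA values. Am I misunderstanding what amatch does, or is there a problem with the syntax?

I would like to get the values of the rows that amatch matches, together with the matched words.

Desired output: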

test_data_2 <- structure(list(Province = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3), Year = c(2000, 
2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002, 2000, 2000, 2000, 
2001, 2001, 2001, 2002, 2002, 2002, 2000, 2000, 2000, 2001, 2001, 
2001, 2002, 2002, 2002), Municipality = c("Some", "Anything", 
"Nothing", "Someth.", "Anything", "Not", "Something", "Anything", 
"None", "Some", "Anything", "Nothing", "Someth.", "Anything", 
"Not", "Something", "Anything", "None", "Some", "Anything", "Nothing", 
"Someth.", "Anything", "Not", "Something", "Anything", "None"
), `Other Values` = c(0.41, 0.42, 0.34, 0.47, 0.0600000000000001, 
0.8, 0.14, 0.15, 0.01, 0.41, 0.42, 0.34, 0.47, 0.0600000000000001, 
0.8, 0.14, 0.15, 0.01, 0.41, 0.42, 0.34, 0.47, 0.0600000000000001, 
0.8, 0.14, 0.15, 0.01), `Matched Values` = c(NA, 0.42, NA, NA, 0.06000, 
NA, 0.14, 0.15, NA, NA, 0.42, NA, NA, 0.0600000000000001, 
NA, 0.14, 0.15, NA, NA, 0.42, NA, NA, 0.0600000000000001, 
NA, 0.14, 0.15, NA), `Matched Words` = c(NA, "Anything", NA, NA, "Anything", 
NA, "Something", "Anything", NA, NA, "Anything", NA, NA, "Anything", 
NA, "Something", "Anything", NA, NA, "Anything", NA, NA, "Anything", 
NA, "Something", "Anything", NA)), row.names = c(NA, -27L), class = c("tbl_df", 
"tbl", "data.frame"))

Tags: stringdist, string, fuzzy-search

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-11-03 13:58:43

Get the indices of the matches, then update all matching rows:

ix <- amatch(c("Anything","Something"), test_data[[ 3 ]], maxDist = 2)
# [1] 2 7
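
The call above differs from the first attempt in the question only in how the column is extracted. A likely explanation for the NA result, sketched here under assumptions: on a tibble, `[ , 3]` keeps a one-column table rather than a character vector, and coercing that table to character yields a single string, so nothing lies within `maxDist`. The `demo` data frame below is a hypothetical reduced copy of the question's data, and `drop = FALSE` is used to imitate the tibble behaviour with a base data.frame.

```r
library(stringdist)

# Hypothetical reduced copy of the question's data (not the full test_data).
demo <- data.frame(
  Province = c(1, 1, 1, 1),
  Year = c(2000, 2000, 2000, 2002),
  Municipality = c("Some", "Anything", "Nothing", "Something"),
  stringsAsFactors = FALSE
)

# A tibble keeps `[ , 3]` as a one-column table; `drop = FALSE` reproduces
# that shape with a base data.frame.
one_col <- demo[, 3, drop = FALSE]
as_vec  <- demo[[3]]

is.character(one_col)          # FALSE: still a data.frame (a list)
is.character(as_vec)           # TRUE: a plain character vector
length(as.character(one_col))  # 1: the whole column collapses to one string

# With a character vector as the lookup table, amatch() returns positions
# in that vector.
ix_demo <- amatch(c("Anything", "Something"), as_vec, maxDist = 2)
```

Passing `test_data[[3]]` or `test_data$Municipality`, as in the answer, avoids the problem because both return the column as a character vector.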

ifelse(test_data$Municipality %in% test_data$Municipality[ ix ], 
       test_data$`Other Values`, NA)
#  [1]   NA 0.42   NA   NA 0.06   NA 0.14 0.15   NA   NA 0.42
# [12]   NA   NA 0.06   NA 0.14 0.15   NA   NA 0.42   NA   NA
# [23] 0.06   NA 0.14 0.15   NA
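
The question also asks for the matched word next to each value. One possible way to get both columns, offered as a sketch rather than as the answerer's method, is to reverse the direction of the lookup: match every municipality against the target words, so that `amatch` returns one index per row. The `demo` data frame and the `targets` vector are illustrative names, and the column name `Matched Words` is an assumption based on the desired output.

```r
library(stringdist)

# Hypothetical reduced copy of the question's data: Province 1 only.
demo <- data.frame(
  Municipality = c("Some", "Anything", "Nothing", "Someth.", "Anything",
                   "Not", "Something", "Anything", "None"),
  `Other Values` = c(0.41, 0.42, 0.34, 0.47, 0.06, 0.8, 0.14, 0.15, 0.01),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

targets <- c("Anything", "Something")

# For each row, the position in `targets` of the closest word within
# maxDist, or NA when no target is close enough.
idx <- amatch(demo$Municipality, targets, maxDist = 2)

# Keep the value only where a target matched, and record which one.
demo[["Matched Values"]] <- ifelse(is.na(idx), NA_real_, demo[["Other Values"]])
demo[["Matched Words"]]  <- targets[idx]

print(demo)
```

Unlike the `%in%` test on exact strings, this version would also flag near-miss spellings if `maxDist` were raised, at the cost of one distance computation per row and target.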

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/69825488
