
Q: Create a fuzzyjoin and keep only the exact match if it exists, otherwise keep all options

Stack Overflow user

Asked on 2020-08-06 18:35:15

1 answer · 85 views · 0 followers · 0 votes

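I have two data frames that I am trying to join on a country-name field. This is what I want: when a perfect match is found, keep only that row; otherwise, show all candidate rows/options.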

library(fuzzyjoin)

df1 <- data.frame(
  country = c('Germany','Germany and Spain','Italy','Norway and Sweden','Austria','Spain'),
  score = c(7,8,9,10,11,12)
)

df2 <- data.frame(
  country_name = c('Germany and Spain','Germany','Germany.','Germania','Deutschland','Germany - ','Spun','Spain and Portugal','Italy','Italia','Greece and Italy',
                   'Australia','Austria...','Norway (Scandinavia)','Norway','Sweden'),
  comments = c('xxx','rrr','ttt','hhhh','gggg','jjjj','uuuuu','ooooo','yyyyyyyyyy','bbbbb','llllll','wwwwwww','nnnnnnn','cc','mmmm','lllll')
)

j <- regex_left_join(df1, df2, by = c('country' = 'country_name'), ignore_case = TRUE)

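The result (j) contains 'Germany and Spain' three times. The first occurrence is the perfect match, which is the only one I want to keep; the other two should be dropped. 'Norway and Sweden' has no perfect match, so I want to keep both candidate rows as they are.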
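To see why 'Germany and Spain' shows up three times: as far as I understand fuzzyjoin's regex joins, each `country_name` value from `df2` is used as a regular expression and searched for inside `country` from `df1` (case-insensitively here). The base-R sketch below approximates that rule with `grepl()`; this is an assumption for illustration, since `grepl()` uses R's TRE engine rather than the stringr/ICU engine fuzzyjoin relies on, although both treat these simple patterns the same way.

```r
# Values copied from df1$country and df2$country_name in the question
countries <- c('Germany', 'Germany and Spain', 'Italy', 'Norway and Sweden', 'Austria', 'Spain')
patterns <- c('Germany and Spain', 'Germany', 'Germany.', 'Germania', 'Deutschland', 'Germany - ',
              'Spun', 'Spain and Portugal', 'Italy', 'Italia', 'Greece and Italy',
              'Australia', 'Austria...', 'Norway (Scandinavia)', 'Norway', 'Sweden')

# For each country, keep the patterns that match somewhere inside it
# (assumed to mirror the regex join: pattern from df2, string from df1)
matches <- setNames(lapply(countries, function(x) {
  hit <- vapply(patterns, function(p) grepl(p, x, ignore.case = TRUE), logical(1))
  patterns[hit]
}), countries)

str(matches)
```

'Germany.' matches because the unescaped `.` stands for any character, here the space after "Germany". 'Austria...' needs three more characters after "Austria", so Austria and Spain get no candidate at all and the left join gives each of them a single row with `NA` in `country_name`.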

How can I do this?

Tags: join, match, fuzzyjoin

1 Answer

Stack Overflow user

Accepted answer

Posted on 2020-08-06 20:08:29

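You can use `stringdist::stringdist` to compute the distance between each pair of matched strings and then, for every country that has an exact match (distance 0), keep only that match: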

library(dplyr)
j %>% 
  mutate(dist = stringdist::stringdist(country, country_name)) %>% # add distance
  group_by(country) %>%                                            # group entries
  mutate(exact = any(dist == 0)) %>%                               # check if exact match exists in group
  filter(!exact | dist == 0) %>%                                   # keep only entries where no exact match exists in the group OR where the entry is the exact match
  ungroup()
#> # A tibble: 5 x 6
#>   country           score country_name      comments    dist exact
#>   <chr>             <dbl> <chr>             <chr>      <dbl> <lgl>
#> 1 Germany               7 Germany           rrr            0 TRUE 
#> 2 Germany and Spain     8 Germany and Spain xxx            0 TRUE 
#> 3 Italy                 9 Italy             yyyyyyyyyy     0 TRUE 
#> 4 Norway and Sweden    10 Norway            mmmm          11 FALSE
#> 5 Norway and Sweden    10 Sweden            lllll         11 FALSE
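One thing to be aware of: the pipeline above also drops Austria and Spain, the `df1` rows that had no candidate at all. For them `dist` is `NA`, so `any(dist == 0)` is `NA`, and `filter()` discards rows whose condition is `NA`. If those rows should survive the left join, here is a base-R sketch of the same rule. It assumes a hand-built stand-in for `j` (only the `country`, `score` and `country_name` columns) and swaps `stringdist::stringdist` for base `utils::adist` (Levenshtein distance), so treat it as illustrative rather than as the answer's exact method.

```r
# Hypothetical stand-in for `j`: the rows regex_left_join() is expected to
# return for the question's data (NA country_name = no pattern matched).
j <- data.frame(
  country = c('Germany', 'Germany and Spain', 'Germany and Spain', 'Germany and Spain',
              'Italy', 'Norway and Sweden', 'Norway and Sweden', 'Austria', 'Spain'),
  score = c(7, 8, 8, 8, 9, 10, 10, 11, 12),
  country_name = c('Germany', 'Germany and Spain', 'Germany', 'Germany.',
                   'Italy', 'Norway', 'Sweden', NA, NA),
  stringsAsFactors = FALSE
)

# Levenshtein distance per row (utils::adist); NA when there is no candidate
j$dist <- mapply(function(a, b) if (is.na(b)) NA_real_ else as.numeric(utils::adist(a, b)),
                 j$country, j$country_name, USE.NAMES = FALSE)

# Row-level flag: this row is an exact (distance 0) match; NA counts as FALSE
is_exact <- !is.na(j$dist) & j$dist == 0
# Group-level flag: TRUE for every row whose country has an exact match
j$exact <- ave(is_exact, j$country, FUN = any)

# Keep the exact match where one exists; otherwise keep every row of the
# group, including the single NA row of countries with no candidate
res <- j[!j$exact | is_exact, ]
rownames(res) <- NULL
print(res)
```

In the dplyr version, the equivalent change is `mutate(exact = any(dist == 0, na.rm = TRUE))`: `exact` becomes `FALSE` for Austria and Spain, so `!exact` keeps their `NA` row. Since a distance of 0 means the two strings are identical, `country == country_name` would give the same exact flag; the distance is mainly useful for ranking the remaining candidates. Both checks are case-sensitive, unlike the `ignore_case = TRUE` join.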

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link: https://stackoverflow.com/questions/63281742
