The Damerau-Levenshtein distance between the two strings "abc" and "acb" is 1, because it involves a single transposition of "b" and "c".
> stringdist("abc", "acb", method = "dl")
[1] 1
Now suppose I have the following two character vectors:
A = c("apple", "banana", "citrus")
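B = c("apple", "citrus", "banana")
How can I compute the Damerau-Levenshtein distance between A and B so that the result is the same as the distance between "abc" and "acb", since there is a single transposition between "citrus" and "banana"? In other words, how can I compute the Damerau-Levenshtein distance between A and B so that each item is treated as a single character of a string?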
Answered 2021-03-31 17:03:14
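library(stringdist)
library(tidyr)  # loaded here for the %>% pipe
A = c("apple", "banana", "citrus")
B = c("apple", "citrus", "banana")
# Map each distinct item to one letter (only works for up to 26 distinct items)
a <- factor(A, levels = union(A,B)) %>%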
as.numeric() %>%
sapply(function(i) letters[i]
%>% paste0(collapse = "")
) %>%
paste0(collapse = "")
b <- factor(B, levels = union(A,B)) %>%
as.numeric() %>%
sapply(function(i) letters[i]
%>% paste0(collapse = "")
) %>%
paste0(collapse = "")
stringdist(a, b, method = "dl")
Answered 2021-03-31 16:20:35
What about:
vecdist <- function(x, y){
matches <- match(x, y, nomatch = 0)
nomatch <- matches == 0
# No match = we need 1 permutation
# Other matches: Compare index, for each "not inverted" index, (not 3 vs -3) we need 1 permutation
perm_match <- (matches - seq_along(matches))[!nomatch]
perm_n <- sum(perm_match != 0) - sum(duplicated(abs(perm_match)))
sum(nomatch) + perm_n + sum(!y %in% x)
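}
The basic idea here is to find out:
1. Are there items in x with no match in y, or vice versa? Each of these counts as 1 permutation.
2. For the items that do match, how many sit at a different position, with duplicated(abs(...)) used to detect "mutual" switches. For example, abcd vs. badc is 2 permutations, while abcd vs. bdca is 3.
This works very similarly to how stringdist works on single strings.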
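To make the bookkeeping concrete, here is a small sketch that exposes the intermediate values the function computes. The helper name trace_vecdist is hypothetical and not part of the answer; it only mirrors the offset logic for the case where every item of x also occurs in y. Note that the pairing test is a heuristic: any two offsets of equal magnitude are treated as a mutual switch, so for abcd vs. badc this term comes out as 1 rather than the 2 swaps described above.

```r
# Hypothetical helper for illustration only: mirrors the offset logic of
# vecdist() and returns the intermediate quantities instead of the total.
trace_vecdist <- function(x, y) {
  matches <- match(x, y, nomatch = 0)         # position of each x item in y
  offsets <- (matches - seq_along(matches))[matches != 0]
  moved   <- sum(offsets != 0)                # items that are out of place
  paired  <- sum(duplicated(abs(offsets)))    # offsets sharing a magnitude
  list(offsets = offsets, moved = moved, paired = paired,
       perm_n = moved - paired)
}

A <- c("apple", "banana", "citrus", "pear")
B <- c("pear", "citrus", "banana", "apple")
tr <- trace_vecdist(A, B)
str(tr)
# offsets are 3 1 -1 -3: four items moved, two magnitudes repeat, perm_n is 2

# Case where the heuristic undercounts: all four offsets have magnitude 1,
# so three of them count as duplicates and perm_n is 1.
ab <- trace_vecdist(strsplit("abcd", "")[[1]], strsplit("badc", "")[[1]])
ab$perm_n
```

Printing the offsets this way makes it easier to see which inputs the shortcut handles well (fully reversed vectors, single swaps) and which it does not.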
A = c("apple", "banana", "citrus")
B = c("apple", "citrus", "banana")
vecdist(A, B)
[1] 1
A <- c(A, 'pear')
vecdist(A, B)
[1] 2
vecdist(B, A)
[1] 2
A <- c('apple', 'banana', 'citrus', 'pear')
B <- c('pear', 'citrus', 'banana', 'apple')
vecdist(A, B)
[1] 2
vecdist(B, A)
[1] 2
A <- c('apple', 'banana', 'citrus', 'pear')
B <- c('pear', 'citrus', 'apple', 'banana')
vecdist(A, B)
[1] 3
vecdist(B, A)
[1] 3
https://stackoverflow.com/questions/66882695
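As a cross-check on either approach, the items can also be encoded as integer codes and compared as sequences. This is a minimal sketch, not taken from either answer; it assumes the stringdist package is installed and uses its seq_dist() function, which compares integer sequences rather than strings and therefore avoids the 26-letter limit. The names vocab, ia and ib are illustrative.

```r
A <- c("apple", "banana", "citrus")
B <- c("apple", "citrus", "banana")

# Give every distinct item one integer code, shared by both vectors
vocab <- union(A, B)
ia <- match(A, vocab)   # 1 2 3
ib <- match(B, vocab)   # 1 3 2

# Assumption: stringdist is available; the call is skipped otherwise
if (requireNamespace("stringdist", quietly = TRUE)) {
  # Each integer plays the role of one character, so swapping the two
  # adjacent items "banana" and "citrus" is a single transposition
  print(stringdist::seq_dist(ia, ib, method = "dl"))
}
```

Because the codes are plain integers, the same pattern should work for vectors with any number of distinct items.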