首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >加权Damerau-Levenshtein在VBA中的应用

加权Damerau-Levenshtein在VBA中的应用
EN

Stack Overflow用户
提问于 2012-12-03 22:47:11
回答 1查看 10.1K关注 0票数 11

我正在为Microsoft套件构建一个私有拼写检查程序。我正在对排字和它们的潜在修正进行字符串比较,以确定我想要包含哪些更正。

我一直在寻找一个加权的 Damerau-Levenshtein字符串比较公式,因为我希望掉期、插入、删除和替换都有不同的权重,而不仅仅是"1“的权重,所以我可以优先考虑一些修正。例如,错误的"agmes“理论上可以更正为”游戏“或”年龄“,因为两者都只需要一次编辑就可以移动到拼写正确的单词,但我想给”交换“编辑一个较低的权重,以便”游戏“显示为首选的更正。

我使用Excel进行分析,所以我使用的任何代码都需要在(VBA)中。我能找到的最好的是这个例子,它看起来很棒,但它是用Java实现的。我尽了最大的努力去皈依,但我不是一个专家,我需要一点帮助!

有人能看一下附加的代码并帮我找出问题所在吗?

谢谢!

编辑:我让它自己工作。这是VBA中的加权Damerau-Levenshtein公式。它使用Excel内置的数学函数进行一些评估。当将一个错误与两个可能的更正进行比较时,代价最高的更正是首选词。这是因为两个交换的成本必须大于删除和插入的成本,如果以最低的成本分配掉期(我认为这是理想的),这是不可能的。如果你需要更多的信息,可以查看凯文的博客。

代码语言:javascript
复制
Public Function WeightedDL(source As String, target As String) As Double

    Dim deleteCost As Double
    Dim insertCost As Double
    Dim replaceCost As Double
    Dim swapCost As Double

    deleteCost = 1
    insertCost = 1.1
    replaceCost = 1.1
    swapCost = 1.2

    Dim i As Integer
    Dim j As Integer
    Dim k As Integer

    If Len(source) = 0 Then
        WeightedDL = Len(target) * insertCost
        Exit Function
    End If

    If Len(target) = 0 Then
        WeightedDL = Len(source) * deleteCost
        Exit Function
    End If

    Dim table() As Double
    ReDim table(Len(source), Len(target))

    Dim sourceIndexByCharacter() As Variant
    ReDim sourceIndexByCharacter(0 To 1, 0 To Len(source) - 1) As Variant

    If Left(source, 1) <> Left(target, 1) Then
        table(0, 0) = Application.Min(replaceCost, (deleteCost + insertCost))
    End If

    sourceIndexByCharacter(0, 0) = Left(source, 1)
    sourceIndexByCharacter(1, 0) = 0

    Dim deleteDistance As Double
    Dim insertDistance As Double
    Dim matchDistance As Double

    For i = 1 To Len(source) - 1

        deleteDistance = table(i - 1, 0) + deleteCost
        insertDistance = ((i + 1) * deleteCost) + insertCost

        If Mid(source, i + 1, 1) = Left(target, 1) Then
            matchDistance = (i * deleteCost) + 0
        Else
            matchDistance = (i * deleteCost) + replaceCost
        End If

        table(i, 0) = Application.Min(Application.Min(deleteDistance, insertDistance), matchDistance)
    Next

    For j = 1 To Len(target) - 1

        deleteDistance = table(0, j - 1) + insertCost
        insertDistance = ((j + 1) * insertCost) + deleteCost

        If Left(source, 1) = Mid(target, j + 1, 1) Then
            matchDistance = (j * insertCost) + 0
        Else
            matchDistance = (j * insertCost) + replaceCost
        End If

        table(0, j) = Application.Min(Application.Min(deleteDistance, insertDistance), matchDistance)
    Next

    For i = 1 To Len(source) - 1

        Dim maxSourceLetterMatchIndex As Integer

        If Mid(source, i + 1, 1) = Left(target, 1) Then
            maxSourceLetterMatchIndex = 0
        Else
            maxSourceLetterMatchIndex = -1
        End If

        For j = 1 To Len(target) - 1

            Dim candidateSwapIndex As Integer
            candidateSwapIndex = -1

            For k = 0 To UBound(sourceIndexByCharacter, 2)
                If sourceIndexByCharacter(0, k) = Mid(target, j + 1, 1) Then candidateSwapIndex = sourceIndexByCharacter(1, k)
            Next

            Dim jSwap As Integer
            jSwap = maxSourceLetterMatchIndex

            deleteDistance = table(i - 1, j) + deleteCost
            insertDistance = table(i, j - 1) + insertCost
            matchDistance = table(i - 1, j - 1)

            If Mid(source, i + 1, 1) <> Mid(target, j + 1, 1) Then
                matchDistance = matchDistance + replaceCost
            Else
                maxSourceLetterMatchIndex = j
            End If

            Dim swapDistance As Double

            If candidateSwapIndex <> -1 And jSwap <> -1 Then

                Dim iSwap As Integer
                iSwap = candidateSwapIndex

                Dim preSwapCost
                If iSwap = 0 And jSwap = 0 Then
                    preSwapCost = 0
                Else
                    preSwapCost = table(Application.Max(0, iSwap - 1), Application.Max(0, jSwap - 1))
                End If

                swapDistance = preSwapCost + ((i - iSwap - 1) * deleteCost) + ((j - jSwap - 1) * insertCost) + swapCost

            Else
                swapDistance = 500
            End If

            table(i, j) = Application.Min(Application.Min(Application.Min(deleteDistance, insertDistance), matchDistance), swapDistance)

        Next

        sourceIndexByCharacter(0, i) = Mid(source, i + 1, 1)
        sourceIndexByCharacter(1, i) = i

    Next

    WeightedDL = table(Len(source) - 1, Len(target) - 1)

End Function
EN

回答 1

Stack Overflow用户

发布于 2015-03-30 10:57:55

相信这几行是错误的:-

代码语言:javascript
复制
deleteDistance = table(0, j - 1) + insertCost
insertDistance = ((j + 1) * insertCost) + deleteCost

应该是:-

代码语言:javascript
复制
deleteDistance = ((j + 1) * insertCost) + deleteCost
insertDistance = table(0, j - 1) + insertCost

还没有通过代码计算出正在发生的事情,但是下面是奇怪的!

代码语言:javascript
复制
If Left(source, 1) <> Left(target, 1) Then
    table(0, 0) = Application.Min(replaceCost, (deleteCost + insertCost))
End If

由于您需要替换、删除或插入它,可能应该是:-

代码语言:javascript
复制
If Left(source, 1) <> Left(target, 1) Then
    table(0, 0) = Application.Min(replaceCost, Application.Min(deleteCost, insertCost))
End If
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/13693149

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档