首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >按照以下步骤构建DAWG会产生Trie。为什么?

按照以下步骤构建DAWG会产生Trie。为什么?
EN

Stack Overflow用户
提问于 2012-07-24 01:19:49
回答 1查看 461关注 0票数 1

我正在尝试实现一组函数,为我为个人项目编码的一些搜索功能直接在线性时间内创建一个DAWG。我读过this paper,它恰好详细介绍了DAWG背后的想法,甚至提供了在线性时间内构建的伪代码!

然而,遵循伪代码似乎(在我看来)产生了一个类似trie的结构。具体地说,后缀似乎不是显式共享的(实际上是由图中的边连接的)。相反,它们是由后缀指针表示的,这些后缀指针与图形的实际遍历没有真正的关系。

例如,看一下集合{tap, taps, top, tops} (来自DAWG Wikipedia page)中单词的DAWG图片:

现在,将其与上述论文中详细介绍的步骤得到的结构进行比较(使用这组单词手动完成只需很少的时间):

代码语言:javascript
复制
Note: Edges are labeled by letters
      Nodes are labeled by the concatenation of the labels of the primary edges
      used to reach them
      Suffix pointers are not visually represented on the graph

      primary edges:   solid edges used to traverse graph
      secondary edges: dotted edges implying a suffix relationship between
                       the letter labeling the edge and the substring 
                       represented by the target node

builddawg(S)
    1. Create a node named source.
    2. Let activenode be source.
    3. For each word w of S do:
        A. For each letter 'a' of w do:
            Let activenode be update (activenode, a).
        B. Let activenode be source.
    4. Return source.


update (activenode, a)
    1. If activenode has an outgoing edge labeled 'a', then
        A. Let newactivenode be the node that this edge leads to.
        B. If this edge is primary, return newactivenode.
        C. Else, return split (activenode, newactivenode).
    2. Else
        A. Create a node named newactivenode.
        B. Create a primary edge labeled 'a' from activenode to newactivenode.
        C. Let currentnode be activenode.
        D. Let suflxnode be undefined.
        E. While currentnode isn’t source and sufixnode is undefined do:
            i. Let currentnode be the node pointed to by the suffix
               pointer of currentnode.
            ii. If currentnode has a primary outgoing edge labeled 'a',
                then let sufixnode be the node that this edge leads to.
            iii. Else,if currentnode has a secondary outgoing edge labeled 'a' then
                a. Let childnode be the node that this edge leads to.
                b. Let suffixnode be split (currentnode, childnode).
            iv. Else, create a secondary edge from currentnode to newactivenode
                labeled 'a'.
        F. If sufixnode is still undefined, let suffixnode be source.
        G. Set the suffix pointer of newactivenode to point to sufixnode.
        H. Return newactivenode.


split (parentnode, childnode)
    1. Create a node called newchildnode.
    2. Make the secondary edge from parentnode to childnode into
       a primary edge from parentnode to newchildnode (with the same label).
    3. For every primary and secondary outgoing edge of childnode,
       create a secondary outgoing edge of newchildnode with the
       same label and leading to the same node.
    4. Set the suffix pointer of newchildnode equal to that of childnode.
    5. Reset the suffix pointer of childnode to point to newchildnode.
    6. Let currentnode be parentnode.
    7. While currentnode isn’t source do:
        A. Let currentnode be the node pointed to by the 
           suffix pointer of currentnode.
        B. If currentnode has a secondary edge to childnode,
           then make it a secondary edge to newchildnode (with the same label).
        C. Else, break out of the while loop.
    8. Return newchildnode.

我得到的结构不等同于上面的图片。事实上,它看起来与trie几乎相同,除了将次边转换为主边所产生的额外节点。与上面的DAWG等价的trie是:

我只是应用了错误的算法,还是有几种类型的狗,或者我只是误解了狗应该是什么样子?

我看过的大多数详细描述DAWGs的论文都有似乎是由算法创建的结构,但我在网上读到的大多数材料(以及我看到的图片)都有连接常见后缀的实际边。我不知道该相信什么,或者他们是否真的是等价的。

EN

回答 1

Stack Overflow用户

发布于 2012-07-24 04:24:22

我相信我找到了一个解决方案。

在构建suffixPointer之后,您可以从上到下遍历节点,并删除具有suffixPointer != source的节点的子树,将它们直接连接到DAWG所指向的节点。

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/11617325

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档