我正在尝试实现一组函数,为我为个人项目编码的一些搜索功能直接在线性时间内创建一个DAWG。我读过this paper,它恰好详细介绍了DAWG背后的想法,甚至提供了在线性时间内构建的伪代码!
然而,遵循伪代码似乎(在我看来)产生了一个类似trie的结构。具体地说,后缀似乎不是显式共享的(实际上是由图中的边连接的)。相反,它们是由后缀指针表示的,这些后缀指针与图形的实际遍历没有真正的关系。
例如,看一下集合{tap, taps, top, tops} (来自DAWG Wikipedia page)中单词的DAWG图片:

现在,将其与上述论文中详细介绍的步骤得到的结构进行比较(使用这组单词手动完成只需很少的时间):
Note: Edges are labeled by letters
Nodes are labeled by the concatenation of the labels of the primary edges
used to reach them
Suffix pointers are not visually represented on the graph
primary edges: solid edges used to traverse graph
secondary edges: dotted edges implying a suffix relationship between
the letter labeling the edge and the substring
represented by the target node
builddawg(S)
1. Create a node named source.
2. Let activenode be source.
3. For each word w of S do:
A. For each letter 'a' of w do:
Let activenode be update (activenode, a).
B. Let activenode be source.
4. Return source.
update (activenode, a)
1. If activenode has an outgoing edge labeled 'a', then
A. Let newactivenode be the node that this edge leads to.
B. If this edge is primary, return newactivenode.
C. Else, return split (activenode, newactivenode).
2. Else
A. Create a node named newactivenode.
B. Create a primary edge labeled 'a' from activenode to newactivenode.
C. Let currentnode be activenode.
D. Let suflxnode be undefined.
E. While currentnode isn’t source and sufixnode is undefined do:
i. Let currentnode be the node pointed to by the suffix
pointer of currentnode.
ii. If currentnode has a primary outgoing edge labeled 'a',
then let sufixnode be the node that this edge leads to.
iii. Else,if currentnode has a secondary outgoing edge labeled 'a' then
a. Let childnode be the node that this edge leads to.
b. Let suffixnode be split (currentnode, childnode).
iv. Else, create a secondary edge from currentnode to newactivenode
labeled 'a'.
F. If sufixnode is still undefined, let suffixnode be source.
G. Set the suffix pointer of newactivenode to point to sufixnode.
H. Return newactivenode.
split (parentnode, childnode)
1. Create a node called newchildnode.
2. Make the secondary edge from parentnode to childnode into
a primary edge from parentnode to newchildnode (with the same label).
3. For every primary and secondary outgoing edge of childnode,
create a secondary outgoing edge of newchildnode with the
same label and leading to the same node.
4. Set the suffix pointer of newchildnode equal to that of childnode.
5. Reset the suffix pointer of childnode to point to newchildnode.
6. Let currentnode be parentnode.
7. While currentnode isn’t source do:
A. Let currentnode be the node pointed to by the
suffix pointer of currentnode.
B. If currentnode has a secondary edge to childnode,
then make it a secondary edge to newchildnode (with the same label).
C. Else, break out of the while loop.
8. Return newchildnode.我得到的结构不等同于上面的图片。事实上,它看起来与trie几乎相同,除了将次边转换为主边所产生的额外节点。与上面的DAWG等价的trie是:

我只是应用了错误的算法,还是有几种类型的狗,或者我只是误解了狗应该是什么样子?
我看过的大多数详细描述DAWGs的论文都有似乎是由算法创建的结构,但我在网上读到的大多数材料(以及我看到的图片)都有连接常见后缀的实际边。我不知道该相信什么,或者他们是否真的是等价的。
发布于 2012-07-24 04:24:22
我相信我找到了一个解决方案。
在构建suffixPointer之后,您可以从上到下遍历节点,并删除具有suffixPointer != source的节点的子树,将它们直接连接到DAWG所指向的节点。
https://stackoverflow.com/questions/11617325
复制相似问题