我正面临着一个应该使用Aho-Corasick自动机来解决的问题。我得到了一组单词(由'0‘或’1‘组成)-模式,我必须决定是否可以创建无限的文本,其中不包含任何给定的模式。我认为,解决方案是创建Aho-Corasick自动机并搜索没有匹配状态的循环,但我不能提出一个好的方法来做到这一点。我想过使用DFS搜索状态图,但我不确定它是否可以工作,并且我有一个实现问题--让我们假设我们在一个状态中,它有一个'1‘边-但是那个边指向的状态被标记为匹配-所以我们不能使用那个边,我们可以尝试失败链接(当前状态没有'0’边)-但我们也必须记住,我们不能使用当前状态的失败链接指向的状态的'1‘边。
有没有人能纠正我,告诉我怎么做?我已经用C++编写了Aho-Corasick,我确信它可以工作--我也理解整个算法。
下面是基本代码:
class AhoCorasick
{
static const int ALPHABET_SIZE = 2;
struct State
{
State* edge[ALPHABET_SIZE];
State* fail;
State* longestMatchingSuffix;
//Vector used to remember which pattern matches in this state.
vector< int > matching;
short color;
State()
{
for(int i = 0; i < ALPHABET_SIZE; ++i)
edge[i] = 0;
color = 0;
}
~State()
{
for(int i = 0; i < ALPHABET_SIZE; ++i)
{
delete edge[i];
}
}
};
private:
State root;
vector< int > lenOfPattern;
bool isFailComputed;
//Helper function used to traverse state graph.
State* move(State* curr, char letter)
{
while(curr != &root && curr->edge[letter] == 0)
{
curr = curr->fail;
}
if(curr->edge[letter] != 0)
curr = curr->edge[letter];
return curr;
}
//Function which computes fail links and longestMatchingSuffix.
void computeFailLink()
{
queue< State* > Q;
root.fail = root.longestMatchingSuffix = 0;
for(int i = 0; i < ALPHABET_SIZE; ++i)
{
if(root.edge[i] != 0)
{
Q.push(root.edge[i]);
root.edge[i]->fail = &root;
}
}
while(!Q.empty())
{
State* curr = Q.front();
Q.pop();
if(!curr->fail->matching.empty())
{
curr->longestMatchingSuffix = curr->fail;
}
else
{
curr->longestMatchingSuffix = curr->fail->longestMatchingSuffix;
}
for(int i = 0; i < ALPHABET_SIZE; ++i)
{
if(curr->edge[i] != 0)
{
Q.push(curr->edge[i]);
State* state = curr->fail;
state = move(state, i);
curr->edge[i]->fail = state;
}
}
}
isFailComputed = true;
}
public:
AhoCorasick()
{
isFailComputed = false;
}
//Add pattern to automaton.
//pattern - pointer to pattern, which will be added
//fun - function which will be used to transform character to 0-based index.
void addPattern(const char* const pattern, int (*fun) (const char *))
{
isFailComputed = false;
int len = strlen(pattern);
State* curr = &root;
const char* pat = pattern;
for(; *pat; ++pat)
{
char tmpPat = fun(pat);
if(curr->edge[tmpPat] == 0)
{
curr = curr->edge[tmpPat] = new State;
}
else
{
curr = curr->edge[tmpPat];
}
}
lenOfPattern.push_back(len);
curr->matching.push_back(lenOfPattern.size() - 1);
}
};
int alphabet01(const char * c)
{
return *c - '0';
}发布于 2013-03-27 01:56:14
我没有看过你的代码,但我知道非常简单和有效的实现。
首先,让我们向树中添加Dictionary后缀链接(您可以在Wikipedia中找到它们的描述)。然后,您必须遍历所有的树,并以某种方式将匹配节点和具有Dict后缀链接的节点标记为坏节点。这些操作的解释很明显:您不需要所有匹配的节点,也不需要其中有匹配后缀的节点。
现在,您有了一个没有任何匹配节点的Aho-Corasick树。如果你只是在结果树上运行DFS algo,你会得到你想要的。
https://stackoverflow.com/questions/15189025
复制相似问题