Question: How can I build a sentence parser using only the C++ standard library?

Stack Overflow user

Asked 2010-04-13 04:59:35

5 answers, 4.4K views, 0 followers, score 1

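I'm designing a text-based game similar to Zork, and I'd like it to be able to parse a sentence and pull out keywords such as take, drop, and so on. The catch is that I want to do this with the standard C++ library alone. I've heard of external tools (such as flex/bison) that handle this effectively, but I don't want to get into those just yet.

What I'm thinking of implementing is a token-based system with a list of words the parser can recognize even when they appear inside a sentence, such as "take sword and kill monster". According to the parser's grammar rules, take, sword, kill, and monster would all be recognized as tokens, and the parser would produce output such as "monster killed" or something similar. I've heard there is a function in the C++ standard library called strtok that does this, but I've also heard it is "unsafe". So if anyone here could lend a hand, I'd greatly appreciate it.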

c++

parsing

5 Answers

Stack Overflow user

Accepted answer

Answered 2010-04-13 11:59:03

For a naive implementation using std::string, the std::set container, and this tokenization function (Alavoor Vasudevan), you can do the following:

#include <iostream>
#include <set>
#include <string>

int main()
{
    /* Each substring found by the tokenization loop below is looked up
       in the dic (dictionary) set. If there is a match, the substring
       is printed to the console. */

    std::set<std::string> dic;
    dic.insert("sword");
    dic.insert("kill");
    dic.insert("monster");

    std::string str = "take sword and kill monster";
    std::string delimiters = " ";    

    std::string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    std::string::size_type pos = str.find_first_of(delimiters, lastPos);

    while (std::string::npos != pos || std::string::npos != lastPos)
    {
        if(dic.find(str.substr(lastPos, pos - lastPos)) != dic.end())
            std::cout << str.substr(lastPos, pos - lastPos) 
                    << " is part of the dic.\n";
        lastPos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, lastPos);
    }

    return 0;
}

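This outputs:

sword is part of the dic.

kill is part of the dic.

monster is part of the dic.

Remarks:

The tokenization delimiter (whitespace) is very (too) simple for natural languages.

You could use some utilities in boost (for example, tokenizer).

If your dictionary (word list) is really big, using the hashed version of set (unordered_set) could be useful.

Using the boost tokenizer, it could look like this (this may not be very efficient):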

// Needs <boost/tokenizer.hpp> and <boost/foreach.hpp>; str and dic are the ones defined above.
boost::tokenizer<> tok(str);
BOOST_FOREACH(const std::string& word,tok)
{
    if(dic.find(word) != dic.end())
        std::cout << word << " is part of the dic.\n";
}
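The remark about a hashed set can be sketched with the standard library alone, assuming a C++11 compiler (std::unordered_set was only available through TR1 or boost when this answer was written). The helper name find_keywords is hypothetical and not part of the original answer; it splits on whitespace only, so it keeps the same limitation noted in the remarks.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the words of `sentence` that appear in `dic`,
// in the order they occur. Words are split on whitespace only.
std::vector<std::string> find_keywords(const std::string& sentence,
                                       const std::unordered_set<std::string>& dic)
{
    std::vector<std::string> found;
    std::istringstream in(sentence);
    std::string word;
    while (in >> word) {
        if (dic.count(word) != 0)  // hash lookup, constant time on average
            found.push_back(word);
    }
    return found;
}
```

Called with a dictionary holding sword, kill, and monster and the sentence "take sword and kill monster", this should return the same three words the program above prints.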

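Score 0

Stack Overflow user

Answered 2010-04-13 05:05:36

The strtok function comes from the C standard library, and it has some problems. For example, it modifies the string in place, it keeps hidden static state between calls (so it is not reentrant), and it can lead to security problems through buffer overflows. Instead, you should consider using the IOStream classes from the C++ standard library together with the Standard Template Library (STL) containers and algorithms.

Example: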

#include <algorithm>
#include <cctype>
#include <iostream>
#include <sstream>
#include <string>

using namespace std;

int
main()
{
    string line;

    // grab a line from standard input
    while (getline(cin, line)) {

        // break the input into tokens using a space as the delimiter
        istringstream stream(line);
        string token;
        while (getline(stream, token, ' ')) {

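            // convert string to all caps (go through unsigned char so that
            // toupper never receives a negative value)
            transform(token.begin(), token.end(), token.begin(),
                      [](unsigned char c) { return static_cast<char>(toupper(c)); });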

            // print each token on a separate line
            cout << token << endl;
        }
    }
}
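One caveat with getline and a ' ' delimiter is that every extra space yields an empty token. A possible variation, sketched here under the assumption that tokens are simply whitespace-separated words, uses operator>> instead, which skips runs of whitespace. The function name tokenize_upper is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` on runs of whitespace and
// upper-cases each token.
std::vector<std::string> tokenize_upper(const std::string& line)
{
    std::vector<std::string> tokens;
    std::istringstream stream(line);
    std::string token;
    // operator>> skips leading whitespace, so repeated spaces or tabs
    // never produce empty tokens.
    while (stream >> token) {
        std::transform(token.begin(), token.end(), token.begin(),
                       [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
        tokens.push_back(token);
    }
    return tokens;
}
```

Returning a vector rather than printing keeps the tokens available for a later lookup step, such as matching them against a word list.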

Score 3

Stack Overflow user

Answered 2010-04-13 10:46:36

Depending on how complex this language is to parse, you may be able to use the regular expression library from C++ Technical Report 1.
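The TR1 regular expression library was later adopted into C++11 as the <regex> header. A minimal sketch of word extraction with it follows, assuming a C++11 compiler and that a word is a run of ASCII letters; the function name extract_words and the pattern are illustrative choices, not something this answer prescribes.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every run of ASCII letters in `sentence`,
// so punctuation and digits act as separators.
std::vector<std::string> extract_words(const std::string& sentence)
{
    static const std::regex word_re("[A-Za-z]+");
    std::vector<std::string> words;
    for (std::sregex_iterator it(sentence.begin(), sentence.end(), word_re), end;
         it != end; ++it) {
        words.push_back(it->str());
    }
    return words;
}
```

Unlike splitting on spaces, this drops punctuation attached to a word, so "monster!" and "monster" yield the same token.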

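If that is not powerful enough, stringstreams may get you somewhere, but after a while you may decide that a parser generator like Flex/Bison is the most concise way to express your grammar.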

You need to choose your tools based on the complexity of the sentences you are parsing.
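For sentences as simple as verb-object pairs, a stringstream and two lookup tables may already be enough. The sketch below assumes a toy grammar in which a command is a known verb followed, possibly after filler words, by a known noun. The vocabulary, the function name parse_commands, and the "noun + past participle" output format are all made up for illustration.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical vocabulary and rule: a command is a known verb followed
// (possibly after filler words) by a known noun.
std::vector<std::string> parse_commands(const std::string& sentence)
{
    static const std::map<std::string, std::string> verbs = {
        {"take", "taken"}, {"drop", "dropped"}, {"kill", "killed"}};
    static const std::set<std::string> nouns = {"sword", "monster"};

    std::vector<std::string> results;
    std::istringstream in(sentence);
    std::string word;
    std::string pending;  // participle of the last verb still waiting for a noun
    while (in >> word) {
        std::map<std::string, std::string>::const_iterator v = verbs.find(word);
        if (v != verbs.end()) {
            pending = v->second;
        } else if (!pending.empty() && nouns.count(word) != 0) {
            results.push_back(word + " " + pending);
            pending.clear();
        }
        // Any other word ("and", "the", ...) is ignored.
    }
    return results;
}
```

Once the rules grow beyond this (adjectives, prepositions, multiple objects), the hand-written loop tends to become hard to follow, which is the point where a parser generator starts to pay off.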

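Score 2

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei engine for the IT domain.

Original link: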

https://stackoverflow.com/questions/2625411
