Q: Parsing with libclang: getting the CXX_BASE_SPECIFIER cursor when the base type is unknown

Stack Overflow user

Asked 2013-12-10 03:32:58

1 answer, 2.6K views, 0 followers, 3 votes

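I'm writing a documentation generator, and getting the include paths right is a nightmare, so I simply skip all includes when parsing a file. I also hand-tune every problematic define or #ifdef block that would otherwise be skipped because of the missing includes (and because my command line differs from the production build).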

The problem I noticed is this:

struct ComplexBuffer : IAnimatable
{
};

where IAnimatable is not declared (or is only forward-declared).

I'm using the Python bindings, clang.cindex, so I iterate with get_children. The result is:

Found grammar element "IAnimatable" {CursorKind.CLASS_DECL} [line=37, col=8]
Found grammar element "ComplexBuffer" {CursorKind.STRUCT_DECL} [line=39, col=9]

If I complete the base type:

class IAnimatable {};

struct ComplexBuffer : IAnimatable

I get the correct output:

Found grammar element "IAnimatable" {CursorKind.CLASS_DECL} [line=37, col=8]
Found grammar element "ComplexBuffer" {CursorKind.STRUCT_DECL} [line=39, col=9]
Found grammar element "class IAnimatable" {CursorKind.CXX_BASE_SPECIFIER} [line=39, col=25]
Found grammar element "class IAnimatable" {CursorKind.TYPE_REF} [line=39, col=25]

This is exactly what I want, because it lets me detect the inheritance list to put in the documentation.
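As a rough illustration of how that inheritance list could be collected from the cursor tree, here is a minimal Python 3 sketch. It only relies on `kind.name`, `displayname` and `get_children()`, which clang.cindex cursors provide; `collect_inheritance` and `fake_cursor` are hypothetical names introduced here, and the fake cursors merely stand in for a real translation unit so the sketch runs without libclang.

```python
from types import SimpleNamespace

RECORD_KINDS = {"CLASS_DECL", "STRUCT_DECL", "CLASS_TEMPLATE"}

def collect_inheritance(node, found=None):
    """Map each class/struct name to the display names of its base specifiers."""
    if found is None:
        found = {}
    if node.kind.name in RECORD_KINDS:
        bases = [c.displayname for c in node.get_children()
                 if c.kind.name == "CXX_BASE_SPECIFIER"]
        # A forward declaration and a definition share one entry.
        found.setdefault(node.displayname, []).extend(bases)
    for child in node.get_children():
        collect_inheritance(child, found)
    return found

# Hypothetical stand-in for a clang.cindex cursor (assumption, not the real API).
def fake_cursor(kind, displayname, children=()):
    return SimpleNamespace(kind=SimpleNamespace(name=kind),
                           displayname=displayname,
                           get_children=lambda: list(children))

tree = fake_cursor("TRANSLATION_UNIT", "test.cpp", [
    fake_cursor("CLASS_DECL", "IAnimatable"),
    fake_cursor("STRUCT_DECL", "ComplexBuffer", [
        fake_cursor("CXX_BASE_SPECIFIER", "class IAnimatable", [
            fake_cursor("TYPE_REF", "class IAnimatable")]),
    ]),
])
result = collect_inheritance(tree)
```

With a real parse, `tu.cursor` would be passed instead of `tree`. When the base type is unknown, the base-specifier child is simply absent, so the list for that class comes back empty.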

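The problem only arises because I skip all the includes.

Maybe I can work around it by parsing the declaration line by hand?

EDIT, PS: for completeness, here is my parsing Python script: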

import sys
import clang.cindex

index = clang.cindex.Index.create()
tu = index.parse(sys.argv[1], args=["-std=c++98"], options=clang.cindex.TranslationUnit.PARSE_SKIP_FUNCTION_BODIES)

def printall_visitor(node):
    print 'Found grammar element "%s" {%s} [line=%s, col=%s]' % (node.displayname, node.kind, node.location.line, node.location.column)

def visit(node, func):
    func(node)
    for c in node.get_children():
        visit(c, func)

visit(tu.cursor, printall_visitor)

Tags: clang, c++, python, parsing

1 Answer

Stack Overflow user

Accepted answer

Posted 2013-12-10 08:59:01

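I'll answer this myself, because the code I came up with may be useful to future googlers.

In the end, I wrote two functions that retrieve the list of base classes from the inheritance list on a class declaration line.

One uses the AST cursors; the other is entirely manual and copes with as much of C++'s complexity as it can.

Here is the whole result: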

#!/usr/bin/env python
# -*- coding: utf-8 -*-
'''
Created on 2013/12/09

@author: voddou
'''

import sys
import re
import clang.cindex
import os
import string

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    CYAN = '\033[96m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    MAGENTA = '\033[95m'
    GREY = '\033[90m'

    def disable(self):
        self.HEADER = ''
        self.OKBLUE = ''
        self.OKGREEN = ''
        self.WARNING = ''
        self.FAIL = ''
        self.ENDC = ''
        self.CYAN = ''
        self.MAGENTA = ''
        self.GREY = ''

from contextlib import contextmanager

@contextmanager
def scopedColorizer(color):
    sys.stdout.write(color)
    yield
    sys.stdout.write(bcolors.ENDC)

#clang.cindex.Config.set_library_file("C:/python27/DLLs/libclang.dll")

src_filepath = sys.argv[1]
src_basename = os.path.basename(src_filepath)

parseeLines = file(src_filepath).readlines()

def trim_all(astring):
    return "".join(astring.split()) 

def has_token(line, token):
    trimed = trim_all(line)
    pos = string.find(trimed, token)
    return pos != -1

def has_any_token(line, token_list):
    results = [has_token(line, t) for t in token_list]
    return any(results)

def is_any(astring, some_strings):
    return any([x == astring for x in some_strings])

def comment_out(line):
    return "//" + line

# alter the original file to remove #include directives and protective ifdef blocks
for i, l in enumerate(parseeLines):
    if has_token(l, "#include"):
        parseeLines[i] = comment_out(l)
    elif has_any_token(l, ["#ifdef", "#ifdefined", "#ifndef", "#if!defined", "#endif", "#elif", "#else"]):
        parseeLines[i] = comment_out(l)

index = clang.cindex.Index.create()
tu = index.parse(src_basename,
                 args=["-std=c++98"],
                 unsaved_files=[(src_basename, "".join(parseeLines))],
                 options=clang.cindex.TranslationUnit.PARSE_SKIP_FUNCTION_BODIES)

print 'Translation unit:', tu.spelling, "\n"

def gather_until(strlist, ifrom, endtokens):
    """make one string out of a list of strings, starting from a given index, until one token in endtokens is found.
    ex: gather_until(["foo", "toto", "bar", "kaz"], 1, ["r", "z"])
        will yield "totoba"
    """
    result = strlist[ifrom]
    nextline = ifrom + 1
    while not any([string.find(result, token) != -1 for token in endtokens]):
        result = result + strlist[nextline]
        nextline = nextline + 1
    nearest = result
    for t in endtokens:
        nearest = nearest.partition(t)[0]
    return nearest

def strip_templates_parameters(declline):
    """remove any content between < >
    """
    res = ""
    nested = 0
    for c in declline:
        if c == '>':
            nested = nested - 1
        if nested == 0:
            res = res + c
        if c == '<':
            nested = nested + 1
    return res

# thanks Markus Jarderot from Stackoverflow.com
def comment_remover(text):
    def replacer(match):
        s = match.group(0)
        if s.startswith('/'):
            return ""
        else:
            return s
    pattern = re.compile(
        r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
        re.DOTALL | re.MULTILINE
    )
    return re.sub(pattern, replacer, text)

def replace_any_of(haystack, list_of_candidates, by_what):
    for cand in list_of_candidates:
        haystack = string.replace(haystack, cand, by_what)
    return haystack

cxx_keywords = ["class", "struct", "public", "private", "protected"]

def clean_name(displayname):
    """remove namespace and type tags
    """
    r = displayname.rpartition("::")[2]
    r = replace_any_of(r, cxx_keywords, "")
    return r

def find_parents_using_clang(node):
    l = []
    for c in node.get_children():
        if c.kind == clang.cindex.CursorKind.CXX_BASE_SPECIFIER:
            l.append(clean_name(c.displayname))
    return None if len(l) == 0 else l

# syntax based custom parsing
def find_parents_list(node):
    ideclline = node.location.line - 1
    declline = parseeLines[ideclline]
    with scopedColorizer(bcolors.WARNING):
        print "class decl line:", declline.strip()
    fulldecl = gather_until(parseeLines, ideclline, ["{", ";"])
    fulldecl = clean_name(fulldecl)
    fulldecl = trim_all(fulldecl)
    if string.find(fulldecl, ":") != -1:    # if inheritance exists on the declaration line
        baselist = fulldecl.partition(":")[2]
        res = strip_templates_parameters(baselist)  # because they are separated by commas, they would break the split(",")
        res = comment_remover(res)
        res = res.split(",")
        return res
    return None

# documentation generator
def make_htll_visitor(node):
    if (node.kind == clang.cindex.CursorKind.CLASS_DECL
        or node.kind == clang.cindex.CursorKind.STRUCT_DECL
        or node.kind == clang.cindex.CursorKind.CLASS_TEMPLATE):

        bases2 = find_parents_list(node)
        bases = find_parents_using_clang(node)
        if bases is not None:
            with scopedColorizer(bcolors.CYAN):
                print "class clang list of bases:", str(bases)

        if bases2 is not None:
            with scopedColorizer(bcolors.MAGENTA):
                print "class manual list of bases:", str(bases2)


def visit(node, func):
    func(node)
    for c in node.get_children():
        visit(c, func)

visit(tu.cursor, make_htll_visitor)

with scopedColorizer(bcolors.OKGREEN):
    print "all over"

This code lets me accept incomplete C++ translation units and correctly parse declarations such as:

struct ComplexBuffer
    : IAnimatable
    , Bugger,

        Mozafoka
{
};

It also copes with declarations like this one:

struct AnimHandler : NonCopyable, IHandlerPrivateGetter< AnimHandler, AafHandler > // CRTP
{
...
};

which gives me this output:

class manual list of bases: ['NonCopyable', 'IHandlerPrivateGetter<>']
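The empty `<>` in that output comes from `strip_templates_parameters`, which drops the template arguments so that the commas inside them do not break the later `split(",")`. If the arguments should be kept, one alternative is to split only on commas at bracket depth zero. The following is a minimal Python 3 sketch of that idea, not part of the original script; `split_top_level` is a hypothetical helper, and it assumes `<` and `>` appear only as template brackets (no comparison or shift operators in the base list).

```python
def split_top_level(base_list):
    """Split a base-class list on commas that are not nested inside < >."""
    parts, current, depth = [], [], 0
    for ch in base_list:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth = max(depth - 1, 0)
        if ch == "," and depth == 0:
            # Top-level comma: close the current base name.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts

bases = split_top_level(
    "NonCopyable, IHandlerPrivateGetter< AnimHandler, AafHandler >")
```

Here `bases` holds two entries, and the second one still carries its full template argument list.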

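That manual result is nice, since the clang-based version did not return a single class from the base list. To be on the safe side, one could merge the results of the two functions using a set, in case the manual parser misses something. However, I think that could produce subtle duplicates because of differences between displayname and the output of my own parser.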
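One way to limit those duplicates is to normalize both spellings to a bare identifier before comparing them. The sketch below is a Python 3 illustration under stated assumptions, not the author's code: `normalize_base` and `merge_bases` are hypothetical names, `virtual` is added to the keyword list on my own initiative, and the normalization is deliberately lossy (bases `a::X` and `b::X` would collapse into one entry).

```python
import re

# Whole-word match, so a name such as "Subclass" is left intact.
_TAGS = re.compile(r"\b(?:class|struct|public|private|protected|virtual)\b")

def normalize_base(name):
    """Reduce a base-class spelling to a bare identifier for comparison."""
    name = _TAGS.sub("", name)           # drop type tags and access keywords
    name = re.sub(r"<.*>", "", name)     # drop template arguments and brackets
    name = name.rpartition("::")[2]      # drop namespace qualifiers
    return "".join(name.split())         # drop any remaining whitespace

def merge_bases(clang_bases, manual_bases):
    """Merge two base lists in order, skipping names already seen."""
    merged, seen = [], set()
    # Both finder functions in the script return None when nothing is found.
    for name in list(clang_bases or []) + list(manual_bases or []):
        key = normalize_base(name)
        if key and key not in seen:
            seen.add(key)
            merged.append(key)
    return merged

merged = merge_bases(["class NonCopyable"],
                     ["NonCopyable", "IHandlerPrivateGetter<>"])
```

Using a list plus a `seen` set, rather than a plain set, keeps the declaration order of the bases, which is usually what a documentation page wants.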

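Either way, googlers, this is a decent template for a clang-based documentation generator: it does not need fully correct build options, and it is very fast because it ignores include statements entirely.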
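The include-skipping step can also be written with a regular expression anchored at the start of the line, which avoids matching a directive name that merely appears in the middle of a line. This is a minimal Python 3 sketch with a hypothetical `neutralize_directives` helper. Unlike the script above, it comments out every `#if`, not only `#if defined` and `#if !defined`, and it does not handle directives continued with a trailing backslash; both are assumptions of this sketch.

```python
import re

# '#', optional whitespace, then one of the directives to neutralize.
_DIRECTIVE = re.compile(
    r"^\s*#\s*(?:include|ifdef|ifndef|if|elif|else|endif)\b")

def neutralize_directives(source):
    """Comment out include and conditional lines, keeping line numbers stable."""
    out = []
    for line in source.splitlines(keepends=True):
        out.append("//" + line if _DIRECTIVE.match(line) else line)
    return "".join(out)

sample = ('#include "anim.h"\n'
          '#ifndef NO_ANIM\n'
          'struct ComplexBuffer : IAnimatable {};\n'
          '#endif\n')
cleaned = neutralize_directives(sample)
```

Because lines are commented out rather than deleted, the line numbers reported by the cursors still match the original file. The cleaned text can then be handed to the parser through `unsaved_files`, as the script above does.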

Have a nice day, everyone.

8 votes

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/20485530
