How to access topic names in PDFs using poppler?

Stack Overflow user
Asked 2012-12-05 13:51:25
1 answer, 271 views, 0 followers, 1 vote

I am using poppler and want to access the topic or heading for a specific page number. Please tell me how to do this with poppler.


1 Answer

Stack Overflow user

Accepted answer

Posted 2012-12-05 22:19:42

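This uses the glib API; I don't know which API you want.

I'm pretty sure no topic/heading is stored on a particular page. If the PDF has an index, you have to walk the index.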

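Walk the index with backtracking. If you're lucky, each index node contains a PopplerActionGotoDest (check the type!). You can get the title from the PopplerAction object (gchar *title) and the page number from the contained PopplerDest (int page_num). page_num should be the first page of the section.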

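Assume your PDF has an index containing PopplerActionGotoDest objects. Then you simply walk it, checking page_num. If page_num > searched_num, go back one step. Once you are at the correct parent, walk its children. This gives you the best match. I just wrote some code for it: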

#include <stdio.h>
#include <poppler.h>

gchar* getTitle(PopplerIndexIter *iter, int num, PopplerIndexIter *last,PopplerDocument *doc)
{
    int cur_num = 0;
    int next;
    PopplerAction * action;
    PopplerDest * dest;
    gchar * title = NULL;
    PopplerIndexIter  * last_tmp;

    do
    {
            action = poppler_index_iter_get_action(iter);
            if (action->type != POPPLER_ACTION_GOTO_DEST) {
                printf("No GOTO_DEST!\n");
                //free the action before bailing out, otherwise it leaks
                poppler_action_free(action);
                return NULL;
            }

            //get page number of current node
            if (action->goto_dest.dest->type == POPPLER_DEST_NAMED) {
                dest = poppler_document_find_dest (doc, action->goto_dest.dest->named_dest);
                //find_dest returns NULL for an unknown name; keep the previous page number then
                if (dest) {
                    cur_num = dest->page_num;
                    poppler_dest_free(dest);
                }
            } else {
                cur_num = action->goto_dest.dest->page_num;
            }
            //printf("cur_num: %d, %d\n",cur_num,num);

            //free action, as we don't need it anymore
            poppler_action_free(action);

            //are there nodes following this one?
            last_tmp = poppler_index_iter_copy(iter);
            next = poppler_index_iter_next (iter);

            //descend
            if (!next || cur_num > num) {
                if ((!next && cur_num < num) || cur_num == num) {
                    //descend current node
                    if (last) {
                        poppler_index_iter_free(last);
                    }
                    last = last_tmp;
                }
                //descend last node (backtracking)
                if (last) {
                    /* Get the action and do something with it */
                    PopplerIndexIter *child = poppler_index_iter_get_child (last);
                    gchar * tmp = NULL;
                    if (child) {
                        tmp = getTitle(child,num,last,doc);
                        poppler_index_iter_free (child);
                    } else {
                        action = poppler_index_iter_get_action(last);
                        if (action->type != POPPLER_ACTION_GOTO_DEST) {
                            tmp = NULL;
                        } else {
                            tmp = g_strdup (action->any.title);
                        }
                        poppler_action_free(action);
                        poppler_index_iter_free (last);
                    }

                    return tmp;
                } else {
                    return NULL;
                }
            }

            if (cur_num > num || (next && cur_num != 0)) {
                // free last index_iter
                if (last) {
                    poppler_index_iter_free(last);
                }
                last = last_tmp;
            }
    }
  while (next);

    return NULL;
}

Call getTitle like this:

    for (i = 0; i < num_pages; i++) {
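            iter = poppler_index_iter_new (document);
            if (!iter) {
                //poppler_index_iter_new returns NULL when the document has no index
                printf("document has no index\n");
                break;
            }
            title = getTitle(iter,i,NULL,document);
            poppler_index_iter_free (iter);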

            if (title) {
                printf("title of %d: %s\n",i, title);
                g_free(title);
            } else {
                printf("%d: no title\n",i);
            }
    }
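
The backtracking idea can also be looked at in isolation on a simplified model that does not depend on poppler. The following is only a sketch, not the code above: OutlineNode and best_title are hypothetical names, and it assumes siblings are ordered by page number. It picks the deepest entry whose first page is at or before the requested page.

```c
#include <stddef.h>

/* Hypothetical stand-in for one index node: a title, the first page of the
 * section, and links to its first child and next sibling. */
typedef struct OutlineNode {
    const char *title;
    int page_num;
    const struct OutlineNode *first_child;
    const struct OutlineNode *next_sibling;
} OutlineNode;

/* Among the siblings starting at `node`, find the last one whose page_num
 * is <= page, then descend into its children for a more specific match.
 * Returns NULL when no entry starts at or before `page`. */
static const char *
best_title (const OutlineNode *node, int page)
{
    const OutlineNode *candidate = NULL;
    const char *deeper;

    for (; node != NULL; node = node->next_sibling) {
        if (node->page_num > page)
            break;              /* went too far: step back to the candidate */
        candidate = node;
    }

    if (candidate == NULL)
        return NULL;

    /* Prefer a matching child; otherwise the candidate itself is the best match. */
    deeper = best_title (candidate->first_child, page);
    return deeper != NULL ? deeper : candidate->title;
}
```

With a real document, the same walk would use poppler_index_iter_next for siblings and poppler_index_iter_get_child for children.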
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud XiaoWei IT-domain engine.
Original link:

https://stackoverflow.com/questions/13717131
