Q: How can I extract the table of contents from an fb2 book?

Unix & Linux user

Asked 2023-05-04 23:43:05

3 answers, 52 views, 0 followers, 0 votes

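I have a book in fb2 format. I want to print its table of contents, including the names and numbers of the "parts", "chapters", "episodes", and so on.

Is there a way to do this in the terminal? There is a similar question, but for the epub format.

I know that fb2 is an XML format. But is there a tool that extracts only the TOC? The entries are in the <section>, <title>, and <subtitle> tags.

If not, I suppose an XSL file could be written based on the official FB2_2_txt.xsl file. Maybe ebook-convert can do this too?

The book I am working on has the following structure: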

<?xml version="1.0" encoding="utf8"?>
<FictionBook xmlns:l="http://www.w3.org/1999/xlink" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <description>
    <title-info>
      <genre>fiction</genre>
      <author>
        <first-name>John</first-name>
        <last-name>Doe</last-name>
      </author>
      <book-title>Fiction Book</book-title>
      <annotation>
        <p>Hello</p>
      </annotation>
      <keywords>john, doe, fiction</keywords>
      <date value="2011-07-18">18.07.2011</date>
      <coverpage></coverpage>
      <lang>en</lang>
    </title-info>
    <document-info>
      <author>
        <first-name></first-name>
        <last-name></last-name>
        <nickname></nickname>
      </author>
      <program-used>Fb2 Gem</program-used>
      <date value="2011-07-18">18.07.2011</date>
      <src-url></src-url>
      <src-ocr></src-ocr>
      <id></id>
      <version>1.0</version>
    </document-info>
    <publish-info>
    </publish-info>
  </description>
  <body>
    <title>
      <p>John Doe</p>
      <empty-line/>
      <p>Fiction Book</p>
    </title>
    <section>
      <title>
        <p>Part 1</p>
        <p>Some name of Part 1</p>
      </title>
      <section>
        <title>
          <p>Chapter 1</p>
          <p>Some name of Chapter 1</p>
        </title>
        <subtitle>Episode 1</subtitle>
        <p>Line one of the first episode</p>
        <p>Line two of the first episode</p>
        <p>Line three of the first episode</p>
        <subtitle>Episode 2</subtitle>
        <p>Line one of the second episode</p>
        <p>Line two of the second episode</p>
        <p>Line three of the second episode</p>
      </section>
    </section>
    <section>
      <title>
        <p>Part 2</p>
        <p>Some name of Part 2</p>
      </title>
      <section>
        <title>
          <p>Chapter 3</p>
          <p>Some name of Chapter 3</p>
        </title>
        <subtitle>Episode 3</subtitle>
        <p>Line one of the third episode</p>
        <p>Line two of the third episode</p>
        <p>Line three of the third episode</p>
        <subtitle>Episode 4</subtitle>
        <p>Line one of the fourth episode</p>
        <p>Line two of the fourth episode</p>
        <p>Line three of the fourth episode</p>
      </section>
    </section>
  </body>
</FictionBook>

I would like to get the following output:

Part 1
Some name of Part 1
Chapter 1
Some name of Chapter 1
Episode 1
Episode 2
Part 2
Some name of Part 2
Chapter 3
Some name of Chapter 3
Episode 3
Episode 4

Tags: xml, books, ebooks

3 Answers

Unix & Linux user

Accepted answer

Posted 2023-05-05 07:57:28

Using xmlstarlet:

xmlstarlet select --template \
    --value-of '//_:section/_:title/_:p | //_:subtitle' \
    -nl file.xml

Or, using short options,

xmlstarlet sel -t \
    -v '//_:section/_:title/_:p | //_:subtitle' \
    -n file.xml

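The XPath query used here extracts the value of every p node under the title node of each section, together with the values of all subtitle nodes.

The _: prefix before each node name in the expression is an anonymous placeholder for the namespace identifier that the document uses.

The output of both commands above (given the example document) would be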

Part 1
Some name of Part 1
Chapter 1
Some name of Chapter 1
Episode 1
Episode 2
Part 2
Some name of Part 2
Chapter 3
Some name of Chapter 3
Episode 3
Episode 4

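Should you want the title of the book as well, remove the _:section restriction from the expression (this makes the p nodes of the book's title match too).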
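For readers without xmlstarlet, the same selection can be sketched with Python's standard library. This is only an illustration and not part of the original answer: the `toc_lines` function, its `include_book_title` flag, the `qname` helper, and the shortened `SAMPLE` document are all assumptions made for the example. ElementTree has no `|` union operator, so the sketch walks the tree in document order and checks each node's parent chain, spelling the namespace out in full where xmlstarlet uses `_:`.

```python
import xml.etree.ElementTree as ET

# Default namespace declared on <FictionBook> in the question's sample.
FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"

# Shortened stand-in for the book in the question (hypothetical test data).
SAMPLE = """\
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <body>
    <title><p>John Doe</p><p>Fiction Book</p></title>
    <section>
      <title><p>Part 1</p><p>Some name of Part 1</p></title>
      <section>
        <title><p>Chapter 1</p></title>
        <subtitle>Episode 1</subtitle>
        <p>Line one of the first episode</p>
      </section>
    </section>
  </body>
</FictionBook>
"""


def qname(local):
    """Return the ElementTree spelling of a tag in the FB2 namespace."""
    return "{%s}%s" % (FB2_NS, local)


def toc_lines(root, include_book_title=False):
    """Collect title paragraphs and subtitles in document order.

    By default only <p> nodes inside section/title are kept, mirroring
    //_:section/_:title/_:p | //_:subtitle. With include_book_title=True
    the section restriction is dropped, so any title/p matches.
    """
    # ElementTree nodes do not know their parent, so build a lookup table.
    parent = {child: node for node in root.iter() for child in node}
    lines = []
    for el in root.iter():  # root.iter() visits nodes in document order
        if el.tag == qname("subtitle"):
            lines.append("".join(el.itertext()).strip())
        elif el.tag == qname("p"):
            title = parent.get(el)
            if title is None or title.tag != qname("title"):
                continue  # ordinary body paragraph, not a heading
            owner = parent.get(title)
            in_section = owner is not None and owner.tag == qname("section")
            if in_section or include_book_title:
                lines.append("".join(el.itertext()).strip())
    return lines


print("\n".join(toc_lines(ET.fromstring(SAMPLE))))
```

To try it on a real file, `ET.parse("file.xml").getroot()` can replace `ET.fromstring(SAMPLE)`, assuming the parser accepts the encoding declared in the file.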

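Another way to get each section's titles and subtitles (avoiding the book title), which arguably looks a bit cleaner because it makes clear that the subtitles are taken from sections rather than from just anywhere, is to first restrict the match to sections and then pull the data from those sections: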

xmlstarlet select --template \
    --match '//_:section' \
    --value-of '_:title/_:p | _:subtitle' \
    -nl file.xml
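The match-then-extract idea can also be sketched in Python, here with ElementTree's prefix-to-URI mapping instead of the `_:` placeholder. The `fb` prefix, the `section_toc` function, its `indent` parameter, and the sample data are assumptions introduced for illustration. The sketch also assumes, as in the question's sample, that a section's title comes before its subtitles. Indenting by nesting depth is an optional extra that the xmlstarlet command does not do.

```python
import xml.etree.ElementTree as ET

# "fb" is an arbitrary prefix chosen for this sketch; only the URI matters.
NS = {"fb": "http://www.gribuser.ru/xml/fictionbook/2.0"}

# Hypothetical, shortened test document.
SAMPLE = """\
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <body>
    <title><p>Fiction Book</p></title>
    <section>
      <title><p>Part 1</p><p>Some name of Part 1</p></title>
      <section>
        <title><p>Chapter 1</p></title>
        <subtitle>Episode 1</subtitle>
        <subtitle>Episode 2</subtitle>
      </section>
    </section>
    <section>
      <title><p>Part 2</p></title>
    </section>
  </body>
</FictionBook>
"""


def text_of(el):
    """Return the full text of an element, including nested inline markup."""
    return "".join(el.itertext()).strip()


def section_toc(parent, depth=0, indent="  "):
    """Recursively list titles and subtitles of the sections under parent."""
    lines = []
    for section in parent.findall("fb:section", NS):
        pad = indent * depth
        # Title paragraphs of this section only (relative path, like --match).
        lines += [pad + text_of(p) for p in section.findall("fb:title/fb:p", NS)]
        # Subtitles sit one level below the section heading.
        lines += [pad + indent + text_of(s) for s in section.findall("fb:subtitle", NS)]
        # Nested sections (chapters inside parts) are handled the same way.
        lines += section_toc(section, depth + 1, indent)
    return lines


body = ET.fromstring(SAMPLE).find("fb:body", NS)
print("\n".join(section_toc(body)))
```

Passing `indent=""` gives a flat list like the one xmlstarlet prints. Because the walk starts at the sections under `<body>`, the book title is never visited.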

Votes: 2

Unix & Linux user

Posted 2023-05-05 14:10:58

Using xidel, an XPath 3 FOSS (GPLv3) command-line tool:

XPath 2 sequence construction:

xidel -e '(//section/title/p, //subtitle)'  file.xml

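XPath 1 union, which returns the nodes in document order (the comma operator in the sequence form concatenates its operands instead, so it lists all title paragraphs before all subtitles):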

xidel -e '//section/title/p | //subtitle'  file.xml

Part 1
Some name of Part 1
Chapter 1
Some name of Chapter 1
Episode 1
Episode 2
Part 2
Some name of Part 2
Chapter 3
Some name of Chapter 3
Episode 3
Episode 4

xidel is a Swiss Army knife for querying XML/HTML/JSON. It is smart enough to handle the default namespace by itself.
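Two points from this answer can be illustrated with a small Python sketch: matching tags regardless of namespace, and the difference in result order between a sequence and a union. It is not how xidel works internally. The function names and sample data are assumptions, and the `{*}` wildcard requires Python 3.8 or later. In XPath 2 and later the comma operator concatenates its operands, whereas `|` returns nodes in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened test document with two chapters.
SAMPLE = """\
<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <body>
    <section>
      <title><p>Chapter 1</p></title>
      <subtitle>Episode 1</subtitle>
    </section>
    <section>
      <title><p>Chapter 2</p></title>
      <subtitle>Episode 2</subtitle>
    </section>
  </body>
</FictionBook>
"""

# "{*}name" matches "name" in any namespace (Python 3.8+), so the FB2
# namespace URI never has to be written out.
TITLE_P = ".//{*}section/{*}title/{*}p"
SUBTITLE = ".//{*}subtitle"


def text_of(el):
    """Return the full text of an element."""
    return "".join(el.itertext()).strip()


def sequence_order(root):
    """Like (//section/title/p, //subtitle): first list, then second list."""
    return [text_of(el) for el in root.findall(TITLE_P) + root.findall(SUBTITLE)]


def union_order(root):
    """Like //section/title/p | //subtitle: merged, in document order."""
    wanted = set(root.findall(TITLE_P))
    wanted.update(root.findall(SUBTITLE))
    # root.iter() walks the tree in document order; keep only selected nodes.
    return [text_of(el) for el in root.iter() if el in wanted]


root = ET.fromstring(SAMPLE)
print(sequence_order(root))
print(union_order(root))
```

For a table of contents the union order is usually the one wanted, since it keeps each episode next to its chapter.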

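Votes: 2

Unix & Linux user

Posted 2023-05-05 07:37:17

It looks to me as though the output consists of the result of the XPath expression (//title/p | //subtitle). So you just need to find a tool suitable for your environment that can evaluate an XPath expression and display the result.

For some suggested command-line tools, see https://www.baeldung.com/linux/evaluate-xpath. There is also Saxon's Gizmo tool (my company's product).

Votes: -1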

Original page content provided by Unix & Linux.

Original link:

https://unix.stackexchange.com/questions/744981
