Preserving entity references when parsing XML

Stack Overflow user
Asked 2013-07-26 21:00:08

Running the following simple script:

from lxml import etree

tree = etree.parse('VimKeys.xml')
root = tree.getroot()

for child in root:
  print ("<table>")
  print ("<caption>" + child.attrib['title'] + "</caption>")
  for child in child:
    print ("<tr>")
    print ("<th>" + child.text + "</th>")
    print ("<td>" + child.attrib['description'] + "</td>")
    print ("</tr>")
  print ("</table>")

against the following XML:

<keycommands>
  <category title="Editing">
    <key description="replace">r</key>
    <key description="change, line">c,cc</key>
    <key description="join line with the following">J</key>
    <key description="delete &amp; insert">s</key>
    <key description="change case">~</key>
    <key description="apply last edit">.</key>
    <key description="undo, redo">u,⌃+r</key>
    <key description="indent line right, left">&gt;&gt;,&lt;&lt;</key>
    <key description="auto-indent line">==</key>
  </category>
</keycommands>

produces the following output:

<table>
<caption>Editing</caption>
  <tr>
    <th>r</th>
    <td>replace</td>
  </tr>
  <tr>
    <th>c,cc</th>
    <td>change, line</td>
  </tr>
  <tr>
    <th>J</th>
    <td>join line with the following</td>
  </tr>
  <tr>
    <th>s</th>
    <td>delete & insert</td>
  </tr>
  <tr>
    <th>~</th>
    <td>change case</td>
  </tr>
  <tr>
    <th>.</th>
    <td>apply last edit</td>
  </tr>
  <tr>
    <th>u,⌃+r</th>
    <td>undo, redo</td>
  </tr>
  <tr>
    <th>>>,<<</th>
    <td>indent line right, left</td>
  </tr>
  <tr>
    <th>==</th>
    <td>auto-indent line</td>
  </tr>
</table>

which is invalid HTML, because the less-than and greater-than signs are written as

&lt; and &gt;

in the source document.

How can I preserve these in the final output?
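
To make the cause concrete, here is a minimal sketch of what the parser does with entity references. It assumes the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra installs (`fromstring`, `.text`, `.attrib` and `tostring` exist in both), and the single `<key>` element is a stand-in built from two lines of the sample file.

```python
import xml.etree.ElementTree as ET

# Stand-in element combining an attribute and a text value from the sample XML.
key = ET.fromstring(
    '<key description="delete &amp; insert">&gt;&gt;,&lt;&lt;</key>'
)

# The parser has already resolved the entity references, so both values
# hold the literal characters. Concatenating them into markup is what
# produces the invalid HTML.
print(key.text)
print(key.attrib['description'])

# Serializing the element escapes the special characters again.
serialized = ET.tostring(key, encoding='unicode')
print(serialized)
```

The entity references are not kept in the parsed tree. They are put back only when a serializer writes the tree out.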


1 Answer

Stack Overflow user

Accepted answer

Answered 2013-07-26 21:17:30

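Build a new XML tree with the Element class instead of formatting the output "by hand" with print. The serializer then escapes special characters for you:
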
import lxml.etree as ET

tree = ET.parse('VimKeys.xml')
root = tree.getroot()

newroot = ET.Element('root')
for i, child in enumerate(root):
    table = ET.Element('table')
    newroot.insert(i, table)
    caption = ET.Element('caption')
    caption.text = child.attrib['title']
    table.insert(0, caption)
    for j, c in enumerate(child, 1):
        tr = ET.Element('tr')
        table.insert(j, tr)
        th = ET.Element('th')
        th.text = c.text
        tr.insert(0, th)

        td = ET.Element('td')
        td.text = c.attrib['description']
        tr.insert(1, td)

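# encoding='unicode' makes tostring() return str instead of bytes,
# so print() shows clean markup on Python 3.
print(ET.tostring(newroot, pretty_print=True, encoding='unicode'))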
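
The same tree-building idea can be sketched more compactly with `SubElement`, which creates a child and appends it in one call. This is a variation rather than the answer's own code. It assumes the standard library's `xml.etree.ElementTree` (lxml offers `SubElement` as well) and uses a trimmed inline copy of the sample XML, named `SOURCE` here, in place of `VimKeys.xml`.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for VimKeys.xml, trimmed to two keys for brevity.
SOURCE = """
<keycommands>
  <category title="Editing">
    <key description="delete &amp; insert">s</key>
    <key description="indent line right, left">&gt;&gt;,&lt;&lt;</key>
  </category>
</keycommands>
"""

root = ET.fromstring(SOURCE)
newroot = ET.Element('root')
for category in root:
    # SubElement creates the child and attaches it to its parent.
    table = ET.SubElement(newroot, 'table')
    ET.SubElement(table, 'caption').text = category.attrib['title']
    for key in category:
        tr = ET.SubElement(table, 'tr')
        ET.SubElement(tr, 'th').text = key.text
        ET.SubElement(tr, 'td').text = key.attrib['description']

# The serializer escapes <, > and & in text nodes on output.
markup = ET.tostring(newroot, encoding='unicode')
print(markup)
```

Because children are appended in document order, no explicit insert positions are needed.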

Alternatively, use the E-factory. It makes the intended structure easier to read (and to modify):

import lxml.etree as ET
import lxml.builder as builder

tree = ET.parse('VimKeys.xml')
root = tree.getroot()

E = builder.E
tables = []
for child in root:
    trs = []
    for c in child:
        trs.append(E('tr',
                     E('th', c.text),
                     E('td', c.attrib['description'])))
    tables.append(E('table',
                    E('caption', child.attrib['title']),
                    *trs))

newroot = E('root', *tables)
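# encoding='unicode' makes tostring() return str instead of bytes,
# so print() shows clean markup on Python 3.
print(ET.tostring(newroot, pretty_print=True, encoding='unicode'))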
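
If building a tree is more than the task needs, a different option (not part of the answer above) is to keep the string concatenation and escape each value first. The sketch below assumes the standard library's `xml.sax.saxutils.escape`, which replaces `&`, `<` and `>`. The helper name `row` is hypothetical and exists only for illustration.

```python
from xml.sax.saxutils import escape


def row(key_text, description):
    """Build one table row, escaping both values before concatenation."""
    return (
        "<tr><th>" + escape(key_text) + "</th>"
        "<td>" + escape(description) + "</td></tr>"
    )


# Values as the parser hands them back, with entities already resolved.
line = row(">>,<<", "indent line right, left")
print(line)
```

This keeps the original script's shape, but every value has to be escaped by hand, which is easy to forget. Letting a serializer do it, as in the answer, avoids that risk.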
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/17890957
