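I want to use xml.etree.ElementTree.iterparse() to grab certain parts of an XML file. The file is huge (around 1B lines), so I don't want to load it all into memory. I don't see a way in the xml library to output an entire XML subsection. I realize iterparse is iterative and probably only looks forward from the current position. How can I do this?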
from xml.etree.ElementTree import iterparse

context = iterparse("file.xml", events=("start", "end"))
for event, elem in context:
    if event == 'start':
        if elem.tag == 'page':
            # Splice out this subset of the XML, including tags
            # Or, better, splice it if `<title>` includes "Foo".
            pass
    else:
        elem.clear()
The XML looks roughly like this:
<siteinfo>
  <page>
    <title>Foo</title>
    <text>Bar</text>
  </page>
  <page>
    <title>NotFoo</title>
    <text>NotBar</text>
  </page>
</siteinfo>
Posted on 2017-11-16 17:42:15
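For reference, here is a minimal sketch of what the question asks for, assuming the layout shown above (each <page> is a direct child of the root <siteinfo>). It only acts on "end" events, when a <page> and its <title> are fully parsed, serializes matching pages with ET.tostring, and calls root.clear() so finished pages do not pile up in memory. The helper name extract_pages and its wanted_title parameter are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def extract_pages(source, wanted_title="Foo"):
    """Yield each <page> whose <title> text equals wanted_title, as an XML string.

    Hypothetical helper; `source` is a filename or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the "start" of the document root (<siteinfo> here).
    _, root = next(context)
    for event, elem in context:
        # On "end" the <page> subtree, including its <title>, is fully parsed.
        if event == "end" and elem.tag == "page":
            if elem.findtext("title") == wanted_title:
                elem.tail = None  # don't emit the whitespace that follows </page>
                yield ET.tostring(elem, encoding="unicode")
            # Detach all finished children of the root so memory stays bounded.
            root.clear()


sample = (b"<siteinfo><page><title>Foo</title><text>Bar</text></page>"
          b"<page><title>NotFoo</title><text>NotBar</text></page></siteinfo>")
for page_xml in extract_pages(io.BytesIO(sample)):
    print(page_xml)  # <page><title>Foo</title><text>Bar</text></page>
```

The title test is an exact match. Replacing it with a substring test (`wanted_title in (elem.findtext("title") or "")`) would also match "NotFoo", so choose whichever reading of "includes" you actually need.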
I tried something. It is not exactly the output you expected, but I'm sharing it in case it is useful to you.
from xml.etree import ElementTree as Et

path = r'D:\data.xml'
context = Et.iterparse(path, events=("start", "end"))
root = None
for event, elem in context:
    if event == 'end' or event == 'start':
        if elem.text == 'Foo':
            # clear() empties the element but leaves it attached to its parent
            elem.clear()
        # The last event is the root's "end", so root ends up as <siteinfo>
        root = elem
with open(r'd:\output.xml', 'wb') as file:
    Et.ElementTree(root).write(file, encoding='utf-8', xml_declaration=True)
Output file:
<?xml version='1.0' encoding='utf-8'?>
<siteinfo>
  <page>
    <!-- somehow this <title /> remains -->
    <title />
    <text>Bar</text>
  </page>
  <page>
    <title>NotFoo</title>
    <text>NotBar</text>
  </page>
</siteinfo>
https://stackoverflow.com/questions/47305272
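The leftover <title /> comes from Element.clear(): it removes the element's children and attributes and sets its text and tail to None, but the element itself stays attached to its parent. A minimal sketch that detaches matching elements instead is below. Because iterparse does not report parents, it keeps a stack of open elements from the "start"/"end" events. The helper name parse_without_text is hypothetical, and like the answer's code it still builds the whole tree in memory.

```python
import io
import xml.etree.ElementTree as ET


def parse_without_text(source, unwanted="Foo"):
    """Parse `source` and detach every element whose text is exactly `unwanted`.

    Hypothetical helper; like the answer's code, it keeps the whole tree in memory.
    """
    parents = []  # stack of elements that are currently open
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first "start" event is the document root
            parents.append(elem)
        else:
            parents.pop()  # elem is now closed; parents[-1] is its parent
            if elem.text == unwanted and parents:
                # elem.clear() would leave an empty <title /> behind;
                # remove() takes the element out of its parent entirely.
                parents[-1].remove(elem)
    return root


sample = (b"<siteinfo><page><title>Foo</title><text>Bar</text></page>"
          b"<page><title>NotFoo</title><text>NotBar</text></page></siteinfo>")
root = parse_without_text(io.BytesIO(sample))
print(ET.tostring(root, encoding="unicode"))
# <siteinfo><page><text>Bar</text></page><page><title>NotFoo</title><text>NotBar</text></page></siteinfo>
```

The returned root can be written out with the same Et.ElementTree(root).write(...) call used in the answer.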