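I have a very large XML file, "abcd.xml", about 800 MB. When the user's input matches an author or a title, I want to retrieve the information for the matching books.
I have already done this with a small file. How can I use iterparse() to handle a large file?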
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
<article mdate="2011-01-11" key="journals/acta/Saxena96">
<author>Sanjeev Saxena</author>
<title>Parallel Integer Sorting and Simulation Amongst CRCW Models.</title>
<pages>607-619</pages>
<year>1996</year>
<volume>33</volume>
<journal>Acta Inf.</journal>
<number>7</number>
<url>db/journals/acta/acta33.html#Saxena96</url>
<ee>http://dx.doi.org/10.1007/BF03036466</ee>
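</article>
Code:
import lxml.etree as ET

# Parse the whole file into memory (fine for a small file, not for 800 MB).
tree = ET.parse('abcd.xml')
root = tree.getroot()
title = raw_input('enter the name: ')  # Python 2; use input() on Python 3
# Take the first article whose title starts with the user's input.
article = root.xpath('.//article[starts-with(title, "%s")]' % title)[0]
for prop in ['author', 'pages', 'year', 'volume', 'journal']:
    print(article.findtext(prop))
Output structure: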
Sanjeev Saxena
Parallel Integer Sorting and Simulation Amongst CRCW Models.
607-619
1996
33
Acta Inf.
........
........
........
Posted on 2015-03-17 02:50:42
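1. Get the title name from the user with raw_input().
2. Parse the input file with the lxml module.
3. Use XPath to select the article tags whose title starts with the user input.
4. Create a list of lists of tuples, saving the tag and its text for each item in every article tag from step 3.
5. Print result.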
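As a side note on the XPath selection step: formatting the user's text straight into the expression breaks when the input contains a double quote. Below is a minimal sketch, assuming Python 3 and lxml, that passes the text as an XPath variable instead and also checks the author, as the question asks. The helper name `find_articles`, the variable name `$prefix`, and the second sample record are made up for illustration.

```python
import lxml.etree as ET

# Tiny inline document standing in for the real file (illustrative data only).
SAMPLE = b"""<dblp>
<article key="a1"><author>Sanjeev Saxena</author><title>Parallel Integer Sorting and Simulation Amongst CRCW Models.</title><year>1996</year></article>
<article key="a2"><author>Jane Doe</author><title>A "Quoted" Title.</title><year>2001</year></article>
</dblp>"""


def find_articles(root, prefix):
    """Return <article> elements with a title or an author starting with prefix."""
    # $prefix is an XPath variable; lxml binds it from the keyword argument,
    # so quotes in the user's text never become part of the expression.
    return root.xpath(
        './/article[title[starts-with(., $prefix)]'
        ' or author[starts-with(., $prefix)]]',
        prefix=prefix,
    )


root = ET.fromstring(SAMPLE)
by_title = find_articles(root, 'A "Quoted"')
by_author = find_articles(root, 'Sanjeev')
print([a.get('key') for a in by_title])
print([a.get('key') for a in by_author])
```

Testing `author[starts-with(., $prefix)]` rather than `starts-with(author, $prefix)` matters when a record has several authors, because the latter only looks at the first one.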
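Code:
import lxml.etree as ET

root = ET.parse('input.xml')
title = raw_input('enter the name: ')  # Python 2; use input() on Python 3
# Select every article whose title starts with the user's input.
articles = root.xpath('.//article[starts-with(title, "%s")]' % title)

# Build one list of (tag, text) tuples per matching article.
result = []
for article in articles:
    tmp = []
    for i in article:  # iterate over the child elements
        tmp.append((i.tag, i.text))
    result.append(tmp)

#- Print result:
for i in result:
    print("\n")
    for j in i:
        print("%s:%s" % (j[0], j[1]))
Output: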
vivek@vivek:~/Desktop/stackoverflow/anna$ python 3.py
enter the name: Parallel Integer Sorting and Simulation
author:Sanjeev Saxena
title:Parallel Integer Sorting and Simulation Amongst CRCW Models.
pages:607-619
year:1996
volume:33
journal:Acta Inf.
number:7
url:db/journals/acta/acta33.html#Saxena96
ee:http://dx.doi.org/10.1007/BF03036466
author:Sanjeev Saxena
title:Parallel Integer Sorting and Simulation Amongst CRCW Models.11
pages:607-619
year:1996
volume:33
journal:Acta Inf.
number:7
url:db/journals/acta/acta33.html#Saxena96
ee:http://dx.doi.org/10.1007/BF03036466
vivek@vivek:~/Desktop/stackoverflow/anna$
https://stackoverflow.com/questions/29079939
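The code above still loads the whole document with ET.parse(), which is what the question wants to avoid for an 800 MB file. The following is a rough sketch of a streaming variant using iterparse(), written for Python 3. It is an assumption-laden illustration rather than the answer's method: it uses the standard library's xml.etree.ElementTree so that it runs without extra packages (lxml.etree.iterparse accepts the same source and events arguments), the names `iter_matching_articles` and `SAMPLE` are hypothetical, and the second sample record is invented. The real file declares a DTD, so entity references defined there may need DTD loading (lxml's iterparse has a `load_dtd` option), which this sketch does not handle.

```python
import io
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree (assumption)

# Tiny in-memory stand-in for the real file (illustrative data only).
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<dblp>
<article mdate="2011-01-11" key="journals/acta/Saxena96">
<author>Sanjeev Saxena</author>
<title>Parallel Integer Sorting and Simulation Amongst CRCW Models.</title>
<pages>607-619</pages>
<year>1996</year>
</article>
<article key="hypothetical/other">
<author>Jane Doe</author>
<title>An Unrelated Title.</title>
<year>2001</year>
</article>
</dblp>"""


def iter_matching_articles(source, query):
    """Yield a list of (tag, text) tuples for each matching <article>.

    An article matches when its title or any of its authors starts with query.
    source may be a file name or a binary file object.
    """
    context = ET.iterparse(source, events=('start', 'end'))
    _, root = next(context)  # the first event is the start of the root element
    depth = 0  # nesting depth below the root
    for event, elem in context:
        if event == 'start':
            depth += 1
            continue
        depth -= 1
        if depth != 0:
            continue  # a nested element ended, or the root itself closed
        if elem.tag == 'article':
            title = elem.findtext('title') or ''
            authors = [a.text or '' for a in elem.findall('author')]
            if title.startswith(query) or any(a.startswith(query) for a in authors):
                yield [(child.tag, child.text) for child in elem]
        # Drop the finished record so memory use stays roughly constant.
        root.clear()


for record in iter_matching_articles(io.BytesIO(SAMPLE), 'Sanjeev'):
    for tag, text in record:
        print('%s:%s' % (tag, text))
```

For the real file, the same generator could be called with the path, for example `iter_matching_articles('abcd.xml', title)`. Clearing the root after every top-level record is the part that keeps memory bounded; without it the tree keeps growing just as it does with ET.parse().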