I'm using lxml.objectify in Python 3 to parse an XML file:
<root>
<object_header></object_header>
<object_details></object_details>
<object_details></object_details>
<object_header></object_header>
<object_details></object_details>
<object_header></object_header>
</root>
Note that an object sometimes has no details.
The way I currently parse this (it works, but it isn't elegant) is the following:
from lxml import objectify, etree
root = objectify.parse(xmlFile).getroot()
elems = [el for el in root.iterchildren()]
# data is list of objects
data = []
# Have to instantiate outside of for loop in case last object has not details.
objectDetails = ''
# Don't store first object right away.
firstObject = True
# Iterate through each XML element.
for elem in elems:
    if elem.tag == 'object_header':
        # Remember object header info.
        object = storeHeaderInfo(objectDetails)
        # Skip saving if first object, need to grab object details.
        if firstObject == True:
            # Don't skip again, in case object has no details.
            firstObject = False
            continue
        # Save object, already grabbed object details.
        data.append(object)
    else:
        # Process object details in <object_details> tag.
        objectDetails += etree.tostring(elem)
# Save last object.
object = storeHeaderInfo(objectDetails)
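data.append(object)
What I don't like is that I have to write the code that stores an object twice: once for each object inside the for loop, and once more for the last object.
Is there a more Pythonic or elegant way to do this?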
Posted on 2016-05-23 15:03:15
Things get simpler if you use the following-sibling::* XPath expression:
from lxml import objectify, etree
root = objectify.parse("input.xml").getroot()
elems = root.xpath("//object_header")
for elem in elems:
    header = elem.text
    objectDetails = ''
    # Walk the siblings after this header and stop at the next header.
    for sibling in elem.xpath("following-sibling::*"):
        if sibling.tag == 'object_header':
            break
        objectDetails += str(etree.tostring(sibling))
    print(header, objectDetails)
Given the following input:
<root>
<object_header>object1</object_header>
<object_details>detail1</object_details>
<object_details>detail2</object_details>
<object_header>object2</object_header>
<object_details>detail1</object_details>
<object_header>object3</object_header>
</root>
the code prints:
object1 b'<object_details>detail1</object_details>'b'<object_details>detail2</object_details>'
object2 b'<object_details>detail1</object_details>'
object3
https://stackoverflow.com/questions/37394235
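For comparison, here is a minimal single-pass sketch that avoids writing the "save object" step twice. Each header opens a new group, and every following element is attached to the most recent group. It rests on a few assumptions not in the original posts: group_by_header is a hypothetical helper name, the element text is stored instead of the serialized XML, and the standard library's xml.etree.ElementTree stands in for lxml so the snippet runs without extra packages. With lxml.objectify the loop would presumably go over root.iterchildren(), as in the question, since iterating an objectify element directly does not walk its children.

```python
import xml.etree.ElementTree as ET


def group_by_header(root):
    """Return a list of (header_text, [detail_text, ...]) pairs in document order."""
    groups = []
    for child in root:
        if child.tag == "object_header":
            # A header starts a new object; its list of details starts empty.
            groups.append((child.text, []))
        elif groups:
            # Any other element belongs to the most recent header.
            groups[-1][1].append(child.text)
    return groups


xml_text = """<root>
<object_header>object1</object_header>
<object_details>detail1</object_details>
<object_details>detail2</object_details>
<object_header>object2</object_header>
<object_details>detail1</object_details>
<object_header>object3</object_header>
</root>"""

groups = group_by_header(ET.fromstring(xml_text))
for header, details in groups:
    print(header, details)
```

Because a group is appended as soon as its header is seen, an object without details (object3 here) simply ends up with an empty list, and no special handling is needed for the first or the last object.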