I have just started learning how to parse XML with minidom. I tried to get the authors' names with the following code (the XML data is below):
from xml.dom import minidom

xmldoc = minidom.parse("cora.xml")
author = xmldoc.getElementsByTagName('author')
for author in author:
    authorID = author.getElementsByTagName('author id')
    print(authorID)

I keep getting empty square brackets ([]). Can anyone help me? I also need the title and the venue. Thanks in advance. See the XML data below:
<?xml version="1.0" encoding="UTF-8"?>
<coraRADD>
<publication id="ahlskog1994a">
<author id="199">M. Ahlskog</author>
<author id="74"> J. Paloheimo</author>
<author id="64"> H. Stubb</author>
<author id="103"> P. Dyreklev</author>
<author id="54"> M. Fahlman</author>
<title>Inganas</title>
<title>and</title>
<title>M.R.</title>
<venue>
<venue pubid="ahlskog1994a" id="1">
<name>Andersson</name>
<name> J Appl. Phys.</name>
<vol>76</vol>
<date> (1994). </date>
</venue>

Posted on 2013-05-16 21:34:33
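getElementsByTagName() finds only elements (tags), not attributes, so searching for 'author id' matches nothing. You need to read attributes through the Element.getAttribute() method instead: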
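for author in author:
    authorID = author.getAttribute('id')
    print(authorID)

If you are still learning how to parse XML, you really want to stay away from the DOM. The DOM API is overly verbose because it was designed to be suitable for many different programming languages.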
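For comparison, here is a minimal minidom sketch that reads both the id attribute and the author's name. It assumes a shortened copy of the question's data with the closing tags added so that it parses. SAMPLE and author_ids_and_names are illustrative names, not part of the original code.

```python
from xml.dom import minidom

# Assumed sample: a trimmed copy of the question's data, with closing
# tags added so that the document is well-formed.
SAMPLE = """<coraRADD>
  <publication id="ahlskog1994a">
    <author id="199">M. Ahlskog</author>
    <author id="74"> J. Paloheimo</author>
    <title>Inganas</title>
  </publication>
</coraRADD>"""


def author_ids_and_names(xml_text):
    """Return (id, name) pairs for every <author> element (hypothetical helper)."""
    doc = minidom.parseString(xml_text)
    pairs = []
    for author in doc.getElementsByTagName('author'):
        author_id = author.getAttribute('id')   # attribute value as a string
        # The name is stored in a text node below the element; this
        # assumes every <author> element actually contains text.
        name = author.firstChild.data.strip()
        pairs.append((author_id, name))
    return pairs


authors = author_ids_and_names(SAMPLE)
for author_id, name in authors:
    print('{}: {}'.format(author_id, name))
```

Note that the element's text has to be fetched from a child text node, which is part of what makes the DOM feel verbose.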
The ElementTree API would be easier to use:
import xml.etree.ElementTree as ET

tree = ET.parse('cora.xml')
root = tree.getroot()

# loop over all publications
for pub in root.findall('publication'):
    print(' '.join([t.text for t in pub.findall('title')]))
    for author in pub.findall('author'):
        print('Author id: {}'.format(author.attrib['id']))
        print('Author name: {}'.format(author.text))
    for venue in pub.findall('.//venue[@id]'):  # all venue tags with id attribute
        print(', '.join([name.text for name in venue.findall('name')]))

https://stackoverflow.com/questions/16588597
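The ElementTree approach can also collect the values into a structure instead of printing them. The following is only a sketch under stated assumptions: SAMPLE is a shortened copy of the question's data with closing tags added, and summarize is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed sample: the question's data, shortened, with closing tags added.
SAMPLE = """<coraRADD>
  <publication id="ahlskog1994a">
    <author id="199">M. Ahlskog</author>
    <title>Inganas</title>
    <title>and</title>
    <venue>
      <venue pubid="ahlskog1994a" id="1">
        <name>Andersson</name>
        <name> J Appl. Phys.</name>
        <vol>76</vol>
        <date> (1994). </date>
      </venue>
    </venue>
  </publication>
</coraRADD>"""


def summarize(pub):
    """Collect id, title, authors and venue names of one <publication>."""
    return {
        'id': pub.get('id'),
        'title': ' '.join(t.text.strip() for t in pub.findall('title')),
        'authors': [(a.get('id'), a.text.strip()) for a in pub.findall('author')],
        # './/venue[@id]' matches only venue elements that carry an id
        # attribute, so the outer, attribute-less <venue> is skipped.
        'venues': [
            [n.text.strip() for n in v.findall('name')]
            for v in pub.findall('.//venue[@id]')
        ],
    }


root = ET.fromstring(SAMPLE)
records = [summarize(pub) for pub in root.findall('publication')]
for record in records:
    print(record)
```

Stripping the text is a choice made here because several values in the sample data start with a space; drop the strip() calls if the original whitespace matters.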