首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Python (newbie)从API调用解析XML

Python (newbie)从API调用解析XML
EN

Stack Overflow用户
提问于 2010-12-15 23:26:12
回答 3查看 15.4K关注 0票数 7

我已经寻找了一些关于堆栈/文档的教程/其他问题,但仍然无法解决。啊!

发出API请求和解析(想要赋值给变量,但这是这个问题的额外好处),这就是我正在尝试的。为什么我无法列出项目的标题和链接?

代码语言:javascript
复制
#!/usr/bin/python

# Screen Scraper for Subs
import urllib
from xml.etree import ElementTree as ET

show = 'heroes'
season = '4'
language = 'en'
limit = '1'

requestURL = 'http://api.allsubs.org/index.php?' \
           + 'search=' + show \
           + '+season+' + season \
           + '&language=' + language \
           + '&limit=' + limit

root = ET.parse(urllib.urlopen(requestURL)).getroot()
print root
print '\n'

items = root.findall('items')
for item in items: 
     item.find('title').text # should print: <![CDATA[Heroes Season 4 Subtitles]]>
     item.find('link').text # Should print: http://www.allsubs.org/subs-download/heroes+season+4/1223435/

XML响应

代码语言:javascript
复制
        <AllSubsAPI> 
        <title>AllSubs API: Subtitles Search</title> 
        <link>http://www.allsubs.org</link> 
        <description><![CDATA[Subtitles Search for Heroes Season 4]]></description> 
        <language>en-us</language> 
        <results>1</results> 
        <found_results>24</found_results> 
<items> 
    <item> 
            <title><![CDATA[Heroes Season 4 Subtitles]]></title> 
            <link>http://www.allsubs.org/subs-download/heroes+season+4/1223435/</link> 
            <filename>heroes-season-4-english-heroes-season-4-en.zip</filename> 
            <files_in_archive>Heroes - 4x01-02 - Orientation.HDTV.FQM.en.srt|Heroes - 4x17 - The Art of Deception.HDTV.2HD.en.srt|Heroes - 4x07 - Strange Attractors.HDTV.LOL.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.HDTV.2HD.en.srt|Heroes - 4x07 - Strange Attractors.720p HDTV.DIMENSION.en.srt|Heroes - 4x05 - Hysterical Blindness.720p HDTV.X264.en.srt|Heroes - 4x09 - Shadowboxing.HDTV.LOL.en.srt|Heroes - 4x16 - Pass Fail.HDTV.LOL.en.srt|Heroes - 4x04 - Acceptance.HDTV.en.srt|Heroes - 4x01-02 - Orientation.720p HDTV.DIMENSION.en.srt|Heroes - 4x06 - Tabula Rasa.HDTV.NoTV.en.srt|Heroes - 4x10 - Brother's Keeper.HDTV.FQM.en.srt|Heroes - 4x04 - Acceptance.HDTV.FQM.en.srt|Heroes - 4x14 - Let It Bleed.720p HDTV.DIMENSION.en.srt|Heroes - 4x06 - Tabula Rasa.720p HDTV.SiTV.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.HDTV.NoTV.en.srt|Heroes - 4x12 - The Fifth Stage.HDTV.LOL.en.srt|Heroes - 4x19 - Brave New World.HDTV.LOL.en.srt|Heroes - 4x15 - Close to You.720p HDTV.DIMENSION.en.srt|Heroes - 4x03 - Ink.720p HDTV.DIMENSION.en.srt|Heroes - 4x11 - Thanksgiving.720p HDTV.DIMENSION.en.srt|Heroes - 4x13 - Upon This Rock.720p HDTV.DIMENSION.en.srt|Heroes - 4x13 - Upon This Rock.HDTV.LOL.en.srt|Heroes - 4x14 - Let It Bleed.HDTV.LOL.en.srt|Heroes - 4x15 - Close to You.HDTV.LOL.en.srt|Heroes - 4x12 - The Fifth Stage.720p HDTV.DIMENSION.en.srt|Heroes - 4x18 - The Wall.HDTV.LOL.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.720p HDTV.CTU.en.srt|Heroes - 4x17 - The Art of Deception.HDTV.CTU.en.srt|Heroes - 4x09 - Shadowboxing.720p HDTV.DIMENSION.en.srt|Heroes - 4x10 - Brother's Keeper.720p HDTV.DIMENSION.en.srt|Heroes - 4x04 - Acceptance.720p HDTV.CTU.en.srt|Heroes - 4x11 - Thanksgiving.HDTV.FQM.en.srt|Heroes - 4x03 - Ink.HDTV.FQM.en.srt|Heroes - 4x05 - Hysterical Blindness.HDTV.XII.en.srt|</files_in_archive> 
            <languages>en</languages> 
            <added_on>2010-02-16</added_on> 
    </item> 

</items> 
</AllSubsAPI>

更新:

这很有效,谢谢你的帮助并指出了我的拼写错误

代码语言:javascript
复制
items = root.findall('items/item')
for item in items: 
     print item.find('title').text
     print item.find('link').text
EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2010-12-15 23:28:34

代码语言:javascript
复制
items = root.findall('items')

应该是

代码语言:javascript
复制
items = root.findall('items/item')
票数 4
EN

Stack Overflow用户

发布于 2010-12-15 23:43:53

这对我很有效。请注意,我正在使用urllib2通过代理:

代码语言:javascript
复制
import urllib2
from xml.etree import ElementTree as ET

show = 'heroes'
season = '4'
language = 'en'
limit = '1'

requestURL = 'http://api.allsubs.org/index.php?' \
           + 'search=' + show \
           + '+season+' + season \
           + '&language=' + language \
           + '&limit=' + limit

root = ET.parse(urllib2.urlopen(requestURL)).getroot()
print root
print '\n'

items = root.findall('items')[0].findall('item')
for item in items: 
     print item.find('title').text # should print: <![CDATA[Heroes Season 4 Subtitles]]>
     print item.find('link').text # Should print: http://www.allsubs.org/subs-download/heroes+season+4/1223435/

请注意,findall(‘item’)查找"item“标签,您想要循环遍历的(我认为)是其中的”item“标签,所以我们找到了其中的are ()。此外,您还需要进行打印才能从python中获取任何内容。

而且,如果我用limit=2来做,我会得到一个:

代码语言:javascript
复制
Traceback (most recent call last):
  File "heros.py", line 18, in <module>
    root = ET.parse(urllib2.urlopen(requestURL)).getroot()
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 862, in parse
    tree.parse(source, parser)
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 586, in parse
    parser.feed(data)
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 24, column 95

我不确定从这个API返回的XML是不是格式良好的--对于初学者来说,一开始没有"xml“元素。我不会相信它的。

票数 3
EN

Stack Overflow用户

发布于 2010-12-15 23:34:40

你不是在迭代“item”元素,实际上是在迭代“item”元素。

我认为应该是:

代码语言:javascript
复制
items = root.findall('items')
childItems = items.findall('item')
for childItem in childItems: 
    childItem.find('title').text # should print: <![CDATA[Heroes Season 4 Subtitles]]>
    childItem.find('link').text # Should print: http://www.allsubs.org/subs-download/heroes+season+4/1223435
票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/4451600

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档