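I've gone through several tutorials, the docs, and other questions on Stack Overflow, but I still can't work this out. Argh!
I'm trying to make an API request and parse the response (assigning the values to variables would be a bonus, but that goes beyond this question). Why can't I list the title and link of each item?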
#!/usr/bin/python
# Screen Scraper for Subs
import urllib
from xml.etree import ElementTree as ET
show = 'heroes'
season = '4'
language = 'en'
limit = '1'
requestURL = 'http://api.allsubs.org/index.php?' \
+ 'search=' + show \
+ '+season+' + season \
+ '&language=' + language \
+ '&limit=' + limit
root = ET.parse(urllib.urlopen(requestURL)).getroot()
print root
print '\n'
items = root.findall('items')
for item in items:
    item.find('title').text # should print: <![CDATA[Heroes Season 4 Subtitles]]>
    item.find('link').text # Should print: http://www.allsubs.org/subs-download/heroes+season+4/1223435/

XML response:
<AllSubsAPI>
<title>AllSubs API: Subtitles Search</title>
<link>http://www.allsubs.org</link>
<description><![CDATA[Subtitles Search for Heroes Season 4]]></description>
<language>en-us</language>
<results>1</results>
<found_results>24</found_results>
<items>
<item>
<title><![CDATA[Heroes Season 4 Subtitles]]></title>
<link>http://www.allsubs.org/subs-download/heroes+season+4/1223435/</link>
<filename>heroes-season-4-english-heroes-season-4-en.zip</filename>
<files_in_archive>Heroes - 4x01-02 - Orientation.HDTV.FQM.en.srt|Heroes - 4x17 - The Art of Deception.HDTV.2HD.en.srt|Heroes - 4x07 - Strange Attractors.HDTV.LOL.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.HDTV.2HD.en.srt|Heroes - 4x07 - Strange Attractors.720p HDTV.DIMENSION.en.srt|Heroes - 4x05 - Hysterical Blindness.720p HDTV.X264.en.srt|Heroes - 4x09 - Shadowboxing.HDTV.LOL.en.srt|Heroes - 4x16 - Pass Fail.HDTV.LOL.en.srt|Heroes - 4x04 - Acceptance.HDTV.en.srt|Heroes - 4x01-02 - Orientation.720p HDTV.DIMENSION.en.srt|Heroes - 4x06 - Tabula Rasa.HDTV.NoTV.en.srt|Heroes - 4x10 - Brother's Keeper.HDTV.FQM.en.srt|Heroes - 4x04 - Acceptance.HDTV.FQM.en.srt|Heroes - 4x14 - Let It Bleed.720p HDTV.DIMENSION.en.srt|Heroes - 4x06 - Tabula Rasa.720p HDTV.SiTV.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.HDTV.NoTV.en.srt|Heroes - 4x12 - The Fifth Stage.HDTV.LOL.en.srt|Heroes - 4x19 - Brave New World.HDTV.LOL.en.srt|Heroes - 4x15 - Close to You.720p HDTV.DIMENSION.en.srt|Heroes - 4x03 - Ink.720p HDTV.DIMENSION.en.srt|Heroes - 4x11 - Thanksgiving.720p HDTV.DIMENSION.en.srt|Heroes - 4x13 - Upon This Rock.720p HDTV.DIMENSION.en.srt|Heroes - 4x13 - Upon This Rock.HDTV.LOL.en.srt|Heroes - 4x14 - Let It Bleed.HDTV.LOL.en.srt|Heroes - 4x15 - Close to You.HDTV.LOL.en.srt|Heroes - 4x12 - The Fifth Stage.720p HDTV.DIMENSION.en.srt|Heroes - 4x18 - The Wall.HDTV.LOL.en.srt|Heroes - 4x08 - Once Upon a Time in Texas.720p HDTV.CTU.en.srt|Heroes - 4x17 - The Art of Deception.HDTV.CTU.en.srt|Heroes - 4x09 - Shadowboxing.720p HDTV.DIMENSION.en.srt|Heroes - 4x10 - Brother's Keeper.720p HDTV.DIMENSION.en.srt|Heroes - 4x04 - Acceptance.720p HDTV.CTU.en.srt|Heroes - 4x11 - Thanksgiving.HDTV.FQM.en.srt|Heroes - 4x03 - Ink.HDTV.FQM.en.srt|Heroes - 4x05 - Hysterical Blindness.HDTV.XII.en.srt|</files_in_archive>
<languages>en</languages>
<added_on>2010-02-16</added_on>
</item>
</items>
</AllSubsAPI>

Update:
This works. Thanks for your help and for pointing out my typos.
items = root.findall('items/item')
for item in items:
    print item.find('title').text
    print item.find('link').text

Answered 2010-12-15 23:28:34
items = root.findall('items')
should be
items = root.findall('items/item')

Answered 2010-12-15 23:43:53
This works for me. Note that I'm using urllib2 to go through a proxy:
import urllib2
from xml.etree import ElementTree as ET
show = 'heroes'
season = '4'
language = 'en'
limit = '1'
requestURL = 'http://api.allsubs.org/index.php?' \
+ 'search=' + show \
+ '+season+' + season \
+ '&language=' + language \
+ '&limit=' + limit
root = ET.parse(urllib2.urlopen(requestURL)).getroot()
print root
print '\n'
items = root.findall('items')[0].findall('item')
for item in items:
    print item.find('title').text # prints: Heroes Season 4 Subtitles (the parser drops the CDATA wrapper)
    print item.find('link').text # prints: http://www.allsubs.org/subs-download/heroes+season+4/1223435/

Note that findall('items') finds the "items" tag; what you want to loop over (I think) is the "item" tags inside it, so we findall() those. Also, you need to print to get any output from Python.
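
To make the difference between the container and its children concrete, here is a minimal sketch. It assumes Python 3 (the code in this thread is Python 2) and parses a trimmed, inline copy of the response from the question instead of calling the API. The names SAMPLE, missing_title, via_index, via_path, and results are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response shown in the question, so no network call is needed.
SAMPLE = """<AllSubsAPI>
<title>AllSubs API: Subtitles Search</title>
<link>http://www.allsubs.org</link>
<items>
<item>
<title><![CDATA[Heroes Season 4 Subtitles]]></title>
<link>http://www.allsubs.org/subs-download/heroes+season+4/1223435/</link>
</item>
</items>
</AllSubsAPI>"""

root = ET.fromstring(SAMPLE)

# findall('items') returns the <items> container, which has no direct <title> child.
containers = root.findall('items')
missing_title = containers[0].find('title')  # None, so .text on it would raise AttributeError

# Two ways to reach the nested <item> elements.
via_index = root.findall('items')[0].findall('item')
via_path = root.findall('items/item')

# Keep the values in variables instead of only printing them.
results = [(item.findtext('title'), item.findtext('link')) for item in via_path]

for title, link in results:
    print(title)
    print(link)
```

Both lookups return the same single element here. The title comes back as plain text, because the parser hands over the contents of the CDATA section without the wrapper.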
Also, if I run it with limit=2, I get:
Traceback (most recent call last):
  File "heros.py", line 18, in <module>
    root = ET.parse(urllib2.urlopen(requestURL)).getroot()
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 862, in parse
    tree.parse(source, parser)
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 586, in parse
    parser.feed(data)
  File "/usr/lib/python2.6/xml/etree/ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 24, column 95

I'm not sure the XML returned by this API is well-formed. For starters, there is no "xml" declaration at the beginning. I wouldn't trust it.
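
If the response can't be trusted to be well-formed, one option is to catch the parse error rather than let the script crash. The sketch below assumes Python 3, where ElementTree raises ET.ParseError (the Python 2.6 traceback above shows the underlying ExpatError instead). The helper name parse_response and both input strings are hypothetical, and the bare ampersand is only one possible cause of an "invalid token" error, not a diagnosis of what this API actually returned. A missing XML declaration is not a problem in itself, since the declaration is optional in XML 1.0.

```python
import xml.etree.ElementTree as ET


def parse_response(xml_text):
    """Return the root element, or None if xml_text is not well-formed XML."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple pointing at the offending token.
        line, column = err.position
        print('could not parse response: line %d, column %d' % (line, column))
        return None


# Well-formed input (hypothetical): parsing succeeds.
good = parse_response('<AllSubsAPI><items/></AllSubsAPI>')

# Unescaped ampersand (hypothetical): not well-formed, so the helper returns None.
bad = parse_response('<AllSubsAPI><title>Heroes & Villains</title></AllSubsAPI>')
```

Returning None keeps the sketch short; a real script might prefer to log the raw response or re-raise with more context.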
Answered 2010-12-15 23:34:40
You're not iterating over the "item" elements; you're actually iterating over the "items" element.
I think it should be:
# find() returns the single <items> element; findall() would return a list, which has no findall() method
items = root.find('items')
childItems = items.findall('item')
for childItem in childItems:
    print childItem.find('title').text # prints: Heroes Season 4 Subtitles (the parser drops the CDATA wrapper)
    print childItem.find('link').text # prints: http://www.allsubs.org/subs-download/heroes+season+4/1223435/

https://stackoverflow.com/questions/4451600