I have a nested list of iframes:
```
iframes = [
    [<iframe data-lazy-src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/309819830&color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false" frameborder="no" height="166" scrolling="no" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" width="100%"></iframe>, <iframe allowtransparency="true" data-lazy-src="//www.facebook.com/plugins/likebox.php?href=https%3A%2F%2Fwww.facebook.com%2FPauseMusicale&width=300&height=62&show_faces=false&colorscheme=light&stream=false&show_border=false&header=false" frameborder="0" scrolling="no" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" style="border:none; overflow:hidden; width:300px; height:62px;"></iframe>, <iframe allowfullscreen="" data-lazy-src="//www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1" frameborder="0" height="169" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" width="100%"></iframe>],
    [<iframe data-lazy-src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/310079005&color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false" frameborder="no" height="166" scrolling="no" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" width="100%"></iframe>, <iframe allowtransparency="true" data-lazy-src="//www.facebook.com/plugins/likebox.php?href=https%3A%2F%2Fwww.facebook.com%2FPauseMusicale&width=300&height=62&show_faces=false&colorscheme=light&stream=false&show_border=false&header=false" frameborder="0" scrolling="no" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" style="border:none; overflow:hidden; width:300px; height:62px;"></iframe>, <iframe allowfullscreen="" data-lazy-src="//www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1" frameborder="0" height="169" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" width="100%"></iframe>],
    [<iframe etc],
    [<iframe etc]]
```

I want to extract every `['data-lazy-src']` value from it.
I use the following code to do this:
```python
import urllib2                               # Python 2
from urlparse import urlparse, urlunparse   # Python 2

for iframe in iframes:
    for i in iframe:
        scheme, netloc, path, params, query, fragment = urlparse(i.attrs['data-lazy-src'])
        if not scheme:
            scheme = 'http'
        url = urlunparse((scheme, netloc, path, params, query, fragment))
        print('Fetching {}'.format(url))
        f = urllib2.urlopen(url)
```

But I get:
```
Fetching http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1
Fetching http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1
Fetching http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1
```

I know I'm missing something very obvious, but I just can't see it.
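The scheme-defaulting step in the question's loop exists because some `data-lazy-src` values are protocol-relative (they start with `//`). Below is a minimal sketch of that step, assuming Python 3. In Python 3, `urlparse`, `urlunparse` and `urljoin` live in `urllib.parse` rather than in the Python 2 `urlparse` module. The helper name `add_default_scheme` and the `page_url` value are hypothetical and used only for illustration. `urljoin` is shown as an alternative that reuses the scheme of the page the iframes were scraped from.

```python
from urllib.parse import urlparse, urlunparse, urljoin


def add_default_scheme(url, default="http"):
    """Hypothetical helper: fill in a missing scheme, like the `if not scheme` check."""
    parts = urlparse(url)
    if not parts.scheme:
        # ParseResult is a namedtuple, so _replace returns a copy with a new scheme.
        parts = parts._replace(scheme=default)
    return urlunparse(parts)


lazy_src = "//www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1"
print(add_default_scheme(lazy_src))
# http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1

# Alternative: resolve the protocol-relative URL against the page it came from,
# so the page's own scheme is reused instead of a hard-coded 'http'.
page_url = "https://example.com/some/article"  # hypothetical page URL
print(urljoin(page_url, lazy_src))
# https://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1
```

URLs that already have a scheme pass through `add_default_scheme` unchanged.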
Posted 2017-03-09 10:42:59
You can take the HTML string of each iframe and pass it to BeautifulSoup, which makes it easy to parse. Try something like this:
```python
from bs4 import BeautifulSoup

iframe = '<iframe data-lazy-src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/309819830..." frameborder="no"></iframe>'
soup = BeautifulSoup(iframe, 'html.parser')
tag = soup.find_all('iframe')[0]
print(tag['data-lazy-src'])
```

Posted 2017-03-09 11:09:41
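The problem was the way the nested list was built: the result of `soup.find_all('iframe')` was being appended to `iframes = []`. Because `find_all` already returns a list, each append added a whole list as a single element.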
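Here is a minimal sketch of why `append` produces the nesting while `extend` does not. The dicts are hypothetical stand-ins for BeautifulSoup `Tag` objects; both support `obj['data-lazy-src']`. The `page_results` data is invented sample input. `itertools.chain.from_iterable` is shown as one way to flatten a list that is already nested.

```python
from itertools import chain

# Hypothetical stand-ins for two soup.find_all('iframe') results.
page_results = [
    [{"data-lazy-src": "https://w.soundcloud.com/a"}, {"data-lazy-src": "//www.youtube.com/b"}],
    [{"data-lazy-src": "https://w.soundcloud.com/c"}],
]

# append() adds each result list as ONE element, giving a list of lists.
nested = []
for result in page_results:
    nested.append(result)

# extend() adds the individual elements, giving one flat list of tags.
flat = []
for result in page_results:
    flat.extend(result)

# An already-nested list can also be flattened after the fact.
flattened = list(chain.from_iterable(nested))

print(len(nested), len(flat))  # 2 3
print([tag["data-lazy-src"] for tag in flat])
```

With a flat list, a single `for` loop over the tags is enough, so no inner loop is needed.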
After removing the append step, it works like this:
```python
(...)
iframes = soup.find_all('iframe')
for iframe in iframes:
    scheme, netloc, path, params, query, fragment = urlparse(iframe.attrs['data-lazy-src'])
    if not scheme:
        scheme = 'http'  # default scheme you used when getting the current page
    url = urlunparse((scheme, netloc, path, params, query, fragment))
    print('Fetching {}'.format(url))
    f = urllib2.urlopen(url)
```

Result:
```
Fetching https://www.youtube.com/embed/OWr5FawT2Ks?rel=0
Fetching https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/308112514&color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false
Fetching http://www.facebook.com/plugins/likebox.php?href=https%3A%2F%2Fwww.facebook.com%2FPauseMusicale&width=300&height=62&show_faces=false&colorscheme=light&stream=false&show_border=false&header=false
Fetching http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1
```

https://stackoverflow.com/questions/42662948