Python - Extracting <iframe> source code

Stack Overflow user
Asked 2017-03-08 12:18:48
Answers: 2, Views: 2.9K, Followers: 0, Votes: 0

I have a nested list of <iframe> elements:

iframes = [
[<iframe data-lazy-src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/309819830&amp;color=ff5500&amp;auto_play=false&amp;hide_related=false&amp;show_comments=true&amp;show_user=true&amp;show_reposts=false" frameborder="no" height="166" scrolling="no" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" width="100%"></iframe>, <iframe allowtransparency="true" data-lazy-src="//www.facebook.com/plugins/likebox.php?href=https%3A%2F%2Fwww.facebook.com%2FPauseMusicale&amp;width=300&amp;height=62&amp;show_faces=false&amp;colorscheme=light&amp;stream=false&amp;show_border=false&amp;header=false" frameborder="0" scrolling="no" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" style="border:none; overflow:hidden; width:300px; height:62px;"></iframe>, <iframe allowfullscreen="" data-lazy-src="//www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1" frameborder="0" height="169" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" width="100%"></iframe>], [<iframe data-lazy-src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/310079005&amp;color=ff5500&amp;auto_play=false&amp;hide_related=false&amp;show_comments=true&amp;show_user=true&amp;show_reposts=false" frameborder="no" height="166" scrolling="no" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" width="100%"></iframe>, <iframe allowtransparency="true" data-lazy-src="//www.facebook.com/plugins/likebox.php?href=https%3A%2F%2Fwww.facebook.com%2FPauseMusicale&amp;width=300&amp;height=62&amp;show_faces=false&amp;colorscheme=light&amp;stream=false&amp;show_border=false&amp;header=false" frameborder="0" scrolling="no" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" style="border:none; overflow:hidden; width:300px; height:62px;"></iframe>, <iframe allowfullscreen="" 
data-lazy-src="//www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1" frameborder="0" height="169" src="data:image/gif;base64,R0lGODdhAQABAPAAAP///wAAACwAAAAAAQABAEACAkQBADs=" width="100%"></iframe>],
[<iframe etc], 
[<iframe etc]]

I want to get every ['data-lazy-src'] value from it.

I use the following code to do that:

for iframe in iframes:
    for i in iframe:        
        scheme, netloc, path, params, query, fragment = urlparse(i.attrs['data-lazy-src'])
        if not scheme:
            scheme = 'http'   
        url = urlunparse((scheme, netloc, path, params, query, fragment))
        print('Fetching {}'.format(url))
        f = urllib2.urlopen(url)

But I get:

Fetching http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1
Fetching http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1
Fetching http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1

I know I'm missing something very obvious, but I just can't see it.

2 Answers

Stack Overflow user

Answered 2017-03-09 10:42:59

You can take the HTML string for the iframes and pass it to BeautifulSoup, which makes it easy to parse. Try something like this:

from bs4 import BeautifulSoup

iframe = '<iframe data-lazy-src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/309819830..." frameborder="no"></iframe>'

soup = BeautifulSoup(iframe, 'html.parser')
tag = soup.find_all('iframe')[0]
print(tag['data-lazy-src'])
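
The same extraction can also be sketched without BeautifulSoup, using only the standard library's `html.parser`. This is a swapped-in alternative, not what the answer uses. It also assumes Python 3, whereas the code in this thread uses the Python 2 `urllib2` module. The class name `LazySrcCollector` and the sample markup are hypothetical, shortened stand-ins for the iframes in the question.

```python
from html.parser import HTMLParser


class LazySrcCollector(HTMLParser):
    """Collect the data-lazy-src attribute of every <iframe> start tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != 'iframe':
            return
        # attrs is a list of (name, value) pairs; entities such as &amp;
        # in attribute values have already been unescaped by the parser.
        for name, value in attrs:
            if name == 'data-lazy-src' and value:
                self.sources.append(value)


# Hypothetical sample markup (not the real page from the question).
html = (
    '<iframe data-lazy-src="//www.youtube.com/embed/videoseries?list=EXAMPLE" frameborder="0"></iframe>'
    '<iframe src="about:blank"></iframe>'
    '<iframe data-lazy-src="https://w.soundcloud.com/player/?color=ff5500&amp;auto_play=false"></iframe>'
)

collector = LazySrcCollector()
collector.feed(html)
collector.close()
lazy_sources = collector.sources
print(lazy_sources)
```

Iframes without a `data-lazy-src` attribute are skipped, so there is no `KeyError` to handle as there would be with a plain `tag['data-lazy-src']` lookup.
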
Votes: 1

Stack Overflow user

Answered 2017-03-09 11:09:41

The problem was in how the nested list was built, namely by appending soup.find_all('iframe') to iframes = [].

After removing the append step, it works like this:

(...)

iframes = soup.find_all('iframe')

for iframe in iframes:
    scheme, netloc, path, params, query, fragment = urlparse(iframe.attrs['data-lazy-src'])
    if not scheme:
        scheme = 'http' # default scheme you used when getting the current page
    url = urlunparse((scheme, netloc, path, params, query, fragment))
    print('Fetching {}'.format(url))
    f = urllib2.urlopen(url)

Result:

Fetching https://www.youtube.com/embed/OWr5FawT2Ks?rel=0
Fetching https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/308112514&color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false
Fetching http://www.facebook.com/plugins/likebox.php?href=https%3A%2F%2Fwww.facebook.com%2FPauseMusicale&width=300&height=62&show_faces=false&colorscheme=light&stream=false&show_border=false&header=false
Fetching http://www.youtube.com/embed/videoseries?list=PLNKCTdT9YSESoQnj5tPP4P9kaIwBCx7F1
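
If results from several pages do need to be combined, the difference between appending and extending can be sketched as follows. This is a minimal illustration under stated assumptions, not the original poster's script: it targets Python 3 (`urllib.parse` rather than the Python 2 modules used above), `page_results` holds hypothetical attribute strings in place of real `soup.find_all('iframe')` results, `with_default_scheme` is a hypothetical helper name, and no network request is made.

```python
from urllib.parse import urlparse, urlunparse


def with_default_scheme(url, default='http'):
    """Return url unchanged if it has a scheme, else add the default one."""
    parts = urlparse(url)
    if not parts.scheme:
        parts = parts._replace(scheme=default)
    return urlunparse(parts)


# Hypothetical per-page results standing in for soup.find_all('iframe').
page_results = [
    ['//www.youtube.com/embed/one', 'https://w.soundcloud.com/player/?url=a'],
    ['//www.facebook.com/plugins/likebox.php?href=b'],
]

nested = []
flat = []
for result in page_results:
    nested.append(result)  # keeps one sub-list per page: [[...], [...]]
    flat.extend(result)    # adds the items themselves: [...]

urls = [with_default_scheme(src) for src in flat]
for url in urls:
    print('Fetching {}'.format(url))
```

With `extend`, a single loop over `flat` visits every item, so no inner loop is needed.
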
Votes: 0
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/42662948
