Is it possible to use bs4 to parse tags with style="display: none;"?

Stack Overflow user
Asked 2018-08-27 11:01:39
1 answer, 100 views, 0 followers, 0 votes

If you browse the page https://weathernews.jp/s/topics/201808/220015/?fm=tp_index, you will see two images. When I parse it with the following code:

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
from urllib.parse import urljoin
import re

options = Options()
options.add_argument("--headless")
# Current Selenium releases take `options=`; the older `chrome_options=` keyword is deprecated.
driver = webdriver.Chrome(options=options)
driver.get('https://weathernews.jp/s/topics/201808/220015/?fm=tp_index')
soup_level2 = BeautifulSoup(driver.page_source, 'lxml')

# Match topic images and capture the URL without its trailing "?<digits>" suffix.
IMAGE_URL = re.compile(r"(https://smtgvs\.weathernews\.jp/s/topics/img/[0-9]+/.+)\?[0-9]+")

sections = soup_level2.find_all("img")

for section in sections:
    # Try src first, then fall back to the lazy-load attribute data-original.
    for attr in ("src", "data-original"):
        value = section.get(attr)
        if not value:
            continue
        image = IMAGE_URL.findall(urljoin('https://weathernews.jp/', value))
        if image:
            print(image[0])
            break

I get the following images:

https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_top_img_A.jpg
https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img0_A.jpg
https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img1_A.jpg
https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img2_A.jpg
https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img5_A.png

In fact, there are two more images on the page that have style="display: none;". Can you help me parse them?

<section id="box3" class="nodisp_zero" style="display: none;">
    <h1 id="box_ttl3" style="display: none;"></h1>
    <img style="width: 100%; display: none;" id="box_img3" alt="box3" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img3_A.jpg?1533975785">
    <figcaption id="box_caption3" style="display: none;"></figcaption>
    <div class="textarea clearfix">
        <h2 id="box_subttl3" style="display: none;"></h2>
        <div class="fontL" id="box_com3" style="display: none;"></div>
    </div>
</section>

1 Answer

Stack Overflow user

Answered 2018-08-27 13:40:59

You can query the HTML by attribute.

Example:

html = """<section id="box3" class="nodisp_zero" style="display: none;">
    <h1 id="box_ttl3" style="display: none;"></h1>
    <img style="width: 100%; display: none;" id="box_img3" alt="box3" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img3_A.jpg?1533975785">
    <figcaption id="box_caption3" style="display: none;"></figcaption>
    <div class="textarea clearfix">
        <h2 id="box_subttl3" style="display: none;"></h2>
        <div class="fontL" id="box_com3" style="display: none;"></div>
    </div>
</section>"""


from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")

print( soup.find("section", {"style": "display: none;"}).img["data-original"] )

Output:

https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img3_A.jpg?1533975785
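
The example above returns only the first matching section. As a rough sketch of how the same attribute-based query might be extended to every lazy-loaded image, the code below collects all `data-original` values, and separately only those on `<img>` tags whose `style` contains `display: none`. The sample markup, the `?1533975785` suffix on the `box_img2` URL, and the helper names `strip_query`, `lazy_image_urls`, and `hidden_image_urls` are assumptions made for illustration; they are not taken from the live page.

```python
import re
from urllib.parse import urlsplit, urlunsplit

from bs4 import BeautifulSoup

# Hypothetical sample markup modelled on the snippet in the question.
HTML = """
<section id="box2">
    <img id="box_img2" class="lazy"
         src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png"
         data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img2_A.jpg?1533975785">
</section>
<section id="box3" class="nodisp_zero" style="display: none;">
    <img style="width: 100%; display: none;" id="box_img3" class="lazy"
         src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png"
         data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img3_A.jpg?1533975785">
</section>
"""

# Matches a style value that hides the element, with optional spaces after the colon.
HIDDEN_STYLE = re.compile(r"display:\s*none")


def strip_query(url):
    """Return url without its query string or fragment (hypothetical helper)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def lazy_image_urls(markup):
    """Collect the data-original URL of every <img>, visible or hidden."""
    soup = BeautifulSoup(markup, "html.parser")
    # A value of True matches any <img> that carries the attribute at all.
    images = soup.find_all("img", attrs={"data-original": True})
    return [strip_query(img["data-original"]) for img in images]


def hidden_image_urls(markup):
    """Collect data-original URLs only from <img> tags styled display: none."""
    soup = BeautifulSoup(markup, "html.parser")
    # A compiled regex is searched against the attribute value.
    images = soup.find_all(
        "img", attrs={"data-original": True, "style": HIDDEN_STYLE}
    )
    return [strip_query(img["data-original"]) for img in images]


all_urls = lazy_image_urls(HTML)
hidden_urls = hidden_image_urls(HTML)
print(all_urls)
print(hidden_urls)
```

Beautiful Soup works on the markup alone and does not evaluate CSS, so a hidden tag is returned like any other; the `style` filter is only needed when the hidden images must be singled out.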
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/52032024
