Q: Using beautifulsoup4 in my Django app, how do I get the "a" href and the image src?

Stack Overflow user

Asked 2016-05-10 12:24:07

Answers: 1, Views: 269, Followers: 0, Votes: 0

I am using beautifulsoup4 in my Django app to scrape data. I am able to get the data from this HTML structure:

   <div class="thumbnail thumb">
        <h6 id="date">May 9, 2016</h6>


        <img src="http://assets.system.jpg" class="img-responsive post">


        <div style="border-bottom: thin solid lightslategray; padding-bottom: 15px;"></div>

        <div class="caption" id="cap">
            <a href="/blog/homeland-security-attack/">
                <h5 class="post-title" id="title">Homeland Security </h5>
            </a>

            <p>
                <a href="/blog/88/delete/" class="btn" role="button">delete</a>
                <a href="/blog/homeland-" class="btn" role="button">edit</a>
            </p>

        </div>
    </div>

using this in my view:

import requests
from bs4 import BeautifulSoup

url = 'http://www.hispanicheights.com/'
google = requests.get(url)
bs = BeautifulSoup(google.content, 'html.parser')
# Match every <div> whose class list includes "thumbnail"
divs = bs.findAll('div', 'thumbnail')
# Keep the first six entries
entries = [{'text': div.text,
            'href': div.find('a').get('href'),
            'src': div.find('img').get('src')
            } for div in divs][:6]

But when I try to scrape this HTML structure:

<div class="entry entry-pos-1" id="entry-217985">
        <a href="/article/murder" data-page="1">
            <p class="entry-comments">6</p>
            <img data-original="/images17985.jpg" alt="Chicago Rapper &amp; OTF Aff Murder" width="320" height="179" class="image-load" src="/images/size_mb/video-217985.jpg" style="display: block;">
        </a>
        <p class="entry-title">
            <a href="/article/-murder" data-page="1">Chicago Rapper &amp; OT Murder</a>
        </p>
        <p class="entry-meta">97 views</p>
        <p class="entry-date">
        <span class="entry-recent">11 Mins Ago</span>
        </p>
    </div>

using the same approach:

ad_url = 'http://www.ad.com/'
ad_get = requests.get(ad_url, headers=headers)
ad_soup = BeautifulSoup(ad_get.content, 'html.parser')
ad_div = ad_soup.findAll('div', 'entry')
ad_entry = [{'text': div.text,
              'href': div.find('a').get('href'),
              'src': div.find('img').get('src')
                 } for div in ad_div]

it raises an error: 'NoneType' object has no attribute 'get'.

What is the correct syntax to get the href and src?

python

html

django

beautifulsoup

1 Answer

Stack Overflow user

Accepted answer

Answered 2016-05-10 12:52:38

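If you call div.find('a') on a div that does not contain an anchor, it returns None, and calling .get() on None raises the error you are seeing. The same applies to div.find('img'). Your code has to handle that case. For example, you could: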

entries = []
for div in ad_div:
    a = div.find('a')
    img = div.find('img')
    # Skip any div that is missing either the link or the image
    if a is not None and img is not None:
        entry = {
            'text': div.text,
            'href': a.get('href'),
            'src': img.get('src'),
        }
        entries.append(entry)
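
To try the None check without hitting a live site, here is a minimal sketch that runs against inline markup. The `SAMPLE_HTML` string and the `extract_entries` helper are made up for illustration and are not part of the original question. The second div deliberately has no link or image, which is the case that triggers the error.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the second div has no <a> and no <img>.
SAMPLE_HTML = """
<div class="entry" id="entry-1">
  <a href="/article/murder"><img src="/images/size_mb/video-217985.jpg"></a>
  <p class="entry-title">Chicago Rapper</p>
</div>
<div class="entry" id="entry-2">
  <p class="entry-title">No link or image here</p>
</div>
"""


def extract_entries(html):
    """Return text/href/src dicts for each div.entry that has both a link and an image."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for div in soup.find_all("div", class_="entry"):
        a = div.find("a")
        img = div.find("img")
        # find() returns None when nothing matches, so skip incomplete divs
        if a is None or img is None:
            continue
        entries.append({
            "text": div.get_text(strip=True),
            "href": a.get("href"),
            "src": img.get("src"),
        })
    return entries


entries = extract_entries(SAMPLE_HTML)
print(entries)
```

Only the first div ends up in `entries`; the second is skipped instead of raising. If you would rather keep incomplete divs, one option is to store None for the missing field instead of using `continue`.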

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/37138480
