
Q: Trying to create a simple Python web crawler

Stack Overflow user

Asked 2016-10-31 01:02:19

Answers: 1 | Views: 76 | Followers: 0 | Votes: 0

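I have decided to learn Python 2.7 for data analysis and have been watching a lot of tutorials on YouTube to pick up the basics.

I am now at the stage where I want to build a simple web crawler for educational purposes, just to learn different techniques and get used to writing code.

I am following a website crawler tutorial, but I am unsure about a few things. This is what I have so far: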

import requests
from bs4 import BeautifulSoup
url = 'http://www.aflcio.org/Legislation-and-Politics/Legislative-Alerts'
r = requests.get(url)
plain_text = r.text
soup = BeautifulSoup(plain_text, 'html.parser')
statements = soup.findAll('div','ec_statements')

for link in statements:
    print (link.contents)

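I can't seem to separate out the href link and display it alongside the text and date information.

I would like the output to look like this:

Name of article
Link to article
Date of article

Could someone also explain why each step is needed?

Many thanks!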

python

web-crawler

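1 Answer

Stack Overflow user

Accepted answer

Answered 2016-10-31 02:38:21

Here is a small snippet to help you. In bs4, all nodes are connected in a tree. Each `link` you read in the loop is one node (actually a div), and you want its child, the a tag, so link.a is what you need.

A node then carries two kinds of values: attributes, which you access with a['href'], and content, which you access with a.text.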

for link in statements:
    print(link.a['href'])
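One caveat with the loop above: link.a is None when a div has no a tag inside it, and indexing None raises a TypeError. The following is a minimal sketch of a guarded version, not part of the original answer. The sample markup and the helper name extract_links are hypothetical, made up so that the example runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration: the second div has no <a> child.
html = """
<div class="ec_statements"><a href="/alerts/first">First alert</a></div>
<div class="ec_statements">No link here</div>
"""

soup = BeautifulSoup(html, 'html.parser')
# A string passed as the second argument is matched against the CSS class.
statements = soup.find_all('div', 'ec_statements')


def extract_links(divs):
    """Return (href, text) pairs, skipping divs without a usable <a> tag."""
    results = []
    for div in divs:
        a = div.a  # None when the div has no <a> descendant
        if a is None or a.get('href') is None:
            continue
        results.append((a['href'], a.get_text(strip=True)))
    return results


links = extract_links(statements)
for href, text in links:
    print(href)
    print(text)
```

Using a.get('href') instead of a['href'] for the check avoids a KeyError on anchors that have no href attribute.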

P.S. This is the link variable:

<div id="legalert_title"><a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-Representatives-opposing-the-Fairness-in-Class-Action-Litigation-and-Furthering-Asbestos-Claim-Transparency-Act">Letter to Representatives opposing the "Fairness in Class Action Litigation and Furthering Asbestos Claim Transparency Act"</a></div>

This is link.a:

<a href="/Legislation-and-Politics/Legislative-Alerts/Letter-to-Representatives-opposing-the-Fairness-in-Class-Action-Litigation-and-Furthering-Asbestos-Claim-Transparency-Act">Letter to Representatives opposing the "Fairness in Class Action Litigation and Furthering Asbestos Claim Transparency Act"</a>

This is link.a['href']:

/Legislation-and-Politics/Legislative-Alerts/Letter-to-Representatives-opposing-the-Fairness-in-Class-Action-Litigation-and-Furthering-Asbestos-Claim-Transparency-Act

This is link.a.text:

Letter to Representatives opposing the "Fairness in Class Action Litigation and Furthering Asbestos Claim Transparency Act"

All HTML is structured like this, so it may be worth learning some HTML basics.
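To get the three-line layout asked for in the question (name, link, date), the pieces above could be combined roughly as follows. This is only a sketch under stated assumptions: the answer shows the title markup (a div with id legalert_title), but the date markup is not shown anywhere in this thread, so the legalert_date id, the nesting inside ec_statements, and the sample values are hypothetical and would need to be checked against the real page source. The function name parse_statements is also made up for illustration.

```python
from bs4 import BeautifulSoup

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2.7

BASE_URL = 'http://www.aflcio.org/Legislation-and-Politics/Legislative-Alerts'

# Hypothetical sample markup. Only the legalert_title div appears in the
# answer; the legalert_date div and the date value are assumptions.
html = """
<div class="ec_statements">
  <div id="legalert_title"><a href="/Legislation-and-Politics/Legislative-Alerts/Sample-Letter">Sample Letter</a></div>
  <div id="legalert_date">January 1, 2016</div>
</div>
"""


def parse_statements(markup, base_url):
    """Return a list of dicts with 'title', 'link' and 'date' keys."""
    soup = BeautifulSoup(markup, 'html.parser')
    items = []
    for block in soup.find_all('div', 'ec_statements'):
        title_div = block.find('div', id='legalert_title')
        date_div = block.find('div', id='legalert_date')  # assumed id
        if title_div is None or title_div.a is None:
            continue  # nothing to report for this block
        a = title_div.a
        items.append({
            'title': a.get_text(strip=True),
            # The hrefs on the page are relative, so resolve them
            # against the page URL to get a usable absolute link.
            'link': urljoin(base_url, a.get('href', '')),
            'date': date_div.get_text(strip=True) if date_div is not None else '',
        })
    return items


items = parse_statements(html, BASE_URL)
for item in items:
    print(item['title'])
    print(item['link'])
    print(item['date'])
    print('')
```

To run it against the live page, pass r.text from the question's requests.get(url) call in place of the sample string.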

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/40335301
