
Scraping hrefs with BeautifulSoup isn't working

Stack Overflow user
Asked 2018-09-15 15:34:30
Answers: 2, Views: 85, Follows: 0, Votes: 0

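Once again, I'm having trouble scraping hrefs with BeautifulSoup. I have a list of pages that I'm scraping, and I'm getting the data, but I can't seem to get the hrefs, even though I've tried various snippets that work in my other scripts.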

Here is the code, and my data is below:

import requests
from bs4 import BeautifulSoup


with open('states_names.csv', 'r') as reader:
    states = [states.strip().replace(' ', '-') for states in reader]


url = 'https://www.hauntedplaces.org/state/alabama'

for state in states:
    page = requests.get(url+state)
    soup = BeautifulSoup(page.text, 'html.parser')
    links = soup.findAll('div', class_='description')
    # When I try to add .get('href') I get a traceback error. Am I trying to scrape the href too early? 
    h_page = soup.findAll('h3')

<h3><a href="https://www.hauntedplaces.org/item/gaines-ridge-dinner-club/">Gaines Ridge Dinner Club</a></h3>
<h3><a href="https://www.hauntedplaces.org/item/purifoy-lipscomb-house/">Purifoy-Lipscomb House</a></h3>
<h3><a href="https://www.hauntedplaces.org/item/kate-shepard-house-bed-and-breakfast/">Kate Shepard House Bed and Breakfast</a></h3>
<h3><a href="https://www.hauntedplaces.org/item/cedarhurst-mansion/">Cedarhurst Mansion</a></h3>
<h3><a href="https://www.hauntedplaces.org/item/crybaby-bridge/">Crybaby Bridge</a></h3>
<h3><a href="https://www.hauntedplaces.org/item/gaineswood-plantation/">Gaineswood Plantation</a></h3>
<h3><a href="https://www.hauntedplaces.org/item/mountain-view-hospital/">Mountain View Hospital</a></h3>
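The traceback in the commented line is most likely because `findAll` returns a `ResultSet` (a list of tags), not a single tag, so it has no `.get` method; and even a single `<div class="description">` has no `href` of its own. The `href` sits on the `<a>` nested inside each `<h3>`. A minimal sketch, run against part of the `<h3>` markup shown above, hard-coded here as sample input standing in for the live page (an assumption for illustration):

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; stands in for page.text
SAMPLE_HTML = """
<h3><a href="https://www.hauntedplaces.org/item/gaines-ridge-dinner-club/">Gaines Ridge Dinner Club</a></h3>
<h3><a href="https://www.hauntedplaces.org/item/crybaby-bridge/">Crybaby Bridge</a></h3>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a ResultSet (a list), so iterate it instead of calling .get on it
hrefs = []
for h3 in soup.find_all("h3"):
    a = h3.find("a")  # the link is the <a> child, not the <h3> itself
    if a is not None and a.get("href"):
        hrefs.append(a["href"])

print(hrefs)
```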

2 Answers

Stack Overflow user

Accepted answer

Posted 2018-09-15 17:51:19

This works fine:

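from bs4 import BeautifulSoup
import requests

url = 'https://www.hauntedplaces.org/state/Alabama'

r = requests.get(url)
# 'lxml' requires the lxml package; 'html.parser' works without it
soup = BeautifulSoup(r.text, 'lxml')

# The hrefs live on the <a> tags nested inside each div.description,
# so select those anchors directly and read their href attribute
for link in soup.select('div.description a'):
    print(link['href'])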
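The answer above fetches one hard-coded page. The question's loop also has a URL bug: `url + state` appends each state name to a URL that already ends in `alabama`, producing addresses like `.../state/alabamaAlabama`. A sketch combining the two, assuming the listing pages live at `https://www.hauntedplaces.org/state/<State-Name>` and that `states_names.csv` holds one state per line as in the question; `BASE_URL`, `state_url`, `extract_links`, and `scrape_states` are hypothetical names:

```python
from bs4 import BeautifulSoup

# Assumption: each state's listing page is BASE_URL + the hyphenated state name
BASE_URL = "https://www.hauntedplaces.org/state/"


def state_url(name):
    """Turn a raw CSV line such as 'New York' into a listing-page URL."""
    return BASE_URL + name.strip().replace(" ", "-")


def extract_links(html):
    """Return the href of every <a href> inside a div with class 'description'."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.select("div.description a[href]")]


def scrape_states(csv_path):
    """Fetch each state's page and collect all listing hrefs (hypothetical helper)."""
    import requests  # imported lazily so the helpers above work without it

    with open(csv_path) as reader:
        urls = [state_url(line) for line in reader if line.strip()]

    hrefs = []
    for url in urls:
        page = requests.get(url, timeout=10)
        page.raise_for_status()  # stop on 404s caused by a malformed URL
        hrefs.extend(extract_links(page.text))
    return hrefs
```

Calling `scrape_states('states_names.csv')` would then return the hrefs for every state in the file. The `[href]` part of the selector skips anchors without an `href`, which would otherwise raise `KeyError` on `a['href']`.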
Votes: 1

Stack Overflow user

Posted 2018-09-15 15:39:42

Try this:

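# `page` is the response from requests.get(...) in the question's loop
soup = BeautifulSoup(page.content, 'html.parser')
list0 = []
# Collect every <a> on the page, then keep the ones that have an href
possible_links = soup.find_all('a')
for link in possible_links:
    if link.has_attr('href'):
        print(link.attrs['href'])
        list0.append(link.attrs['href'])
print(list0)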
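The same filter can be written with `find_all`'s `href=True` keyword, which matches only tags that carry an `href` attribute. Scanning every `<a>` also picks up navigation and footer links, not just the listing entries; the sample markup below is invented purely to show the difference against the narrower `div.description` selector from the accepted answer:

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: one nav link, one listing link, one anchor without href
SAMPLE_HTML = """
<nav><a href="/">Home</a></nav>
<div class="description"><a href="/item/crybaby-bridge/">Crybaby Bridge</a></div>
<a name="top">no href here</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Equivalent to the has_attr('href') loop: every <a> that has an href
all_links = [a["href"] for a in soup.find_all("a", href=True)]

# Narrower: only links inside div.description, as in the accepted answer
listing_links = [a["href"] for a in soup.select("div.description a[href]")]

print(all_links)      # ['/', '/item/crybaby-bridge/']
print(listing_links)  # ['/item/crybaby-bridge/']
```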
Votes: 0
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/52346180
