
Q: Scraping <div<span from an HTML page

Stack Overflow user

Asked 2021-06-09 21:34:53

Answers 1 · Views 87 · Followers 0 · Votes 0

I am building a simple weather forecast with Python in Eclipse. So far I have written this:

from bs4 import BeautifulSoup
import requests


def weather_forecast():
    url = 'https://www.yr.no/nb/v%C3%A6rvarsel/daglig-tabell/1-92416/Norge/Vestland/Bergen/Bergen'
    r = requests.get(url)  # Get request for contents of the page
    print(r.content)  # Outputs HTML code for the page
    soup = BeautifulSoup(r.content, 'html5lib')  # Parse the data with BeautifulSoup(HTML-string, html-parser)
    min_max = soup.select('min-max.temperature')  # Select all spans with a "min-max-temperature" attribute
    print(min_max.prettify())
    table = soup.find('div', attrs={'daily-weather-list-item__temperature'})
    print(table.prettify())

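from an HTML page that contains elements like the following:

I have found the path to the first temperature among the elements of the HTML page, but when I run my code and print the result to check whether I did it correctly, nothing is printed. My goal is to print a table of dates and their corresponding temperatures. This seems like a simple task, but I do not know how to name the attributes correctly, or how to scrape all of them from the HTML page in a single iteration.

I want to get into each

I have looked at this Stack Overflow question, but I could not make sense of it: Python BeautifulSoup - Scraping Div Spans and p tags - also how to get exact match on div name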

python

html

web-scraping

beautifulsoup

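1 Answer

Stack Overflow user

Accepted answer

Posted 2021-06-10 00:23:49

You can use a dictionary comprehension. Loop over every forecast element that has the class daily-weather-list-item, extract the date from the datetime attribute of its time tag and use it as the key, then associate that key with the max/min temperature text.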

import requests
from bs4 import BeautifulSoup

def weather_forecast():
    url = 'https://www.yr.no/nb/v%C3%A6rvarsel/daglig-tabell/1-92416/Norge/Vestland/Bergen/Bergen'
    r = requests.get(url)  # Get request for contents of the page
    soup = BeautifulSoup(r.content, 'html5lib')
    # Key: date from each <time datetime="...">; value: the min/max temperature text
    temps = {i.select_one('time')['datetime']: i.select_one('.min-max-temperature').get_text(strip=True)
             for i in soup.select('.daily-weather-list-item')}
    return temps

print(weather_forecast())
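
To try the dictionary comprehension above without hitting the live site, here is a minimal offline sketch. The `SAMPLE_HTML` markup, its temperature values, and the function name `parse_forecast` are assumptions made for illustration; only the class names and the `datetime` attribute come from the answer, and the real page may be structured differently. It also shows why a selector like `min-max.temperature` finds nothing: it asks for a `<min-max>` tag with class `temperature`, whereas `.min-max-temperature` selects by class.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names and the <time datetime="...">
# attribute are taken from the answer; the values are invented.
SAMPLE_HTML = """
<ol>
  <li class="daily-weather-list-item">
    <time datetime="2021-06-10">Thursday 10 June</time>
    <div class="daily-weather-list-item__temperature">
      <span class="min-max-temperature">17° / 9°</span>
    </div>
  </li>
  <li class="daily-weather-list-item">
    <time datetime="2021-06-11">Friday 11 June</time>
    <div class="daily-weather-list-item__temperature">
      <span class="min-max-temperature">15° / 10°</span>
    </div>
  </li>
</ol>
"""


def parse_forecast(html):
    """Map each forecast date to its min/max temperature text."""
    # html.parser ships with Python, so no extra parser install is needed.
    soup = BeautifulSoup(html, "html.parser")
    return {
        item.select_one("time")["datetime"]: item.select_one(
            ".min-max-temperature"
        ).get_text(strip=True)
        for item in soup.select(".daily-weather-list-item")
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# Tag selector plus class: looks for <min-max class="temperature">, so no match.
wrong = soup.select("min-max.temperature")
# Class selector: matches every element whose class is min-max-temperature.
right = soup.select(".min-max-temperature")
print(len(wrong), len(right))

# select() returns a list of tags, so iterate over it rather than
# calling prettify() on the result itself.
temps = parse_forecast(SAMPLE_HTML)
for date, temperature in temps.items():
    print(f"{date}  {temperature}")
```

Keeping the parsing in a function that takes an HTML string makes it easy to test against saved markup first and pass `requests.get(url).content` afterwards.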

Votes 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/67911992
