
Question: How to collect specific data from HTML using Selenium

Stack Overflow user

Asked 2021-06-10 00:03:53

Answers: 1, Views: 138, Followers: 0, Votes: 0

I'm trying to build a weather forecast by scraping a web page. (My previous question)

My code:

import time
import requests
from selenium import webdriver
from bs4 import BeautifulSoup
from keyboard import press_and_release



def weather_forecast2():
    print('Hello, I can search up the weather for you.')
    while True:
        inp = input('Where shall I search? Enter a place :').capitalize()
        print('Alright, checking the weather in ' + inp + '...')

        URL = 'https://www.yr.no/nb'

        # Search for a place
        driver = webdriver.Edge()  # Open Microsoft Edge
        driver.get(URL)  # Goes to the HTML-page of the given URL
        element = driver.find_element_by_id("søk")  # Find the search input box
        element.send_keys(inp)  # Enter input
        press_and_release('enter')  # Click enter

        cURL = driver.current_url  # Current URL

        # Find data
        driver.get(cURL)  # Goes to the HTML-page that appeared after clicking button
        r = requests.get(cURL)  # Get request for contents of the page
        print(r.content)  # Outputs HTML code for the page
        soup = BeautifulSoup(r.content, 'html5lib')  # Parse the data with BeautifulSoup(HTML-string, HTML-parser)

I want to collect the temperatures from the page. I know that the XPaths of the elements I'm looking for are:

//*[@id="dailyWeatherListItem0"]/div[2]/div/span[2]/span/text()
//*[@id="dailyWeatherListItem0"]/div[2]/div[1]/span[2]/span[3]/text()
//*[@id="dailyWeatherListItem1"]/div[2]/div/span[2]/span/text()
//*[@id="dailyWeatherListItem1"]/div[2]/div[1]/span[2]/span[3]/text()
//*[@id="dailyWeatherListItem2"]/div[2]/div/span[2]/span/text()
//*[@id="dailyWeatherListItem2"]/div[2]/div[1]/span[2]/span[3]/text()
//*[@id="dailyWeatherListItem3"]/div[2]/div/span[2]/span/text()
//*[@id="dailyWeatherListItem3"]/div[2]/div[1]/span[2]/span[3]/text()

...and so on.

Basically, I want to collect the following two elements nine times:

//*[@id="dailyWeatherListItem{NUMBER 0-8}"]/div[2]/div/span[2]/span/text()
//*[@id="dailyWeatherListItem{NUMBER 0-8}"]/div[2]/div[1]/span[2]/span[3]/text()

How can I do this with driver.find_element_by_xpath? Or is there a more efficient function?
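Because the two XPaths differ only in the number after dailyWeatherListItem, they can be generated in a loop instead of being written out nine times. The sketch below is a minimal illustration, assuming the page still uses those ids and the element layout listed above (yr.no may have changed it since). The names ROW, FIRST, SECOND, temperature_xpaths and collect_temperatures are hypothetical. It drops the trailing /text(), because Selenium's find_element has to return an element rather than a text node, and reads the element's .text instead.

```python
# Hypothetical XPath pieces, reconstructed from the question's paths.
# ROW selects the i-th day row by its id; the index is filled in with str.format.
ROW = '//*[@id="dailyWeatherListItem{}"]'
# Relative paths to the two temperature spans, without the trailing /text():
# Selenium's find_element must return an element, so .text is read afterwards.
FIRST = '/div[2]/div/span[2]/span'
SECOND = '/div[2]/div[1]/span[2]/span[3]'


def temperature_xpaths(days=9):
    """Return (first, second) XPath pairs for dailyWeatherListItem0 .. dailyWeatherListItem{days-1}."""
    return [(ROW.format(i) + FIRST, ROW.format(i) + SECOND) for i in range(days)]


def collect_temperatures(driver, days=9):
    """Read both temperature spans for each day row.

    'xpath' is the string value of selenium's By.XPATH constant, so this is
    equivalent to driver.find_element(By.XPATH, ...).
    """
    results = []
    for first, second in temperature_xpaths(days):
        results.append((driver.find_element('xpath', first).text,
                        driver.find_element('xpath', second).text))
    return results
```

Call collect_temperatures(driver) once the forecast page has finished loading. Each find_element call is a separate round trip to the browser; if that is too slow, driver.find_elements (plural) can fetch all day rows in a single call with an XPath such as //*[starts-with(@id, "dailyWeatherListItem")].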

Tags: web-scraping, beautifulsoup, python, html, selenium-webdriver

1 Answer

Stack Overflow user

Accepted answer

Posted 2021-06-10 01:56:58

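Assuming you can retrieve the URL correctly, you can pass that URL as the referer header and use the location id contained in it to call the API that actually returns the forecast. I don't have your definition of press_and_release, so the code is untested.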

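import re

import requests
from keyboard import press_and_release  # third-party `keyboard` package, as in the question
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# url = 'https://www.yr.no/nb/v%C3%A6rvarsel/daglig-tabell/2-6058560/Canada/Ontario/London'


def get_forecast(url: str) -> object:
    # The location id is the path segment after 'daglig-tabell/', e.g. '2-6058560'
    location_id = re.search(r'daglig-tabell/(.*?)/', url).group(1)
    # Send the forecast page URL as the referer header
    headers = {'user-agent': 'Mozilla/5.0', 'referer': url}
    forecasts = requests.get(f'https://www.yr.no/api/v0/locations/{location_id}/forecast', headers=headers).json()
    return forecasts


def get_forecast_url():
    print('Hello, I can search up the weather for you.')

    driver = webdriver.Chrome()  # Open Microsoft Edge. (I changed to Chrome)

    try:
        while True:
            inp = input('Where shall I search? Enter a place :').capitalize()
            print('Alright, checking the weather in ' + inp + '...')

            URL = 'https://www.yr.no/nb'

            # Search for a place
            driver.get(URL)  # Goes to the HTML-page of the given URL
            driver.find_element(By.ID, "page-header__search-button").click()  # Open search
            # Find the search input box
            element = driver.find_element(By.ID, "page-header__search-input")
            element.send_keys(inp)  # Enter input
            press_and_release('enter')  # Press Enter
            # Assumes the search lands on a daily-table page; wait for it before reading the URL
            WebDriverWait(driver, 10).until(EC.url_contains('daglig-tabell'))

            cURL = driver.current_url  # Current URL
            print(get_forecast(cURL))
    finally:
        driver.quit()  # Close the browser even if the loop is interrupted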
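If the browser ends up on a page whose URL does not contain daglig-tabell (for example a search results page), re.search returns None and get_forecast fails with an unhelpful AttributeError. A slightly more defensive sketch of just the URL-handling step follows, assuming forecast URLs keep the .../daglig-tabell/<location-id>/... shape of the commented example URL; extract_location_id and forecast_api_url are hypothetical helper names, and the API URL is the one used in the answer.

```python
import re
from typing import Optional

# Matches the location id segment, e.g. '2-6058560' in .../daglig-tabell/2-6058560/Canada/...
_LOCATION_RE = re.compile(r'daglig-tabell/([^/]+)/')


def extract_location_id(url: str) -> Optional[str]:
    """Return the location id from a daily-table URL, or None if the URL has another shape."""
    match = _LOCATION_RE.search(url)
    return match.group(1) if match else None


def forecast_api_url(url: str) -> str:
    """Build the forecast API URL used in the answer, raising ValueError on an unexpected URL."""
    location_id = extract_location_id(url)
    if location_id is None:
        raise ValueError(f'no location id in {url!r}; is this a daily-table forecast page?')
    return f'https://www.yr.no/api/v0/locations/{location_id}/forecast'
```

The returned string can then be passed to requests.get together with the referer header, as get_forecast does.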

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/67913046
