How to collect specific data from HTML using Selenium

Stack Overflow user
Asked 2021-06-10 00:03:53
Answers: 1, views: 138, followers: 0, votes: 0

I'm trying to build a weather forecast by scraping a web page. (My previous question)

My code:

import time
import requests
from selenium import webdriver
from bs4 import BeautifulSoup
from keyboard import press_and_release



def weather_forecast2():
    print('Hello, I can search up the weather for you.')
    while True:
        inp = input('Where shall I search? Enter a place :').capitalize()
        print('Alright, checking the weather in ' + inp + '...')

        URL = 'https://www.yr.no/nb'

        "Search for a place"
        driver = webdriver.Edge()  # Open Microsoft Edge
        driver.get(URL)  # Goes to the HTML-page of the given URL
        element = driver.find_element_by_id("søk")  # Find the search input box
        element.send_keys(inp)  # Enter input
        press_and_release('enter')  # Click enter

        cURL = driver.current_url  # Current URL

        "Find data"
        driver.get(cURL)  # Goes to the HTML-page that appeared after clicking button
        r = requests.get(cURL)  # Get request for contents of the page
        print(r.content)  # Outputs HTML code for the page
        soup = BeautifulSoup(r.content, 'html5lib')  # Parse the data with BeautifulSoup(HTML-string, HTML-parser)

I want to collect the temperatures from the page. I know the XPaths of the elements I'm looking for are

//*[@id="dailyWeatherListItem0"]/div[2]/div/span[2]/span/text()
//*[@id="dailyWeatherListItem0"]/div[2]/div[1]/span[2]/span[3]/text()
//*[@id="dailyWeatherListItem1"]/div[2]/div/span[2]/span/text()
//*[@id="dailyWeatherListItem1"]/div[2]/div[1]/span[2]/span[3]/text()
//*[@id="dailyWeatherListItem2"]/div[2]/div/span[2]/span/text()
//*[@id="dailyWeatherListItem2"]/div[2]/div[1]/span[2]/span[3]/text()
//*[@id="dailyWeatherListItem3"]/div[2]/div/span[2]/span/text()
//*[@id="dailyWeatherListItem3"]/div[2]/div[1]/span[2]/span[3]/text()

// and so on...

Basically, I want to collect the following two elements nine times:

//*[@id="dailyWeatherListItem{NUMBER 0-8}"]/div[2]/div/span[2]/span/text()
//*[@id="dailyWeatherListItem{NUMBER 0-8}"]/div[2]/div[1]/span[2]/span[3]/text()

How can I do this with driver.find_element_by_xpath? Or is there a more efficient function?
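
One way to cover items 0 through 8 is to build each XPath in a loop. The following is a minimal sketch, not a tested scraper. The helper names `build_xpaths` and `collect_temperatures` are hypothetical, the paths read the digits in the question's XPaths as positional indices, and the page layout may differ from what the question describes. Selenium locators must resolve to elements rather than text nodes, so the trailing `/text()` step is dropped and each element's `.text` is read instead. The sketch assumes `driver` is an already-open WebDriver on the forecast page and passes the locator strategy as the string "xpath", which is the value of `By.XPATH`.

```python
# The two XPath templates from the question, minus the trailing /text():
# Selenium locators must match elements, not text nodes.
FIRST_TEMPLATE = '//*[@id="dailyWeatherListItem{i}"]/div[2]/div/span[2]/span'
SECOND_TEMPLATE = '//*[@id="dailyWeatherListItem{i}"]/div[2]/div[1]/span[2]/span[3]'


def build_xpaths(count=9):
    """Return one (first, second) XPath pair per list item 0..count-1."""
    return [
        (FIRST_TEMPLATE.format(i=i), SECOND_TEMPLATE.format(i=i))
        for i in range(count)
    ]


def collect_temperatures(driver, count=9):
    """Read the text of both elements for each list item.

    `driver` is assumed to be a Selenium WebDriver already on the
    forecast page; "xpath" is the string value of By.XPATH.
    """
    results = []
    for first_xpath, second_xpath in build_xpaths(count):
        first = driver.find_element("xpath", first_xpath).text
        second = driver.find_element("xpath", second_xpath).text
        results.append((first, second))
    return results
```

A call such as `collect_temperatures(driver)` would go after the search has navigated to the daily table. `find_element` raises `NoSuchElementException` when an element is missing, so a page with fewer than nine items would need a smaller `count` or an exception handler.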


1 Answer

Stack Overflow user

Accepted answer

Posted 2021-06-10 01:56:58

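Assuming you can retrieve the URL correctly, you can send that URL as the referer header and use the location id it contains to call the API that actually returns the forecast. I don't have your definition of press_and_release, so the code is untested.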

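import re

import requests
from keyboard import press_and_release  # same import as in the question
from selenium import webdriver

# url = 'https://www.yr.no/nb/v%C3%A6rvarsel/daglig-tabell/2-6058560/Canada/Ontario/London'

def get_forecast(url: str) -> object:
    # The location id is the path segment right after 'daglig-tabell/'
    location_id = re.search(r'daglig-tabell/(.*?)/', url).group(1)
    # Send the page URL as the referer when calling the forecast API
    headers = {'user-agent': 'Mozilla/5.0', 'referer': url}
    forecasts = requests.get(f'https://www.yr.no/api/v0/locations/{location_id}/forecast', headers=headers).json()
    return forecasts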


def get_forecast_url():
    
    print('Hello, I can search up the weather for you.')

    driver = webdriver.Chrome()  # Open the browser (changed from Edge to Chrome)

    while True:

        inp = input('Where shall I search? Enter a place :').capitalize()
        print('Alright, checking the weather in ' + inp + '...')

        URL = 'https://www.yr.no/nb'

        "Search for a place"

        driver.get(URL)  # Goes to the HTML-page of the given URL
        driver.find_element_by_id("page-header__search-button").click() #open search 
        # Find the search input box
        element = driver.find_element_by_id("page-header__search-input")
        element.send_keys(inp)  # Enter input
        press_and_release('enter')  # Click enter

        cURL = driver.current_url  # Current URL
        print(get_forecast(cURL))

    driver.quit()
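
The regular expression in `get_forecast` assumes the browser has already reached a `daglig-tabell` URL. If it has not, `re.search` returns `None` and `.group(1)` raises `AttributeError`. A minimal sketch of a guarded version follows. The helper names `extract_location_id` and `build_forecast_request` are hypothetical, and the endpoint is simply the one used in this answer, not a documented API.

```python
import re


def extract_location_id(url):
    """Return the location id from a yr.no daily-table URL, or None."""
    match = re.search(r'daglig-tabell/(.*?)/', url)
    return match.group(1) if match else None


def build_forecast_request(url):
    """Return (api_url, headers) for the forecast call, or None if no id."""
    location_id = extract_location_id(url)
    if location_id is None:
        return None
    api_url = f'https://www.yr.no/api/v0/locations/{location_id}/forecast'
    headers = {'user-agent': 'Mozilla/5.0', 'referer': url}
    return api_url, headers


example = 'https://www.yr.no/nb/v%C3%A6rvarsel/daglig-tabell/2-6058560/Canada/Ontario/London'
print(extract_location_id(example))  # 2-6058560
print(build_forecast_request('https://www.yr.no/nb'))  # None
```

When a pair is returned, it could be passed on as `requests.get(api_url, headers=headers)`, matching the call in the answer. A `None` result signals that the search did not navigate to a forecast page.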
Votes: 2
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/67913046
