
Web scraping Yahoo Finance with Selenium

Stack Overflow user
Asked 2020-09-30 03:01:30
Answers 1, Views 166, Followers 0, Votes 0

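This is driving me crazy, but I really can't find a solution. I wrote some code with Python and Selenium to scrape the Yahoo Finance news for Daimler, but it doesn't work at all. I always get this message in PyCharm: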

selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
  (Session info: chrome=85.0.4183.121)

But I'm quite sure the selector I chose is the only suitable one. Here is my code:

from selenium import webdriver
import pandas as pd


url = 'https://finance.yahoo.com/quote/DAI.DE?p=DAI.DE&.tsrc=fin-srch'


driver = webdriver.Chrome('C:/Users/Startklar/Desktop/CFDS/chromedriver.exe')
driver.get(url)


driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")


articles = driver.find_elements_by_class_name('js-stream-content Pos(r)')


for article in articles:
    source = article.find_element_by_xpath('.//*[@id="quoteNewsStream-0-Stream"]/ul/li[3]/div/div/div[2]/div/span[1]').text
    title = article.find_element_by_xpath('//*[@id="quoteNewsStream-0-Stream"]/ul/li[5]/div/div/div[2]/h3/a').text
    text = article.find_element_by_xpath('//*[@id="quoteNewsStream-0-Stream"]/ul/li[5]/div/div/div[2]/p').text
    date = article.find_element_by_xpath('.//*[@id="quoteNewsStream-0-Stream"]/ul/li[5]/div/div/div[1]/div/span[2]').text

    print(source,title,text,date)

What is wrong here? I'd really appreciate your help!

Thanks a lot.

Maybe it is useful to see the whole error message:

Traceback (most recent call last):
  File "C:/Users/Startklar/PycharmProjects/test/venv/Selenium Test.py", line 15, in <module>
    articles = driver.find_elements_by_css_selector('li.js-stream-content Pos(r)')
  File "C:\Users\Startklar\PycharmProjects\test\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 614, in find_elements_by_css_selector
    return self.find_elements(by=By.CSS_SELECTOR, value=css_selector)
  File "C:\Users\Startklar\PycharmProjects\test\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1007, in find_elements
    'value': value})['value'] or []
  File "C:\Users\Startklar\PycharmProjects\test\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Startklar\PycharmProjects\test\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
  (Session info: chrome=85.0.4183.121)

By the way, this is the latest code:

from selenium import webdriver
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


url = 'https://finance.yahoo.com/quote/DAI.DE?p=DAI.DE&.tsrc=fin-srch'


driver = webdriver.Chrome('C:/Users/Startklar/Desktop/CFDS/chromedriver.exe')
driver.get(url)


driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")


articles = WebDriverWait(driver, 100).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.js-stream-content")))


for article in articles:
    try:
         source = article.find_element_by_xpath('//div/div/div[2]/div/span[1]').text
         title = article.find_element_by_xpath('//div/div/div[2]/h3/a').text
         text = article.find_element_by_xpath('//div/div/div[2]/p').text
         date = article.find_element_by_xpath('//div/div/div[2]/div/span[2]').text
         print(source,title,text,date+'/n')
    except:
        print("")

1 Answer

Stack Overflow user

Answered 2020-09-30 03:33:38

Your XPaths and your article selector are off.

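# Wait up to 30 seconds for the news list items to be present
articles = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.js-stream-content")))

for article in articles:
    try:
        # The leading "." keeps each XPath relative to the current article;
        # a bare "//" would search the whole page on every iteration
        source = article.find_element_by_xpath('.//div/div/div[2]/div/span[1]').text
        title = article.find_element_by_xpath('.//div/div/div[2]/h3/a').text
        text = article.find_element_by_xpath('.//div/div/div[2]/p').text
        date = article.find_element_by_xpath('.//div/div/div[1]/div/span[2]').text
        print(source, title, text, date + '\n')
    except Exception:
        # Skip list items that lack one of the expected elements
        print("")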
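
A note on the per-article lookups: in Selenium, an XPath that starts with `//` is evaluated against the whole document even when it is called on an element, so every loop iteration can return the same first match. Prefixing the expression with `.` restricts the search to the current element. The sketch below illustrates the same idea without a browser, using the standard library's `xml.etree.ElementTree` in place of Selenium. The markup and the `extract_titles` helper are hypothetical simplifications, not the real Yahoo Finance page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the news list markup.
MARKUP = """
<ul>
  <li class="js-stream-content"><div><h3><a>First headline</a></h3><p>First summary</p></div></li>
  <li class="js-stream-content"><div><h3><a>Second headline</a></h3><p>Second summary</p></div></li>
</ul>
"""


def extract_titles(markup):
    """Return the headline text of every <li>, looked up relative to that <li>."""
    root = ET.fromstring(markup)
    titles = []
    for item in root.findall(".//li"):
        # ".//" starts the search at the current <li>, so each iteration
        # sees only its own headline.
        link = item.find(".//h3/a")
        if link is not None:
            titles.append(link.text)
    return titles


print(extract_titles(MARKUP))
```

With Selenium the equivalent call would be `article.find_element_by_xpath('.//h3/a')`, or `article.find_element(By.XPATH, './/h3/a')` in newer releases.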

Imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
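
As for why the original selector raised `InvalidSelectorException`: `'js-stream-content Pos(r)'` is two class names separated by a space, and the second one contains parentheses, which are not valid in a CSS identifier unless escaped. A class-name locator expects a single class, and in a CSS selector each class needs its own `.` prefix. Matching on `li.js-stream-content` alone, as above, avoids the problem. If the second class is really needed, it has to be escaped. The helper below is a minimal sketch of that escaping; `css_for_classes` is a hypothetical name, and the escaping rule is simplified (it does not handle class names that start with a digit).

```python
import re


def css_for_classes(tag, class_names):
    """Build a compound CSS selector such as 'li.a.b' from a space-separated class list."""
    parts = []
    for name in class_names.split():
        # Backslash-escape every character that is not a letter, digit, '-' or '_'.
        parts.append("." + re.sub(r"([^A-Za-z0-9_-])", r"\\\1", name))
    return tag + "".join(parts)


selector = css_for_classes("li", "js-stream-content Pos(r)")
print(selector)
```

The resulting string can then be passed to `driver.find_elements(By.CSS_SELECTOR, selector)`.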
Votes: 1
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/64125992
