Python scraping: problem with soup.select

Stack Overflow user
Asked 2022-09-13 02:32:43
Answers: 1, Views: 69, Followers: 0, Votes: 0

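I am writing a Python script to scrape data from a specific page (https://finance.yahoo.com/quote/AUDUSD%3DX/history?p=AUDUSD%3DX).

I am using BeautifulSoup. The data I am interested in on this page is the following:

This time I am using the soup.select method with the class name W(100%) M(0). My code looks like this: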

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://finance.yahoo.com/quote/AUDUSD%3DX/history?p=AUDUSD%3DX"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
table = soup.select('table:has(-soup-contains("W(100%) M(0)"))')
print(table)

This does not produce the result I want.

I also tried this:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://finance.yahoo.com/quote/AUDUSD%3DX/history?p=AUDUSD%3DX"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
table = soup.select("W(100%) M(0)")
print(table)

and it raises the following error:

Traceback (most recent call last):
  File "/Users/ryanngan/PycharmProjects/Webscraping/seek.py", line 8, in <module>
    table = soup.select("W(100%) M(0)")
  File "/Users/ryanngan/PycharmProjects/Webscraping/venv/lib/python3.9/site-packages/bs4/element.py", line 1973, in select
    results = soupsieve.select(selector, self, namespaces, limit, **kwargs)
  File "/Users/ryanngan/PycharmProjects/Webscraping/venv/lib/python3.9/site-packages/soupsieve/__init__.py", line 144, in select
    return compile(select, namespaces, flags, **kwargs).select(tag, limit)
  File "/Users/ryanngan/PycharmProjects/Webscraping/venv/lib/python3.9/site-packages/soupsieve/__init__.py", line 67, in compile
    return cp._cached_css_compile(pattern, ns, cs, flags)
  File "/Users/ryanngan/PycharmProjects/Webscraping/venv/lib/python3.9/site-packages/soupsieve/css_parser.py", line 218, in _cached_css_compile
    CSSParser(
  File "/Users/ryanngan/PycharmProjects/Webscraping/venv/lib/python3.9/site-packages/soupsieve/css_parser.py", line 1159, in process_selectors
    return self.parse_selectors(self.selector_iter(self.pattern), index, flags)
  File "/Users/ryanngan/PycharmProjects/Webscraping/venv/lib/python3.9/site-packages/soupsieve/css_parser.py", line 985, in parse_selectors
    key, m = next(iselector)
  File "/Users/ryanngan/PycharmProjects/Webscraping/venv/lib/python3.9/site-packages/soupsieve/css_parser.py", line 1152, in selector_iter
    raise SelectorSyntaxError(msg, self.pattern, index)
soupsieve.util.SelectorSyntaxError: Invalid character '(' position 1
  line 1:
W(100%) M(0)

How can I scrape the data above using the soup.select method? Thank you very much.

1 Answer

Stack Overflow user

Answered 2022-09-13 05:45:32

Using a direct class selector (for example .W(100%)) breaks because it is invalid CSS selector syntax.
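To reproduce the failure without hitting the network, here is a minimal sketch. The `SAMPLE_HTML` string and the `try_select` helper are made up for illustration; they only mimic the class names from the question and are not Yahoo's actual markup.

```python
from bs4 import BeautifulSoup
from soupsieve.util import SelectorSyntaxError

# Hypothetical stand-in markup that reuses the class names from the question.
SAMPLE_HTML = '<table class="W(100%) M(0)"><tr><td>0.6870</td></tr></table>'


def try_select(selector, html=SAMPLE_HTML):
    """Return the number of matches, or None if the selector is invalid CSS."""
    soup = BeautifulSoup(html, "html.parser")
    try:
        return len(soup.select(selector))
    except SelectorSyntaxError:
        return None


# The unescaped "(" makes this an invalid selector.
print(try_select(".W(100%)"))
# Inside a quoted attribute value the same characters are fine.
print(try_select('[class*="W(100%)"]'))
```

The first call fails for the same reason as the traceback in the question: characters such as `(` and `%` cannot appear unescaped in a CSS class selector, while a quoted attribute value accepts them as-is.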

However, you can work around this with the attribute "contains" syntax, written as [attribute*=partial].

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
}
response = requests.get(
    "https://finance.yahoo.com/quote/AUDUSD%3DX/history?p=AUDUSD%3DX",
    headers=headers
)
# name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(response.text, "html.parser")
# select any element where class contains "W(100%)" and class contains "M(0)":
table = soup.select('[class*="W(100%)"][class*="M(0)"]')
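
One thing to keep in mind is that `*=` is a substring match, so it can also pick up elements whose class attribute merely contains that text. Below is a small sketch of the difference, using hypothetical inline markup (not Yahoo's real page). It compares `*=` with `~=`, which matches a whole whitespace-separated class token, and with a backslash-escaped version of the ordinary `.class` form.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second table has a class that only *contains* "M(0)".
SAMPLE_HTML = """
<table class="W(100%) M(0)"><tr><td>target</td></tr></table>
<table class="W(100%) M(0)--sm"><tr><td>other</td></tr></table>
"""
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Substring match: both tables qualify.
substring_hits = soup.select('table[class*="W(100%)"][class*="M(0)"]')

# Whole-token match: only the table with the exact class "M(0)".
token_hits = soup.select('table[class~="W(100%)"][class~="M(0)"]')

# Escaped class selectors: backslashes make "(", "%" and ")" legal in .class form.
escaped_hits = soup.select(r"table.W\(100\%\).M\(0\)")

print(len(substring_hits), len(token_hits), len(escaped_hits))
```

If the real page has several elements sharing these utility classes, the stricter `~=` or escaped forms may be the safer choice; which one fits depends on the actual markup returned by the request.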
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73696975
