
Invalid character when selecting class names

Stack Overflow user
Asked on 2021-06-15 23:01:47
Answers: 2, Views: 118, Follows: 0, Votes: 1

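I've started learning the basics of web scraping with Python, but I'm having some trouble with my code. I'm trying to scrape the weather information from the front page of yahoo.com: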

<div class="Ai(c) D(f) Jc(sb) Fz(13px) Py(0) Px(0)">

    <div class="D(f) Ai(c) Fld(c)">
        <span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Today</span>
        <i class="D(b) Bgr(nr) Bgz(ct) Bgp(c) Mb(10px) H(40px) W(40px) wafer-img-loaded" style="background-image: url(&quot;https://s.yimg.com/cv/apiv2/200510/w/l/fair_day.png&quot;);"></i>
    
    <div class="Fw(600) Fz(12px)">
        <span class="C($c-fuji-grey-n) Pend(5px) unit_F">74<span>°</span></span>
        <span class="C($c-fuji-grey-o) unit_F">59<span>°</span></span></div></div>
    
    <div class="D(f) Ai(c) Fld(c)">
        <span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Wed</span>
        <i class="D(b) Bgr(nr) Bgz(ct) Bgp(c) Mb(10px) H(40px) W(40px) wafer-img-loaded" style="background-image: url(&quot;https://s.yimg.com/cv/apiv2/200510/w/l/partly_cloudy_day.png&quot;);"></i>
        <span class="Hidden">Partly cloudy today with a high of 74 °F (23.3 °C) and a low of 51 °F (10.6 °C).</span>
        
        <div class="Fw(600) Fz(12px)">
            <span class="C($c-fuji-grey-n) Pend(5px) unit_F">74<span>°</span></span>
            <span class="C($c-fuji-grey-o) unit_F">51<span>°</span></span></div></div>
    
    <div class="D(f) Ai(c) Fld(c)"><span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Thu</span>
        <i class="D(b) Bgr(nr) Bgz(ct) Bgp(c) Mb(10px) H(40px) W(40px) wafer-img-loaded" style="background-image: url(&quot;https://s.yimg.com/cv/apiv2/200510/w/l/partly_cloudy_day.png&quot;);"></i>
        <span class="Hidden">Partly cloudy today with a high of 84 °F (28.9 °C) and a low of 51 °F (10.6 °C).</span>
        
        <div class="Fw(600) Fz(12px)">
            <span class="C($c-fuji-grey-n) Pend(5px) unit_F">84<span>°</span></span>
            <span class="C($c-fuji-grey-o) unit_F">51<span>°</span></span></div></div>
    
    <div class="D(f) Ai(c) Fld(c)">
        <span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Fri</span>
        <i class="D(b) Bgr(nr) Bgz(ct) Bgp(c) Mb(10px) H(40px) W(40px) wafer-img-loaded" style="background-image: url(&quot;https://s.yimg.com/cv/apiv2/200510/w/l/scattered_showers_day_night.png&quot;);"></i>
        <span class="Hidden">Scattered thunderstorms today with a high of 84 °F (28.9 °C) and a low of 65 °F (18.3 °C).  There is a 35% chance of precipitation.</span>
        
        <div class="Fw(600) Fz(12px)">
            <span class="C($c-fuji-grey-n) Pend(5px) unit_F">84<span>°</span></span>
            <span class="C($c-fuji-grey-o) unit_F">65<span>°</span></span></div></div></div>

Here is the code I came up with to try to extract this information:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.yahoo.com/')

soup = BeautifulSoup(r.content, 'html.parser')

weatherTable = soup.select_one("div.Ai(c).D(f).Jc(sb).Fz(13px).Py(0).Px(0)")

for row in weatherTable.select("div.D(f).Ai(c).Fld(c)"):
    day = row.select_one("span.Fw(600).Fz(12px).Mb(10px).C($c-fuji-grey-n).Fz(1em)").text
    dWeather = row.select_one("span.C($c-fuji-grey-n).Pend(5px).unit_F").text
    nWeather = row.select_one("span.C($c-fuji-grey-o).unit_F").text
    print(day, dWeather, nWeather)

When I run my code, I get the following error:

Traceback (most recent call last):
  File "C:\Users\smith\eclipse-workspace\Practice\src\DecodeWeb.py", line 9, in <module>
    weatherTable = soup.select_one("div.Ai(c).D(f).Jc(sb).Fz(13px).Py(0).Px(0)")
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\bs4\element.py", line 1834, in select_one
    value = self.select(selector, namespaces, 1, **kwargs)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\bs4\element.py", line 1869, in select
    results = soupsieve.select(selector, self, namespaces, limit, **kwargs)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\__init__.py", line 98, in select
    return compile(select, namespaces, flags, **kwargs).select(tag, limit)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\__init__.py", line 62, in compile
    return cp._cached_css_compile(pattern, namespaces, custom, flags)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\css_parser.py", line 211, in _cached_css_compile
    CSSParser(pattern, custom=custom_selectors, flags=flags).process_selectors(),
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\css_parser.py", line 1058, in process_selectors
    return self.parse_selectors(self.selector_iter(self.pattern), index, flags)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\css_parser.py", line 909, in parse_selectors
    key, m = next(iselector)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\css_parser.py", line 1051, in selector_iter
    raise SelectorSyntaxError(msg, self.pattern, index)
soupsieve.util.SelectorSyntaxError: Invalid character '(' position 6
  line 1:
div.Ai(c).D(f).Jc(sb).Fz(13px).Py(0).Px(0)

Do I have to replace the special characters so that BS4 can read the class names?


2 Answers

Stack Overflow user

Accepted answer

Posted on 2021-06-15 23:23:05

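The problem is that your CSS selectors contain parentheses () and dollar signs $. These characters already have a special meaning in CSS selector syntax, so they cannot appear unescaped inside a class name in a selector. See: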

You can escape these characters with a backslash (\).

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.yahoo.com/')

soup = BeautifulSoup(r.content, 'html.parser')
# Raw strings (r"...") pass the backslashes through to the CSS parser unchanged
# and avoid Python's "invalid escape sequence" warning for \( and \$.
weatherTable = soup.select_one(r"div.Ai\(c\).D\(f\).Jc\(sb\).Fz\(13px\).Py\(0\).Px\(0\)")

for row in weatherTable.select(r"div.D\(f\).Ai\(c\).Fld\(c\)"):
    day = row.select_one(r"span.Fw\(600\).Fz\(12px\).Mb\(10px\).C\(\$c-fuji-grey-n\).Fz\(1em\)").text
    dWeather = row.select_one(r"span.C\(\$c-fuji-grey-n\).Pend\(5px\).unit_F").text
    nWeather = row.select_one(r"span.C\(\$c-fuji-grey-o\).unit_F").text
    print(day, dWeather, nWeather)

Output:

Today 82° 67°
Wed 78° 63°
Thu 78° 59°
Fri 81° 62°
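Rather than typing every backslash by hand, you can let soupsieve (the CSS engine that BeautifulSoup's `select` uses) escape each class name for you. This is a minimal offline sketch, assuming a soupsieve version that provides `soupsieve.escape`; the helper `class_selector` is a hypothetical name used only for illustration, and the HTML is one span copied from the question:

```python
import soupsieve as sv
from bs4 import BeautifulSoup

# One temperature span copied from the markup in the question
HTML = '<span class="C($c-fuji-grey-n) Pend(5px) unit_F">74<span>°</span></span>'
soup = BeautifulSoup(HTML, "html.parser")

def class_selector(tag, class_attr):
    """Hypothetical helper: turn a raw class attribute into 'tag.cls1.cls2',
    CSS-escaping each class name so ( ) and $ are treated literally."""
    return tag + "".join("." + sv.escape(name) for name in class_attr.split())

# The unescaped form is rejected, just like in the question's traceback.
try:
    soup.select_one("span.C($c-fuji-grey-n)")
except sv.SelectorSyntaxError:
    print("unescaped selector rejected")

sel = class_selector("span", "C($c-fuji-grey-n) Pend(5px) unit_F")
print(sel)                        # span.C\(\$c-fuji-grey-n\).Pend\(5px\).unit_F
print(soup.select_one(sel).text)  # 74°
```

Building the selector from the class attribute this way keeps the escaping rules in one place instead of scattered across every string literal.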

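An alternative to escaping these characters with backslashes is to use an [attribute=value] selector, where the value is a quoted string and the parentheses and dollar signs need no escaping.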

For example, instead of writing:

day = row.select_one(
    r"span.Fw\(600\).Fz\(12px\).Mb\(10px\).C\(\$c-fuji-grey-n\).Fz\(1em\)"
).text

You can write:

day = row.select_one(
    # Find a `span` whose class attribute is exactly `Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)`
    'span[class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)"]'
).text

This is easier to read.
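One caveat with `[class="..."]`: it compares the whole class attribute as a single string, so the class names must appear in the same order as on the page. The `~=` operator instead matches one whitespace-separated class, so order stops mattering. A small offline check against one span from the question's markup (the reordered selector is a made-up counterexample):

```python
from bs4 import BeautifulSoup

HTML = '<span class="C($c-fuji-grey-n) Pend(5px) unit_F">74<span>°</span></span>'
soup = BeautifulSoup(HTML, "html.parser")

# Exact match: the full class attribute must be identical, in the same order.
exact = soup.select('span[class="C($c-fuji-grey-n) Pend(5px) unit_F"]')
# Same classes in a different order: the exact-match selector finds nothing.
reordered = soup.select('span[class="Pend(5px) C($c-fuji-grey-n) unit_F"]')
# ~= matches a single class token, so each class can be required independently.
token = soup.select('span[class~="Pend(5px)"][class~="unit_F"]')

print(len(exact), len(reordered), len(token))  # 1 0 1
```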

Votes: 1

Stack Overflow user

Posted on 2021-06-15 23:48:26

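Although the reason for the failure has already been explained (certain characters must be escaped), these classes are dynamic, so scraping that relies on them is likely to break soon. Consider relying on more stable elements/attributes and their relationships instead, for example: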

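import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.yahoo.com/')
soup = BeautifulSoup(r.content, 'html.parser')
# Select spans under a stable container class instead of the dynamic class names;
# :not(:nth-child(3n)) skips spans at every third child position, and the
# filter drops the nested spans whose text is only '°'.
data = [i.get_text() for i in soup.select('.weather-card-content span:not(:nth-child(3n))') if i.text != '°']
# Regroup the flat list into consecutive (day, high, low) triples.
print(list(zip(data[0::3], data[1::3], data[2::3])))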
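In the same spirit, the markup posted in the question can be parsed by structure alone. This sketch assumes, as in that HTML, that each forecast day is a `<div>` with an `<i>` icon as a direct child, followed by a `<div>` holding the high and low temperature spans; Yahoo's live page may be laid out differently, so treat the selectors as illustrative. It relies on soupsieve's `:has()` and `:scope` support and runs offline on two days trimmed from the question:

```python
from bs4 import BeautifulSoup

# Two forecast days trimmed from the markup in the question
HTML = """
<div class="Ai(c) D(f) Jc(sb) Fz(13px) Py(0) Px(0)">
  <div class="D(f) Ai(c) Fld(c)">
    <span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Today</span>
    <i class="D(b) Bgr(nr)"></i>
    <div class="Fw(600) Fz(12px)">
      <span class="C($c-fuji-grey-n) Pend(5px) unit_F">74<span>°</span></span>
      <span class="C($c-fuji-grey-o) unit_F">59<span>°</span></span></div></div>
  <div class="D(f) Ai(c) Fld(c)">
    <span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Wed</span>
    <i class="D(b) Bgr(nr)"></i>
    <span class="Hidden">Partly cloudy today with a high of 74 °F (23.3 °C) and a low of 51 °F (10.6 °C).</span>
    <div class="Fw(600) Fz(12px)">
      <span class="C($c-fuji-grey-n) Pend(5px) unit_F">74<span>°</span></span>
      <span class="C($c-fuji-grey-o) unit_F">51<span>°</span></span></div></div>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
forecast = []
# A day block is any <div> with an <i> icon as a direct child;
# none of the generated class names appear in the selectors.
for block in soup.select("div:has(> i)"):
    # First direct <span> child holds the day name.
    day = block.select_one(":scope > span").get_text(strip=True)
    # Direct <span> children of the inner <div> hold the high and low.
    high, low = (s.get_text(strip=True) for s in block.select(":scope > div > span"))
    forecast.append((day, high, low))

print(forecast)  # [('Today', '74°', '59°'), ('Wed', '74°', '51°')]
```

Grouping per day block, rather than zipping one flat list, also keeps a missing value in one day from shifting every later triple.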
Votes: 0
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/67994434
