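I'm starting to learn the basics of web scraping with Python, but I'm having trouble with my code. I'm trying to scrape the weather information from the front page of yahoo.com: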

<div class="Ai(c) D(f) Jc(sb) Fz(13px) Py(0) Px(0)">
<div class="D(f) Ai(c) Fld(c)">
<span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Today</span>
<i class="D(b) Bgr(nr) Bgz(ct) Bgp(c) Mb(10px) H(40px) W(40px) wafer-img-loaded" style="background-image: url("https://s.yimg.com/cv/apiv2/200510/w/l/fair_day.png");"></i>
<div class="Fw(600) Fz(12px)">
<span class="C($c-fuji-grey-n) Pend(5px) unit_F">74<span>°</span></span>
<span class="C($c-fuji-grey-o) unit_F">59<span>°</span></span></div></div>
<div class="D(f) Ai(c) Fld(c)">
<span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Wed</span>
<i class="D(b) Bgr(nr) Bgz(ct) Bgp(c) Mb(10px) H(40px) W(40px) wafer-img-loaded" style="background-image: url("https://s.yimg.com/cv/apiv2/200510/w/l/partly_cloudy_day.png");"></i>
<span class="Hidden">Partly cloudy today with a high of 74 °F (23.3 °C) and a low of 51 °F (10.6 °C).</span>
<div class="Fw(600) Fz(12px)">
<span class="C($c-fuji-grey-n) Pend(5px) unit_F">74<span>°</span></span>
<span class="C($c-fuji-grey-o) unit_F">51<span>°</span></span></div></div>
<div class="D(f) Ai(c) Fld(c)"><span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Thu</span>
<i class="D(b) Bgr(nr) Bgz(ct) Bgp(c) Mb(10px) H(40px) W(40px) wafer-img-loaded" style="background-image: url("https://s.yimg.com/cv/apiv2/200510/w/l/partly_cloudy_day.png");"></i>
<span class="Hidden">Partly cloudy today with a high of 84 °F (28.9 °C) and a low of 51 °F (10.6 °C).</span>
<div class="Fw(600) Fz(12px)">
<span class="C($c-fuji-grey-n) Pend(5px) unit_F">84<span>°</span></span>
<span class="C($c-fuji-grey-o) unit_F">51<span>°</span></span></div></div>
<div class="D(f) Ai(c) Fld(c)">
<span class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)">Fri</span>
<i class="D(b) Bgr(nr) Bgz(ct) Bgp(c) Mb(10px) H(40px) W(40px) wafer-img-loaded" style="background-image: url("https://s.yimg.com/cv/apiv2/200510/w/l/scattered_showers_day_night.png");"></i>
<span class="Hidden">Scattered thunderstorms today with a high of 84 °F (28.9 °C) and a low of 65 °F (18.3 °C). There is a 35% chance of precipitation.</span>
<div class="Fw(600) Fz(12px)">
<span class="C($c-fuji-grey-n) Pend(5px) unit_F">84<span>°</span></span>
<span class="C($c-fuji-grey-o) unit_F">65<span>°</span></span></div></div></div>

Here is the code I came up with to try to extract this information:
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.yahoo.com/')
soup = BeautifulSoup(r.content, 'html.parser')
weatherTable = soup.select_one("div.Ai(c).D(f).Jc(sb).Fz(13px).Py(0).Px(0)")
for row in weatherTable.select("div.D(f).Ai(c).Fld(c)"):
    day = row.select_one("span.Fw(600).Fz(12px).Mb(10px).C($c-fuji-grey-n).Fz(1em)").text
    dWeather = row.select_one("span.C($c-fuji-grey-n).Pend(5px).unit_F").text
    nWeather = row.select_one("span.C($c-fuji-grey-o).unit_F").text
    print(day, dWeather, nWeather)

When I try to run my code, I get the following error:
Traceback (most recent call last):
  File "C:\Users\smith\eclipse-workspace\Practice\src\DecodeWeb.py", line 9, in <module>
    weatherTable = soup.select_one("div.Ai(c).D(f).Jc(sb).Fz(13px).Py(0).Px(0)")
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\bs4\element.py", line 1834, in select_one
    value = self.select(selector, namespaces, 1, **kwargs)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\bs4\element.py", line 1869, in select
    results = soupsieve.select(selector, self, namespaces, limit, **kwargs)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\__init__.py", line 98, in select
    return compile(select, namespaces, flags, **kwargs).select(tag, limit)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\__init__.py", line 62, in compile
    return cp._cached_css_compile(pattern, namespaces, custom, flags)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\css_parser.py", line 211, in _cached_css_compile
    CSSParser(pattern, custom=custom_selectors, flags=flags).process_selectors(),
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\css_parser.py", line 1058, in process_selectors
    return self.parse_selectors(self.selector_iter(self.pattern), index, flags)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\css_parser.py", line 909, in parse_selectors
    key, m = next(iselector)
  File "C:\Users\smith\AppData\Local\Programs\Python\Python39\lib\soupsieve\css_parser.py", line 1051, in selector_iter
    raise SelectorSyntaxError(msg, self.pattern, index)
soupsieve.util.SelectorSyntaxError: Invalid character '(' position 6
line 1:
div.Ai(c).D(f).Jc(sb).Fz(13px).Py(0).Px(0)

Do I have to replace the special characters so that BS4 can read the class names?
Posted on 2021-06-15 23:23:05
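The problem is that your CSS selector includes parentheses () and a dollar sign $. These symbols already have a special meaning in selector syntax. See:

() - Are parentheses allowed in CSS selectors?
$ - the [attribute$=value] selector

You can escape these characters with a backslash \: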
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.yahoo.com/')
soup = BeautifulSoup(r.content, 'html.parser')
# Raw strings (r"...") keep each backslash intact so it reaches the CSS parser
weatherTable = soup.select_one(r"div.Ai\(c\).D\(f\).Jc\(sb\).Fz\(13px\).Py\(0\).Px\(0\)")
for row in weatherTable.select(r"div.D\(f\).Ai\(c\).Fld\(c\)"):
    day = row.select_one(r"span.Fw\(600\).Fz\(12px\).Mb\(10px\).C\(\$c-fuji-grey-n\).Fz\(1em\)").text
    dWeather = row.select_one(r"span.C\(\$c-fuji-grey-n\).Pend\(5px\).unit_F").text
    nWeather = row.select_one(r"span.C\(\$c-fuji-grey-o\).unit_F").text
    print(day, dWeather, nWeather)

Output:
Today 82° 67°
Wed 78° 63°
Thu 78° 59°
Fri 81° 62°

An alternative to escaping these characters with backslashes is to use an [attribute=value] selector.
For example, instead of doing the following:
day = row.select_one(
    r"span.Fw\(600\).Fz\(12px\).Mb\(10px\).C\(\$c-fuji-grey-n\).Fz\(1em\)"
).text

you can do:
day = row.select_one(
    # Find a `span` whose class attribute is exactly `Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)`
    'span[class="Fw(600) Fz(12px) Mb(10px) C($c-fuji-grey-n) Fz(1em)"]'
).text

which is easier to read.
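If there are many selectors to escape, the backslash approach from this answer can be automated. The following is only a sketch: `class_selector` is a hypothetical helper (not part of BeautifulSoup or soupsieve), and it assumes every class name starts with a letter and that escaping each character other than letters, digits, `_` and `-` is enough for class names like the ones in the question.

```python
import re


def class_selector(tag, class_attr):
    """Build a CSS selector for `tag` that requires every class in `class_attr`.

    Hypothetical helper: characters other than letters, digits, '_' and '-'
    are backslash-escaped so a CSS parser treats them literally.
    """
    parts = []
    for name in class_attr.split():
        # Put a backslash in front of each special character, e.g. '(' -> '\('
        parts.append("." + re.sub(r"([^\w-])", r"\\\1", name))
    return tag + "".join(parts)


print(class_selector("div", "D(f) Ai(c) Fld(c)"))
# prints: div.D\(f\).Ai\(c\).Fld\(c\)
```

The returned string could then be passed to `select` or `select_one` in place of a hand-escaped selector, for example `row.select_one(class_selector("span", "C($c-fuji-grey-o) unit_F"))`.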
Posted on 2021-06-15 23:48:26
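Although the reason for the failure has already been explained (certain characters need escaping), these classes are dynamic, so scraping based on them is likely to break soon. Consider using more stable elements/attributes and their relationships, for example: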
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.yahoo.com/')
soup = BeautifulSoup(r.content, 'html.parser')
data = [i.get_text() for i in soup.select('.weather-card-content span:not(:nth-child(3n))') if i.text != '°']
print(list(zip(data[0::3],data[1::3],data[2::3])))

https://stackoverflow.com/questions/67994434
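The `zip(data[0::3], data[1::3], data[2::3])` call in the last answer regroups a flat list into (day, high, low) tuples. Here is a minimal standalone sketch of that step. The sample values are copied from the HTML in the question, and `regroup` is a hypothetical name used only for illustration.

```python
# Flat list in the order the spans appear: day, high, low, day, high, low, ...
data = ["Today", "74°", "59°", "Wed", "74°", "51°",
        "Thu", "84°", "51°", "Fri", "84°", "65°"]


def regroup(flat, size=3):
    """Regroup a flat list into tuples of `size` consecutive items."""
    # flat[k::size] takes every `size`-th item starting at offset k;
    # zip() then pairs the k-th slices back together item by item.
    return list(zip(*(flat[k::size] for k in range(size))))


rows = regroup(data)
for day, high, low in rows:
    print(day, high, low)
```

Note that `zip` stops at the shortest slice, so if the page ever yields an incomplete trailing group, that group is silently dropped.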