这段代码从amazon抓取产品名称。我想去掉这个变量,它包含HTML的空格,
span = soup.find("span", id="productTitle")
print(span.strip())但它给了我这个错误;
Traceback (most recent call last):
File "C:/Users/avensis/Desktop/Projects/AmazonScraper/Scraper.py", line 17, in <module>
print(span.strip())
TypeError: 'NoneType' object is not callable我不明白为什么会这样。有人能解释一下吗?下面是我的完整代码:
from bs4 import BeautifulSoup
import requests
import html5lib
url = 'https://www.amazon.co.uk/Pingu-PING2573-Mug/dp/B0764468MD/ref=sr_1_11?dchild=1&keywords=pingu&qid=1595849018' \
'&sr=8-11 '
headers = {
"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/84.0.4147.89 Safari/537.36'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html5lib')
span = soup.find("span", id="productTitle")
print(span.strip())发布于 2020-07-27 21:35:42
我猜这就是你想要做的:
from bs4 import BeautifulSoup
import requests
import html5lib
import random
url = 'https://www.amazon.co.uk/Pingu-PING2573-Mug/dp/B0764468MD/ref=sr_1_11?dchild=1&keywords=pingu&qid=1595849018' \
'&sr=8-11 '
headers = {
"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
'Chrome/84.0.4147.89 Safari/537.36'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html5lib')
span = soup.find("span", id="productTitle")
print(span.get_text(strip=True))打印:
Pingu - Mug | 300 ml | Ceramic | Gift Box | 11 x 8.5 x 8.5 cm如果这就是你要找的,那就是你错过的.get_text(strip=True)
发布于 2020-07-27 21:33:26
使用.get_text()方法:
span.get_text().replace("\n", "")
'Pingu - Mug | 300 ml | Ceramic | Gift Box | 11 x 8.5 x 8.5 cm'https://stackoverflow.com/questions/63116410
复制相似问题