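I'm trying to scrape a website for practice, to get things like product titles and descriptions. I've already scraped the product name, but I'm stuck on how to scrape the description.
All I want here is the product name and its description, and the title part already works.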
from requests_html import HTML,HTMLSession
session = HTMLSession()
r = session.get('https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card')
containers = r.html.find('.item-container',first=True)
#print(containers.html)
title = containers.find('.item-branding img',first=True).attrs['title']
#print(title)
description = containers.find('.item-title',first=True).html
print(description)
The problem is with description: I want the text inside this a tag (the one that contains the i tag), which is the product description, but I can't get it out. Any help would be appreciated.
Given this:
<a class="item-title" href="https://www.newegg.com/evga-geforce-rtx-2080-ti-11g-p4-2281-kr/p/N82E16814487418?Item=N82E16814487418" title="View Details"><i class="icon-premier icon-premier-xsm"/>EVGA GeForce RTX 2080 Ti DirectX 12 11G-P4-2281-KR BLACK EDITION GAMING Video Card, Dual HDB Fans & RGB LED</a>
I want to grab this:
EVGA GeForce RTX 2080 Ti DirectX 12 11G-P4-2281-KR BLACK EDITION GAMING Video Card, Dual HDB Fans & RGB LED
Posted on 2019-06-02 23:34:06
I recommend using BeautifulSoup to parse this site's content. Your code would look something like this:
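from requests_html import HTMLSession
from bs4 import BeautifulSoup

session = HTMLSession()
r = session.get('https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card')
# Parse the raw response with BeautifulSoup (the "lxml" parser must be installed).
soup = BeautifulSoup(r.content, "lxml")
# First product container on the page; attrs should be a dict, not a set.
containers = soup.find("div", {"class": "item-container"})
# Index 1 picks the second matching image, as in the original answer.
title = containers.find_all("img", {"class": "lazy-img"})[1]["title"]
# get_text() returns only the text of the <a> tag, without the markup.
description = containers.find("a", {"class": "item-title"}).get_text()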
print(description)
Hope this helps.
[1] BeautifulSoup documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
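To check offline what the extracted text should look like, here is a minimal sketch that runs on the snippet from the question without hitting the network. It uses the standard library's html.parser as a stand-in for BeautifulSoup's text extraction; it is not how either library works internally. The names ItemTitleText and item_title_text are made up for this illustration. With requests_html alone, reading the element's .text attribute instead of .html should give the same string.

```python
from html.parser import HTMLParser

# Snippet copied from the question.
SNIPPET = (
    '<a class="item-title" href="https://www.newegg.com/evga-geforce-rtx-2080-ti-'
    '11g-p4-2281-kr/p/N82E16814487418?Item=N82E16814487418" title="View Details">'
    '<i class="icon-premier icon-premier-xsm"/>'
    'EVGA GeForce RTX 2080 Ti DirectX 12 11G-P4-2281-KR BLACK EDITION GAMING '
    'Video Card, Dual HDB Fans & RGB LED</a>'
)


class ItemTitleText(HTMLParser):
    """Hypothetical helper: collects the text of the first <a class="item-title">."""

    def __init__(self):
        super().__init__()
        self.inside = False  # True between the opening <a ...> and its </a>
        self.done = False    # True once the first matching link has been closed
        self.chunks = []     # text fragments found inside the link

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "item-title" in classes and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "a" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        # Nested tags such as <i/> carry no text, so only the description is kept.
        if self.inside:
            self.chunks.append(data)


def item_title_text(html):
    """Return the stripped text content of the first item-title link in html."""
    parser = ItemTitleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


description = item_title_text(SNIPPET)
print(description)
```

The printed value is the plain description with the a and i markup removed, which is what the getText() call above is meant to return for the live page.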
https://stackoverflow.com/questions/56413773