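I'm new to Python. I've just started learning web scraping, and I decided to scrape Amazon for the names of the listed products. So I opened Chrome DevTools, clicked Inspect on an Amazon product name, and noted its class, which in this case is 'a-link-normal'. The problem is that the result I get is None. Here is the code: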
import webbrowser
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.amazon.in/s?k=books&ref=nb_sb_noss')
soup = BeautifulSoup(source.text, 'lxml')
# find() returns only the first matching <a> tag, or None if nothing matches
name = soup.find('a', class_='a-link-normal')
print(name)

Here is a screenshot of what I inspected:

I'm new to web scraping and overwhelmed by the complexity of websites, so any advice would be welcome.
Thanks for your help.
Posted on 2020-08-29 15:07:27
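Amazon appears to block crawling. I checked: the first time you run the code, the content can be extracted, but when the code is run a second time, the request is blocked. If you print the soup variable, you will see the following notice:
To discuss automated access to Amazon data please contact api-services-support@amazon.com. For information about migrating to our APIs refer to our Marketplace APIs at https://developer.amazonservices.in/ref=rm_c_sv, or our Product Advertising API at https://affiliate-program.amazon.in/gp/advertising/api/detai /main.html/ref=rm_c_ac for advertising use cases. Sorry, we just need to make sure you're not a robot. For best results, please make sure your browser is accepting cookies.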
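A quick way to tell whether a response is this notice rather than real search results is to look for its telltale phrases before parsing. The sketch below is only an illustration: `looks_blocked` and `BLOCK_MARKERS` are hypothetical names, the email address comes from the notice above, and the exact English wording "not a robot" is an assumption about how the live page phrases it.

```python
# Hypothetical helper; the marker strings are assumptions based on the notice quoted above.
BLOCK_MARKERS = (
    "api-services-support@amazon.com",
    "not a robot",
)


def looks_blocked(html: str) -> bool:
    """Return True if the page text contains any of the robot-check markers."""
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


# Inline sample strings stand in for source.text so that no live request is needed
blocked_page = (
    "<p>To discuss automated access to Amazon data please contact "
    "api-services-support@amazon.com.</p>"
)
normal_page = '<span class="a-size-medium a-color-base a-text-normal">The Alchemist</span>'

print(looks_blocked(blocked_page))
print(looks_blocked(normal_page))
```

In the scraping script, the same check could be applied to `source.text` right after the request, so that an empty result is not mistaken for a wrong selector.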
I suggest you use [...] and add some delays to the code so that it behaves more like a human visitor.
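The delay idea can be sketched roughly as follows. This is a minimal illustration, not a guarantee against blocking: `fetch_politely` is a hypothetical helper, the 2 to 5 second range is an arbitrary assumption, and `fetch` is whatever callable performs the actual request.

```python
import random
import time


def fetch_politely(urls, fetch, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls.

    The delay bounds are illustrative assumptions, not values known to satisfy Amazon.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results
```

With requests, `fetch` could be something like `lambda url: requests.get(url)`. The `sleep` parameter exists only so the pause can be replaced by a stub when trying the function out.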
However, try running the following code in a few minutes, and you should be able to extract the book titles:
import requests
from bs4 import BeautifulSoup

source = requests.get('https://www.amazon.in/s?k=books&ref=nb_sb_noss')
soup = BeautifulSoup(source.content, 'html.parser')
# print(soup)  # uncomment to check whether the robot-check notice came back
# Book titles are in <span> tags with this exact class string
names = soup.find_all('span', class_="a-size-medium a-color-base a-text-normal")
for name in names:
    print(name.text)

Posted on 2020-08-29 21:08:45
To get a correct response from the Amazon server, send a User-Agent HTTP header:
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent header with the request
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:80.0) Gecko/20100101 Firefox/80.0'}
source = requests.get('https://www.amazon.in/s?k=books&ref=nb_sb_noss', headers=headers)
soup = BeautifulSoup(source.text, 'lxml')
# Select each title <span> that is a direct child of a product link
for a in soup.select('a.a-link-normal > span.a-size-medium'):
    print(a.get_text(strip=True))

Prints:
The Power of Your Subconscious Mind (DELUXE HARDBOUND EDITION)
World’s Greatest Books For Personal Growth & Wealth (Set of 4 Books): Perfect Motivational Gift Set
Ikigai: The Japanese secret to a long and happy life
Attitude Is Everything: Change Your Attitude ... Change Your Life!
World’s Greatest Books For Personal Growth & Wealth (Set of 4 Books): Perfect Motivational Gift Set
The Theory of Everything
The Subtle Art of Not Giving a F*ck
The Alchemist
The Monk Who Sold His Ferrari
The Rudest Book Ever
As a Man Thinketh
How to Stop Worrying and Start Living: Time-Tested Methods for Conquering Worry
Help Hungry Henry Deal with Anger : An Interactive Picture Book About Anger Management
The Girl in Room 105
The Blue Umbrella
Wings of Fire: An Autobiography of Abdul Kalam
My First Library: Boxset of 10 Board Books for Kids
Who Will Cry When You Die?
Rich Dad Poor Dad : What The Rich Teach Their Kids About Money That the Poor and Middle Class Do Not!
Rough Book
The Leader Who Had No Title
The Power Of Influence

https://stackoverflow.com/questions/63644025