
Scraping Amazon using requests and XPath

Stack Overflow user
Asked 2021-07-09 23:01:37
1 answer, 117 views, 0 followers, 0 votes

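When you open a product page on Amazon, several sellers may offer the item at different prices. I can scrape the price shown on the page, but not the other sellers' prices. Below the "Buy Now" and "Add to List" buttons there is a button that says "New (X) from"; clicking it shows all the other sellers. I want to scrape their prices, but when I use the XPath for their prices it gives me an error.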

from requests_html import HTMLSession

url = 'https://www.amazon.co.uk/Panini-Sticker-Collection-x50Packs/dp/B08V8CF748?ref_=Oct_DLandingS_D_7a870443_60&smid=A3P5ROKL5A1OLE'


def GetPrice(URL):
    s = HTMLSession()
    # Fetch the product page passed in by the caller.
    r = s.get(URL)

    product = {
        # XPath copied from the first entry in the "other sellers" panel.
        'price': r.html.xpath('//*[@id="aod-price-1"]/span/span[2]')
    }

    print(product)
    return product


GetPrice('https://www.amazon.co.uk/Colgate-Fresh-Cooling-Crystals-Toothpaste/dp/B073V1MB17/ref=sr_1_5_mod_primary_new?dchild=1&keywords=Toothpaste&qid=1625698678&rdc=1&sbo=RZvfv%2F%2FHxDF%2BO5021pAnSA%3D%3D&sr=8-5')

1 Answer

Stack Overflow user

Answered 2021-07-10 01:35:40

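To solve this, use the browser's developer tools to inspect which request is sent when an event fires (here, clicking the button), then reproduce that same request in code.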
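Part of reproducing such a request is recovering the identifiers the page sends along with it. The sketch below is illustrative only: `build_offer_params` is a hypothetical helper, and it assumes the product identifier (ASIN) is the path segment right after `/dp/` and that `qid` and `sr` are carried in the product URL's query string. Whether a given endpoint needs exactly these fields should be checked in the developer tools.

```python
from urllib.parse import urlparse, parse_qs


def build_offer_params(product_url):
    """Collect identifiers from a product URL (hypothetical helper).

    Assumes the ASIN is the path segment right after '/dp/' and that
    'qid' and 'sr', when present, are in the query string.
    """
    parsed = urlparse(product_url)
    segments = [seg for seg in parsed.path.split('/') if seg]

    if 'dp' not in segments or segments.index('dp') + 1 >= len(segments):
        raise ValueError('no /dp/<ASIN> segment found in URL')
    asin = segments[segments.index('dp') + 1]

    query = parse_qs(parsed.query)
    return {
        'asin': asin,
        # Missing values fall back to an empty string.
        'qid': query.get('qid', [''])[0],
        'sr': query.get('sr', [''])[0],
        'pc': 'dp',
    }


params = build_offer_params(
    'https://www.amazon.co.uk/Colgate-Fresh-Cooling-Crystals-Toothpaste/dp/B073V1MB17/'
    'ref=sr_1_5_mod_primary_new?dchild=1&keywords=Toothpaste&qid=1625698678'
    '&rdc=1&sbo=RZvfv%2F%2FHxDF%2BO5021pAnSA%3D%3D&sr=8-5'
)
print(params)
```

The resulting dictionary can be passed as the `params` argument of `requests.get`, so the identifiers do not have to be typed in by hand for every product.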

Code
import requests
from lxml import html

headers = {
    'authority': 'www.amazon.co.uk',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    'rtt': '100',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0',
    'accept': 'text/html,*/*',
    'x-requested-with': 'XMLHttpRequest',
    'downlink': '8.4',
    'ect': '4g',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.amazon.co.uk/Colgate-Fresh-Cooling-Crystals-Toothpaste/dp/B073V1MB17/ref=sr_1_5_mod_primary_new?dchild=1&keywords=Toothpaste&qid=1625698678&rdc=1&sbo=RZvfv%2F%2FHxDF%2BO5021pAnSA%3D%3D&sr=8-5',
    'accept-language': 'en-US,en;q=0.9',
    'cookie': 'session-id=260-4106472-8409244; i18n-prefs=GBP; ubid-acbuk=260-0481762-4830301; session-token="jc9/khgoELjvvnVyfyUE0zuV+IqwaxQgEelGbV4ihI0VtbHOyZRfQgTpdo7j85y9QuCH+19fCvnLDgNhdjtSrMCWh4U1Pct/A53U0ylVSUCMLNa4HHZqV6q/VBo8EIf0KSIkY47ClNUgwWLkZxzHkm5GWvqvqYBBl7wXIR9zKxY9x0WhN1KrWagXd8Ud062lFMG+ThXyKi0JTHk2K14qmEbPRjE2tmDCZbANgBgXvq4GAXYK/qamSGtiwHIL88aOcKL+4xjmV0o="; csm-hit=adb:adblk_yes&t:1625843678136&tb:s-HPFR45XZN4E5FMK5EH5M|1625843675264; session-id-time=2082758401l',
}

params = (
    ('asin', 'B073V1MB17'),
    ('m', ''),
    ('qid', '1625698678'),
    ('smid', ''),
    ('sourcecustomerorglistid', ''),
    ('sourcecustomerorglistitemid', ''),
    ('sr', '8-5'),
    ('pc', 'dp'),
)
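# Use a session so cookies persist across requests.
s = requests.Session()

# Request the offer listing that the page loads when the button is clicked.
response = s.get('https://www.amazon.co.uk/gp/aod/ajax/ref=dp_aod_NEW_mbc', headers=headers, params=params)

tree = html.fromstring(response.content)

# Select every span whose class contains "a-offscreen".
prices = tree.xpath('//span[contains(@class,"a-offscreen")]')

# Skip the first match and print the remaining prices.
for price in prices[1:]:
    print(price.text)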

Output
£1.87
£2.04
£1.44
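The scraped values are strings, so comparing offers needs a conversion step. A minimal sketch, assuming each string has a leading currency symbol, `.` as the decimal separator and `,` as an optional thousands separator; `parse_price` is a hypothetical helper and not part of the answer:

```python
from decimal import Decimal


def parse_price(text):
    """Convert a price string such as '£1.87' to a Decimal (hypothetical helper)."""
    # Drop surrounding whitespace, a leading currency symbol and thousands separators.
    cleaned = text.strip().lstrip('£$€').replace(',', '')
    return Decimal(cleaned)


scraped = ['£1.87', '£2.04', '£1.44']
prices = [parse_price(p) for p in scraped]

print(min(prices))      # cheapest offer
print(sorted(prices))   # all offers, lowest first
```

`Decimal` is used rather than `float` so that the amounts stay exact when they are compared or summed.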
Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/68318908
