I am currently learning Python through "Automate the Boring Stuff with Python" and am now working on the web-scraping section.
I wrote code that gets a product's price from one website. However, when I modified the code slightly to work on another site, it stopped working: Beautiful Soup returns an empty list for my CSS selector.
Here is my working code:
import bs4, requests, re

def getPrice(productUrl):
    res = requests.get(productUrl)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # Go through the CSS and get the price
    source = soup.select('#product_addtocart_form > div.product-shop > div.details-info')
    element = source[0].text.strip()
    # Regex for extracting the price from the rest of the CSS
    pattern = re.compile(r"""R([1-9]\d*)(\.\d\d)?(?![\d.])""")
    # Get the price from the string using the regex pattern
    trueprice = re.split(pattern, element)
    return "The product's price is : R " + trueprice[1]

product = "https://www.faithful-to-nature.co.za/green-home-paper-straws-in-compostable-bag"
weblink = getPrice(product)
print(weblink)

Here is the code I edited for the other site, which does not work. I commented out some lines because they have nothing to operate on when the list is empty.
import bs4, requests, re

def getPrice(productUrl):
    res = requests.get(productUrl)
    res.raise_for_status()  # Check for any errors in the request
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # Go through the CSS and get the price
    csssource = soup.select('#shopfront-app > div > div.grid-container.pdp-grid-container > div.grid-x.grid-margin-x > div > div > div > div > div.cell.medium-auto > div.pdp-core-module_actions_mdYzm > div.sf-buybox.pdp-core-module_buybox_q5wLs.buybox-module_buybox_eWK2S')
    #element = csssource[0].text.strip()
    # Regex for extracting the price from the rest of the CSS
    pattern = re.compile(r"""R([1-9]\d*)(\.\d\d)?(?![\d.])""")
    #trueprice = re.split(pattern, element)
    #return "The product's price is : R " + trueprice[1]
    print(csssource)

test1 = "https://www.takealot.com/lego-classic-basic-brick-set-11002/PLID53430493"
weblink = getPrice(test1)
print(weblink)

For both sites I obtained the CSS selectors using Chrome's Inspect tool. I have also tried broader CSS selectors, but Beautiful Soup still returns an empty list.
How do I get Beautiful Soup to return the correct list for my CSS selector?
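(Editorial note: the price regex in the question is not the problem; here is how it behaves on an invented sample string, with `re.split` keeping the captured groups so the rand amount lands at index 1:)

```python
import re

# Matches an "R"-prefixed rand amount with optional cents, e.g. R89 or R89.00
pattern = re.compile(r"R([1-9]\d*)(\.\d\d)?(?![\d.])")

# re.split keeps the captured groups between the split pieces,
# so the rand amount is at index 1 and the cents (if any) at index 2
parts = re.split(pattern, "Special offer: R89.00 while stocks last")
print(parts[1])  # prints: 89
```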
Posted on 2019-06-24 01:52:21
Hi, I believe this site serves its content dynamically, so you need to use Selenium; when I tried scraping it with requests I also got an empty list. You might be able to use your original CSS selector, but I went with the fifth occurrence of the "currency" class, since that is where the price you are after appears.
Download the correct geckodriver and set its path in the script:
https://github.com/mozilla/geckodriver/releases
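(Editorial note: the occurrence-counting loop in the script below can also be written by indexing the full list of matches; a stdlib-only sketch on invented sample markup standing in for the rendered page source:)

```python
import re

# Invented sample markup standing in for the browser-rendered page source
html = (
    '<span class="currency">R 1</span>'
    '<span class="currency">R 10</span>'
    '<span class="currency">R 100</span>'
    '<span class="currency">R 200</span>'
    '<span class="currency">R 315</span>'
)

# Grab the text of every "currency" span, then index the fifth one directly
prices = re.findall(r'<span class="currency">([^<]+)</span>', html)
print(prices[4])  # prints: R 315
```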
from bs4 import BeautifulSoup
from selenium import webdriver
import time

# driver = webdriver.Firefox(executable_path = 'D:\Selenium_RiponAlWasim\geckodriver-v0.18.0-win64\geckodriver.exe')
driver = webdriver.Firefox()
driver.get('https://www.takealot.com/lego-classic-basic-brick-set-11002/PLID53430493')
html = driver.page_source
soup = BeautifulSoup(html, 'lxml')

i = 0
for span in soup.find_all('span', {'class': 'currency'}):
    if i == 4:
        print(span.text)
    i += 1

# driver.close()
# returns R 315

Posted on 2019-06-24 02:08:38
If you look at the requests happening in your browser, you will notice that the site fetches its product details as JSON by calling https://api.takealot.com/rest/v-1-8-0/product-details/{PRODUCT_ID}?platform=desktop (for example, https://api.takealot.com/rest/v-1-8-0/product-details/PLID53430493?platform=desktop).
So, as an alternative to using Selenium, you can call that API yourself:
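(Editorial note: the product ID that call needs is the trailing "PLID…" segment of the product URL; a small sketch of pulling it out with a regex, not part of the original answer:)

```python
import re

url = "https://www.takealot.com/lego-classic-basic-brick-set-11002/PLID53430493"

# The product ID is the trailing "PLID<digits>" path segment
match = re.search(r"(PLID\d+)$", url)
product_id = match.group(1)
print(product_id)  # prints: PLID53430493
```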
import requests

def getProductInfo(productId):
    productUrl = 'https://api.takealot.com/rest/v-1-8-0/product-details/{0}?platform=desktop'.format(productId)
    res = requests.get(productUrl, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'})
    res.raise_for_status()  # Check for any errors in the request
    return res.json()

product = getProductInfo("PLID53430493")
print(product['buybox']['pretty_price'])

https://stackoverflow.com/questions/56726286
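(Editorial note: the ['buybox']['pretty_price'] lookup assumes the response always has that shape and raises KeyError otherwise; chained dict.get calls fail softly instead. The sample dict below merely mimics the shape implied by the answer, not the real API response:)

```python
# Minimal stand-in for the JSON the API is said to return;
# the real response contains many more fields
product = {"buybox": {"pretty_price": "R 315"}}

# Chained .get() calls return None instead of raising KeyError
price = product.get("buybox", {}).get("pretty_price")
print(price)  # prints: R 315

missing = product.get("stockinfo", {}).get("status")
print(missing)  # prints: None
```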