python3: bs4 has problems on some websites

Stack Overflow user
Asked 2020-10-04 04:45:42
Answers: 1 · Views: 68 · Followers: 0 · Votes: 2

I am learning Python and bs4.

Following some suggestions and a number of websites, I wrote this script:

import requests as rq
from bs4 import BeautifulSoup

header = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'}

def get_price(site):
    html = rq.get(site, headers=header).text
    soup = BeautifulSoup(html, 'html.parser')
    try:
        price = soup.find(id="priceblock_ourprice").get_text()
        print(site)
        print(price)
    except:
        print(site)
        print("failed")

sites = ["https://www.amazon.in/Apple-iPhone-11-64GB-Green/dp/B07XVKBY68/ref=sr_1_7?keywords=iphone+11&qid=1573668357&sr=8-7",
        "https://www.amazon.it/Apple-iPhone-64GB-Verde-Ricondizionato/dp/B082DN72G3/ref=sr_1_19?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-19", 
        "https://www.amazon.it/Apple-iPhone-11-128GB-Verde/dp/B07XS5MSW4/ref=sr_1_1_sspa?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExNlhGMElFNUhJMTBJJmVuY3J5cHRlZElkPUEwMTI2OTMxMVpXWEtHQ1o5S0ZENCZlbmNyeXB0ZWRBZElkPUEwOTMyMTczMVdMMzlQOTRPTUE3SCZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU=" ]

for site in sites:
    get_price(site)
    print("\n")

I ran it and got the following result:

https://www.amazon.in/Apple-iPhone-11-64GB-Green/dp/B07XVKBY68/ref=sr_1_7?keywords=iphone+11&qid=1573668357&sr=8-7
₹ 64,499.00

https://www.amazon.it/Apple-iPhone-64GB-Verde-Ricondizionato/dp/B082DN72G3/ref=sr_1_19?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-19
failed

https://www.amazon.it/Apple-iPhone-11-128GB-Verde/dp/B07XS5MSW4/ref=sr_1_1_sspa?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExNlhGMElFNUhJMTBJJmVuY3J5cHRlZElkPUEwMTI2OTMxMVpXWEtHQ1o5S0ZENCZlbmNyeXB0ZWRBZElkPUEwOTMyMTczMVdMMzlQOTRPTUE3SCZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU=
749,00 €

I can't figure out why the second site doesn't work.

The string priceblock_ourprice is present in the page:

$ wget -q -O - 'https://www.amazon.it/Apple-iPhone-64GB-Verde-Ricondizionato/dp/B082DN72G3/ref=sr_1_19?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-19' 2>&1 | grep \"priceblock_ourprice\"
<span id="priceblock_ourprice" class="a-size-medium a-color-price priceBlockBuyingPriceString">629,00 €</span>

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-10-04 06:32:37

The problem is that html.parser cannot correctly parse the HTML that Amazon serves. The solution is to use the lxml or html5lib parser instead:

import requests as rq
from bs4 import BeautifulSoup


header = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0'}

def get_price(site):
    html = rq.get(site, headers=header).text
    soup = BeautifulSoup(html, 'lxml')      # <--- use 'lxml' or 'html5lib' parser
    try:
        price = soup.find(id="priceblock_ourprice").get_text()
        print(site)
        print(price)
    except:
        print(site)
        print("failed")

sites = ["https://www.amazon.in/Apple-iPhone-11-64GB-Green/dp/B07XVKBY68/ref=sr_1_7?keywords=iphone+11&qid=1573668357&sr=8-7",
        "https://www.amazon.it/Apple-iPhone-64GB-Verde-Ricondizionato/dp/B082DN72G3/ref=sr_1_19?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-19", 
        "https://www.amazon.it/Apple-iPhone-11-128GB-Verde/dp/B07XS5MSW4/ref=sr_1_1_sspa?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExNlhGMElFNUhJMTBJJmVuY3J5cHRlZElkPUEwMTI2OTMxMVpXWEtHQ1o5S0ZENCZlbmNyeXB0ZWRBZElkPUEwOTMyMTczMVdMMzlQOTRPTUE3SCZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU=" ]

for site in sites:
    get_price(site)
    print("\n")

Prints:

https://www.amazon.in/Apple-iPhone-11-64GB-Green/dp/B07XVKBY68/ref=sr_1_7?keywords=iphone+11&qid=1573668357&sr=8-7
₹ 64,499.00


https://www.amazon.it/Apple-iPhone-64GB-Verde-Ricondizionato/dp/B082DN72G3/ref=sr_1_19?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-19
744,89 €


https://www.amazon.it/Apple-iPhone-11-128GB-Verde/dp/B07XS5MSW4/ref=sr_1_1_sspa?__mk_it_IT=%C3%85M%C3%85%C5%BD%C3%95%C3%91&dchild=1&keywords=iphone+11&qid=1601755114&sr=8-1-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExNlhGMElFNUhJMTBJJmVuY3J5cHRlZElkPUEwMTI2OTMxMVpXWEtHQ1o5S0ZENCZlbmNyeXB0ZWRBZElkPUEwOTMyMTczMVdMMzlQOTRPTUE3SCZ3aWRnZXROYW1lPXNwX2F0ZiZhY3Rpb249Y2xpY2tSZWRpcmVjdCZkb05vdExvZ0NsaWNrPXRydWU=
749,00 €
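
To check whether the parser is the reason an element goes missing, one option is to run the same lookup under every parser that happens to be installed. The sketch below is a minimal illustration and not part of the original answer: `find_text_by_id`, `SAMPLE_HTML`, and `PARSERS` are hypothetical names, and the sample markup is a hand-written stand-in rather than the real Amazon page, so it will not reproduce the original failure by itself.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical, well-formed sample markup (NOT the real Amazon page).
SAMPLE_HTML = """
<html><body>
<div id="price">
  <span id="priceblock_ourprice" class="a-color-price">629,00 €</span>
</div>
</body></html>
"""

# Parser names accepted by BeautifulSoup; lxml and html5lib are optional installs.
PARSERS = ("html.parser", "lxml", "html5lib")


def find_text_by_id(html, element_id, parsers=PARSERS):
    """Look up element_id under each installed parser.

    Returns a dict mapping parser name to the element's text,
    or to None when that parser's tree does not contain the element.
    Parsers that are not installed are left out of the result.
    """
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # The requested parser library is not available; skip it.
            continue
        tag = soup.find(id=element_id)
        results[name] = tag.get_text(strip=True) if tag is not None else None
    return results


if __name__ == "__main__":
    found = find_text_by_id(SAMPLE_HTML, "priceblock_ourprice")
    for parser, text in found.items():
        print(f"{parser:12} -> {text!r}")
```

Passing the `html` string fetched inside `get_price` instead of `SAMPLE_HTML` would show which parsers locate `priceblock_ourprice` on a given page. A `None` for one parser alongside a value for another points to a parsing difference rather than a missing element.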
Votes: 3

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/64188821
