
Headless not working with Playwright and BeautifulSoup 4

Stack Overflow user
Asked 2022-07-06 16:59:39
1 answer, 924 views, 0 followers, 1 vote

This code works:

from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup

with sync_playwright() as p:
    # Headed (visible) browser: with this setting the price is found.
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://www.apple.com/br/shop/product/MV7N2BE/A/airpods-com-estojo-de-recarga")
    # Take a static HTML snapshot of the rendered page and parse it with BeautifulSoup.
    html = page.content()
    soup = BeautifulSoup(html, 'html.parser')
    # The second-to-last price span holds the cash ("à vista") price; strip its label.
    valorAppleStore = soup.select("span.as-price-installments")[-2].get_text().replace(" à vista (10% de desconto)", '')
    print(valorAppleStore)
    browser.close()

However, if I change it to headless=True, the code raises an error:

Traceback (most recent call last):
  File "c:/Users/ANDERSONCARVALHODELI/Documents/py/AirpodsPW.py", line 19, in <module>
    valorAppleStore = soup.select("span.as-price-installments")[-2].get_text().replace(" à vista (10% de desconto)", '')
IndexError: list index out of range
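
The `IndexError` means `soup.select("span.as-price-installments")` returned fewer than two elements, so `[-2]` had nothing to index. In the headless run the list is empty, because (as the printed HTML further down shows) a different page came back. A minimal guard sketch, with a hypothetical helper name `second_to_last`, that fails with a message pointing at the real cause instead of a bare `IndexError`:

```python
from typing import Sequence


def second_to_last(texts: Sequence[str]) -> str:
    """Return texts[-2], raising a descriptive error if there are fewer than 2 items."""
    if len(texts) < 2:
        # soup.select() returns a plain list; indexing [-2] on a shorter list
        # raises IndexError, which hides *why* the prices were missing.
        raise LookupError(
            f"expected at least 2 price spans, found {len(texts)}; "
            "print the page HTML to see what was actually served"
        )
    return texts[-2]
```

In the script above it would be called as `second_to_last([s.get_text() for s in soup.select("span.as-price-installments")])`.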

I worked around it like this:

from playwright.sync_api import sync_playwright
from bs4 import BeautifulSoup
import time

with sync_playwright() as p:
    # Workaround, step 1: open the page once in a visible (headed) browser...
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://www.apple.com/br/shop/product/MV7N2BE/A/airpods-com-estojo-de-recarga")
    time.sleep(1)
    browser.close()

    # ...step 2: load it again headless; in my runs this then returns the product page.
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://www.apple.com/br/shop/product/MV7N2BE/A/airpods-com-estojo-de-recarga")
    html = page.content()
    soup = BeautifulSoup(html, 'html.parser')
    valorAppleStore = soup.select("span.as-price-installments")[-2].get_text().replace(" à vista (10% de desconto)", '')
    print(valorAppleStore)
    browser.close()

But I don't think this is the best option. How can I fix this while sticking with headless=True, without having to open a browser with headless=False?

When I print(html) before the soup = ... line, I see:

    <!DOCTYPE html><html><head> <title>Page Not Found - Apple</title> <link rel="stylesheet" href="https://www.apple.com/wss/fonts?families=SF+Pro,v1|SF+Pro+Icons,v1"> <link rel="stylesheet" href="https://www.apple.com/v/errors/c/built/styles/main.built.css" type="text/css"> <link rel="stylesheet" href="https://www.apple.com/v/errors/c/built/styles/overview.built.css" type="text/css"> <link rel="stylesheet" href="https://store.storeimages.cdn-apple.com/4982/store.apple.com/shop/rs-external/rel/us/external.css"> <link rel="stylesheet" href="https://store.storeimages.cdn-apple.com/4982/store.apple.com/shop/rs-globalelements/dist/us/globalelements.css"> <style>.more::after{content: "";}a.pointer, a.more, a.block span.more, button.unbutton.more{padding-right: .7em; background-image: url(https://store.storeimages.cdn-apple.com/4982/store.apple.com/shop/rs-web/2/dist/assets/as-legacy/base/link/res/more.svg); background-repeat: no-repeat; background-position: 100% 50%; background-size: 5px 9px; zoom: 1;}.as-globalfooter-directory-column-section-list a{margin-bottom: .8em; display: block}.as-globalfooter-directory-column-section-list a:last-child{margin-bottom: 0;}.as-globalfooter-mini .as-globalfooter-mini-shop a{color: #06c;}.as-globalfooter .as-globalfooter-mini-legal-copyright, .as-footnotes .as-globalfooter-mini-legal-copyright, .as-globalfooter .as-globalfooter-mini-legal-link, .as-footnotes .as-globalfooter-mini-legal-link{top: -3px; position: relative; z-index: 1;}.as-globalfooter .as-globalfooter-directory+.as-globalfooter-mini, .as-footnotes .as-globalfooter-directory+.as-globalfooter-mini{padding-bottom: 26px;}.container{position: relative;}hr{display: inline-block; border: 0px; border-top: 0.1em solid #CCD2D9; width: 100%}</style></head><body class="page-overview"> <nav data-store-api="/shop/bag/status" id="ac-globalnav"> <div class="ac-gn-content"> <ul class="ac-gn-list"> <a href="/" class="ac-gn-link ac-gn-link-apple"> <p class="ac-gn-link-text">Apple</p></a> 
<a href="/us/shop/goto/store" class="ac-gn-link ac-gn-link-store"> <p class="ac-gn-link-text">Store</p></a> <a href="/mac/" class="ac-gn-link ac-gn-link-mac"> <p class="ac-gn-link-text">Mac</p></a> <a href="/ipad/" class="ac-gn-link ac-gn-link-ipad"> <p class="ac-gn-link-text">iPad</p></a> <a href="/iphone/" class="ac-gn-link ac-gn-link-iphone"> <p class="ac-gn-link-text">iPhone</p></a> <a href="/watch/" class="ac-gn-link ac-gn-link-watch"> <p class="ac-gn-link-text">Watch</p></a> <a href="/airpods/" class="ac-gn-link ac-gn-link-airpods"> <p class="ac-gn-link-text">AirPods</p></a> <a href="/tv-home/" class="ac-gn-link ac-gn-link-tvhome"> <p class="ac-gn-link-text">TV &amp; Home</p></a> 
<a href="/services/" class="ac-gn-link ac-gn-link-onlyonapple"> <p class="ac-gn-link-text">Only on Apple</p></a> <a href="/us/shop/goto/buy_accessories" class="ac-gn-link ac-gn-link-accessories"> <p class="ac-gn-link-text">Accessories</p></a> <a href="https://support.apple.com" class="ac-gn-link ac-gn-link-support"> <p class="ac-gn-link-text">Support</p></a> <li class="ac-gn-item ac-gn-item-menu ac-gn-search"> <a id="ac-gn-link-search" class="ac-gn-link ac-gn-link-search" href="/us/search" data-analytics-title="search" data-analytics-intrapage-link="" aria-label="Search apple.com" role="button" aria-haspopup="true"></a> </li><a href="/us/shop/goto/bag" class="ac-gn-link ac-gn-link-bag"> <p class="ac-gn-link-text">Shopping Bag</p></a> </ul> </div></nav> <div id="ac-gn-placeholder"> </div><main id="main" class="main" role="main" data-page-type="overview"> <h1 class="section-headline typography-headline">The page you’re looking for can’t be found.</h1> <aside id="search-wrapper" role="search" data-analytics-region="search" aria-hidden="false"> <form id="searchform-form" class="searchform" action="/us/search" method="get" data-suggestions-url="/search-services/suggestions/"><input id="searchform-input" type="text" class="form-textbox form-textbox-text form-icon-left" aria-labelledby="textbox_label" required="" aria-required="true" data-placeholder-long="Search for Products, Stores, and Help" autocorrect="off" autocapitalize="off" autocomplete="off"><span class="form-label" id="textbox_label" aria-hidden="true">Search apple.com</span> <div id="searchform-submit" class="form-icons-wrapper form-icons-wrapper-left form-icons-focusable" type="submit" aria-label="Submit"><button class="form-icons form-icons-search15"></button></div><div id="searchform-reset" class="button-reset form-icons-wrapper form-icons-focusable" type="reset" disabled="" aria-label="Clear Search"><button class="form-icons form-icons-small form-icons-clearsolid15 form-icon-reset"></button></div></form> 
</aside> <div class="cta-sitemap"> <div class="cta-sitemap"> <a href="/sitemap/" class="more" style="top: bottom">Or see our site map</a> </div></div></main> <footer class="as-globalfooter as-globalfooter-contained"> <div class="as-globalfooter-content"> <div class="as-globalfooter-breadcrumbs"> <a href="/" class="as-globalfooter-breadcrumbs-home"> <p class="as-globalfooter-breadcrumbs-home-icon"></p><p class="as-globalfooter-breadcrumbs-home-label">Apple</p></a> <div class="as-globalfooter-breadcrumbs-path"> <ol class="as-globalfooter-breadcrumbs-list"> <li class="as-globalfooter-breadcrumbs-item breadcrumbs-title"> Page Not Found</li></ol> </div></div><nav class="as-globalfooter-directory with-5-columns"> <div class="as-globalfooter-directory-column"> <div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">Shop and Learn</h3> <ul class="as-globalfooter-directory-column-section-list"> <a href="/us/shop/goto/store">Store</a> <a href="/mac/">Mac</a> <a href="/ipad/">iPad</a> <a href="/iphone/">iPhone</a> <a href="/watch/">Watch</a> <a href="/airpods/">AirPods</a> <a href="/tv-home/">TV &amp; Home</a> <a href="/ipod-touch/">iPod touch</a> <a href="/airtag/">AirTag</a> <a href="/us/shop/goto/buy_accessories">Accessories</a> <a href="/us/shop/goto/giftcards">Gift Cards</a> </ul> </div></div><div class="as-globalfooter-directory-column"> <div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">Services</h3> <ul class="as-globalfooter-directory-column-section-list"> <a href="/apple-music/">Apple Music</a> <a href="/apple-tv-plus/">Apple TV+</a> <a href="/apple-fitness-plus/">Apple Fitness+</a> <a href="/apple-news/">Apple News+</a> <a href="/apple-arcade/">Apple Arcade</a> <a href="/icloud/">iCloud</a> <a href="/apple-one/">Apple One</a> <a href="/apple-card/">Apple Card</a> <a href="/apple-books/">Apple Books</a> <a href="/apple-podcasts/">Apple 
Podcasts</a> <a href="/app-store/">App Store</a> </ul> </div><div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">Account</h3> <ul class="as-globalfooter-directory-column-section-list"> <a href="https://appleid.apple.com/us/">Manage Your Apple ID</a> <a href="/us/shop/goto/account">Apple Store Account</a> <a href="https://www.icloud.com">iCloud.com</a> </ul> </div></div><div class="as-globalfooter-directory-column"> <div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">Apple Store</h3> <ul class="as-globalfooter-directory-column-section-list"> <a href="/retail/">Find a Store</a> <a href="/retail/geniusbar/">Genius Bar</a> <a href="/today/">Today at Apple</a> <a href="/today/camp/">Apple Camp</a> <a href="https://itunes.apple.com/app/apple-store/id375380948">Apple Store App</a> <a href="/us/shop/goto/special_deals">Refurbished and Clearance</a> <a href="/us/shop/goto/payment_plan">Financing</a> <a href="/us/shop/goto/trade_in">Apple Trade In</a> <a href="/us/shop/goto/order/list">Order Status</a> <a href="/us/shop/goto/help">Shopping Help</a> </ul> </div></div><div class="as-globalfooter-directory-column"> <div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">For Business</h3> <ul class="as-globalfooter-directory-column-section-list"> <a href="/business/">Apple and Business</a> <a href="/retail/business/">Shop for Business</a> </ul> </div><div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">For Education</h3> <ul class="as-globalfooter-directory-column-section-list"> <a href="/education/">Apple and Education</a> <a href="/education/k12/how-to-buy/">Shop for K-12</a> <a href="/us/shop/goto/educationrouting">Shop for College</a> </ul> </div><div class="as-globalfooter-directory-column-section"> <h3 
class="as-globalfooter-directory-column-section-title">For Healthcare</h3> <ul class="as-globalfooter-directory-column-section-list"> <a href="/healthcare/">Apple in Healthcare</a> <a href="/healthcare/apple-watch/">Health on Apple Watch</a> <a href="/healthcare/health-records/">Health Records on iPhone</a> </ul> </div><div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">For Government</h3> <ul class="as-globalfooter-directory-column-section-list"> <a href="/r/store/government/">Shop for Government</a> <a href="/us/shop/goto/eppstore/veteransandmilitary">Shop for Veterans and Military</a> </ul> </div></div><div class="as-globalfooter-directory-column"> <div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">Apple Values</h3> <ul class="as-globalfooter-directory-column-section-list"> <a href="/accessibility/">Accessibility</a> <a href="/education/connectED/">Education</a> <a href="/environment/">Environment</a> <a href="/diversity/">Inclusion and Diversity</a> <a href="/privacy/">Privacy</a> <a href="/racial-equity-justice-initiative/">Racial Equity
and Justice</a> <a href="/supplier-responsibility/">Supplier Responsibility</a> </ul> </div><div class="as-globalfooter-directory-column-section"> <h3 class="as-globalfooter-directory-column-section-title">About Apple</h3> <ul class="as-globalfooter-directory-column-section-list"> <a href="/newsroom/">Newsroom</a> <a href="/leadership/">Apple Leadership</a> <a href="/careers/us/">Career Opportunities</a> <a href="https://investor.apple.com">Investors</a> <a href="/compliance/">Ethics &amp; Compliance</a> <a href="/apple-events/">Events</a> <a href="/contact/">Contact Apple</a> </ul> </div></div></nav> <div class="as-globalfooter-mini"> <div class="as-globalfooter-mini-shop">More ways to shop: 
<a href="/retail/">Find an Apple Store</a> or <a href="https://locate.apple.com/">other retailer</a> near you. <span>Or call 1-800-MY-APPLE.</span> </div><div class="as-globalfooter-mini-locale"> <a class="as-globalfooter-mini-locale-link" href="/choose-country-region/" title="Choose your country or region" aria-label="United States. Choose your country or region" data-analytics-title="choose your country">United States</a> </div><p class="as-globalfooter-mini-legal-copyright">Copyright © 2022 Apple Inc. All rights reserved. </p><a class="as-globalfooter-mini-legal-link" href="/legal/privacy/">Privacy Policy </a> <a class="as-globalfooter-mini-legal-link" href="/legal/internet-services/terms/site.html">Terms of Use </a> <a class="as-globalfooter-mini-legal-link" href="/us/shop/goto/help/sales_refunds">Sales 
and Refunds </a> <a class="as-globalfooter-mini-legal-link" href="/legal/">Legal </a> <a class="as-globalfooter-mini-legal-link" href="/sitemap/">Site Map </a> </div></div></footer> <script src="https://www.apple.com/v/errors/c/built/scripts/main.built.js" type="text/javascript" charset="utf-8"></script></body></html>
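
The `<title>Page Not Found - Apple</title>` at the top is the giveaway: in headless mode the script received Apple's error page, not the product page. The same check the question does by eye can be automated so the script stops before parsing. A minimal standard-library sketch (`html.parser` instead of BeautifulSoup; the helper names are hypothetical, and the `"Page Not Found"` marker is copied from the output above, so it is an assumption about how this one site labels its error page):

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Capture the text inside the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self._in_title = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_title(html: str) -> str:
    """Return the stripped <title> text, or "" if the document has none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def is_not_found_page(html: str) -> bool:
    # Marker taken from the printed output; other sites word this differently.
    return "Page Not Found" in page_title(html)
```

Inside a Playwright session, `page.title()` returns the same string without any parsing.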

1 Answer

Stack Overflow user

Answered 2022-07-15 21:45:42

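First, Playwright already has a complete set of selectors that work on the live page. To drop a dependency, speed up scraping, write less code, and avoid strange bugs when the static HTML snapshot gets out of sync with the live page, I recommend skipping BeautifulSoup (BS) altogether.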
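
A minimal sketch of the question's extraction done with Playwright locators instead of BeautifulSoup, assuming the markup is as in the question (the price is the second-to-last `span.as-price-installments`, followed by the `" à vista (10% de desconto)"` label). The function and constant names are hypothetical; `Locator.first`, `Locator.wait_for` and `Locator.all_text_contents` are standard Playwright sync-API calls, and `Page` is imported only for the type hint:

```python
from typing import TYPE_CHECKING, Sequence

if TYPE_CHECKING:  # type hint only; no runtime dependency on Playwright here
    from playwright.sync_api import Page

PRICE_SELECTOR = "span.as-price-installments"
CASH_LABEL = " à vista (10% de desconto)"


def price_from_texts(texts: Sequence[str]) -> str:
    """Mirror the question's soup.select(...)[-2] logic on plain strings."""
    return texts[-2].replace(CASH_LABEL, "").strip()


def extract_price(page: "Page") -> str:
    """Read the price straight from the live page; no HTML snapshot needed."""
    prices = page.locator(PRICE_SELECTOR)
    # all_text_contents() does not auto-wait, so first wait for one span to exist.
    prices.first.wait_for(state="attached")
    return price_from_texts(prices.all_text_contents())
```

It would be called as `print(extract_price(page))` right after `page.goto(...)`. Keeping the string cleanup in a plain function lets it be checked without launching a browser.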

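As for the main problem, printing the HTML to see what kind of response you're dealing with was a good move. The 404 page suggests the site detected you as a bot when running headless, although bot detection more commonly shows up as a CAPTCHA, a Cloudflare browser-check page, or some other "are you a robot?" notice.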
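
One way to see what a site can observe is to read the user-agent string the headless browser sends. Headless Chromium typically reports a `HeadlessChrome/...` token where a normal window reports `Chrome/...`, which is an easy signal for bot filters. Whether this is what Apple's site actually checks is an assumption; the sketch below (hypothetical helper names) only shows how to inspect it via `page.evaluate`:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hint only
    from playwright.sync_api import Page

HEADLESS_MARKER = "HeadlessChrome"


def current_user_agent(page: "Page") -> str:
    """Ask the page's JavaScript context which user agent it reports."""
    return page.evaluate("() => navigator.userAgent")


def has_headless_marker(user_agent: str) -> bool:
    """True if the UA string openly advertises headless Chromium."""
    return HEADLESS_MARKER in user_agent
```

Printing `current_user_agent(page)` once with `headless=True` and once with `headless=False` shows the difference directly.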

As with everything in scraping, there's no one-size-fits-all solution, but a typical approach is to set a custom user-agent string:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    # Present a regular desktop Chrome user agent instead of the headless default.
    ua = (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/69.0.3497.100 Safari/537.36"
    )
    url = (
        "https://www.apple.com/br/shop/product/MV7N2BE/A/airpods-com-estojo-de-recarga"
    )
    page = browser.new_page(user_agent=ua)
    page.goto(url, wait_until="domcontentloaded")
    # Query the live page directly with Playwright; no BeautifulSoup needed.
    sel = "span.as-price-installments:last-child"
    text = (
        page.wait_for_selector(sel)  # waits until a matching element is visible
        .text_content()
        .replace("à vista (10% de desconto)", "")
        .strip()
    )
    print(text)  # => R$ 1.399,50
    browser.close()
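
The hard-coded string above claims Chrome 69, which is older than the Chromium build Playwright drives, and some sites may treat that mismatch as suspicious in itself. A hedged alternative sketch (hypothetical helper names): take the browser's own default user agent and only swap the `HeadlessChrome/` token for `Chrome/`. `browser.new_page(user_agent=...)` and `page.evaluate` are standard Playwright calls; whether this alone is enough for a given site is not guaranteed:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only
    from playwright.sync_api import Browser, Page


def headed_user_agent(user_agent: str) -> str:
    """Replace the headless token so the UA matches a normal Chrome window."""
    return user_agent.replace("HeadlessChrome/", "Chrome/")


def new_page_with_headed_ua(browser: "Browser") -> "Page":
    """Open a page whose UA is the browser's own UA minus the headless marker."""
    probe = browser.new_page()  # throwaway page just to read the default UA
    default_ua = probe.evaluate("() => navigator.userAgent")
    probe.close()
    return browser.new_page(user_agent=headed_user_agent(default_ua))
```

In the script above, `page = new_page_with_headed_ua(browser)` would replace the `ua = (...)` and `browser.new_page(user_agent=ua)` lines.
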
Votes: 2
Original page content provided by Stack Overflow; translation provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/72887356
