import asyncio
import pyppeteer
import logging
from pyppeteer import launch

pyppeteer.DEBUG = True
for name in logging.root.manager.loggerDict:
    logging.getLogger(name).disabled = True

async def main():
    browser = await launch(headless=False)
    page = await browser.newPage()
    await page.setJavaScriptEnabled(True)
    response = await page.goto('http://www.africau.edu/images/default/sample.pdf',
                               time=3000, waitUntil=['domcontentloaded', 'load', 'networkidle0'])
    content = await response.buffer()
    print(content)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
Expected output: the contents of http://www.africau.edu/images/default/sample.pdf
Actual output: b'df48fcc4-a0b0-4e86-b52e-0ec012ee791e'
Python 3, Linux Ubuntu
Posted on 2022-01-18 10:56:24
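The printed value is a UUID-like string, not a PDF: a well-formed PDF file begins with the header `%PDF-` followed by the version number. A tiny check such as the one below (the helper name `looks_like_pdf` is hypothetical, not part of pyppeteer) is a quick way to confirm whether `response.buffer()`, or any other download method, actually returned the file's bytes.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if `data` starts with the PDF file header."""
    # Well-formed PDF files begin with "%PDF-" plus a version, e.g. "%PDF-1.4".
    return data.startswith(b"%PDF-")


# The value printed in the question is not PDF content.
print(looks_like_pdf(b"df48fcc4-a0b0-4e86-b52e-0ec012ee791e"))  # False
# A buffer that starts like a real PDF passes the check.
print(looks_like_pdf(b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n"))  # True
```

Note that this only inspects the first bytes; it does not validate the rest of the file.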
I would suggest using pyppdf, a Python package built on Pyppeteer (the Python port of Puppeteer).
conda install -c defaults -c conda-forge pyppdf
OR
pip install pyppdf
It has a handy function, save_pdf:
def save_pdf(output_file: str = None, url: str = None, html: str = None, args_dict: Union[str, dict] = None, args_upd: Union[str, dict] = None, goto: str = None, dir_: str = None) -> bytes:
Or you can simply do:
await page.screenshot({'path': 'ss.png'})
# page.pdf() is only supported when the browser runs in headless mode
await page.pdf({'path': 'sample.pdf'})
Posted on 2022-01-23 12:08:48
I know you asked for a pyppeteer solution, but honestly, this is much easier to do with requests.
import requests

def main():
    r = requests.get("http://www.africau.edu/images/default/sample.pdf")
    r.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
    with open("sample.pdf", "wb") as file:
        file.write(r.content)

if __name__ == "__main__":
    main()
That's it: the file will be saved as sample.pdf.
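If you would rather avoid the third-party requests dependency, the same download can be sketched with the standard library's `urllib.request`. This variant streams the body to disk in chunks instead of holding it all in memory, and lets HTTP errors raise rather than writing an error page to `sample.pdf`. The function name `download_file`, the 64 KiB chunk size, and the 30-second timeout are illustrative choices, not part of the original answer.

```python
import shutil
import urllib.request


def download_file(url: str, dest: str, chunk_size: int = 64 * 1024) -> int:
    """Stream the resource at `url` into `dest`; return the number of bytes written."""
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    # so a failed request never produces a bogus output file silently.
    with urllib.request.urlopen(url, timeout=30) as resp, open(dest, "wb") as out:
        # Copy in fixed-size chunks so large files are not loaded fully into memory.
        shutil.copyfileobj(resp, out, chunk_size)
        # The write position at the end equals the total bytes written.
        return out.tell()
```

Usage would look like `download_file("http://www.africau.edu/images/default/sample.pdf", "sample.pdf")`. For very small files the difference from `r.content` is negligible; streaming mainly matters for large downloads.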
https://stackoverflow.com/questions/70714677