Unable to open a PDF file with pdfplumber.open

Stack Overflow user
Asked on 2020-11-19 20:40:21

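I have been following a YouTube channel called "Pythonic accountant" and have been trying to replicate tutorial 4, which teaches how to extract data from PDF invoices, but without success. I keep getting an error that I have not been able to resolve. I am using Anaconda and a Jupyter notebook on OS X. My code looks like this: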

    import requests
    import pdfplumber

    def download_file(url):
        local_filename = url.split('/')[-1]

        with requests.get(url) as r:
            with open(local_filename, 'wb') as f:
            f.write(r.content)
        
        return local_filename

    invoice_url = 'http://www.k-billing.com/example_invoices/professionalblue_example.pdf'

    invoice = download_file(invoice_url)

    with pdfplumber.open(invoice) as pdf:
        page = pdf.pages[0]
        text = page.extract_text()

In the tutorial, the code runs fine. In my case, I get the following error:

    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-6-de1887236e07> in <module>
    ----> 1 with pdfplumber.open(invoice) as pdf:
          2     page = pdf.pages[0]
          3     text = page.extract_text()

    AttributeError: module 'pdfplumber' has no attribute 'open'

I installed pdfplumber with pip install. I have searched online for this error, but I cannot figure out what I am doing wrong.


2 Answers

Stack Overflow user

Answered on 2020-11-19 20:46:40

The indentation here is wrong:

        with requests.get(url) as r:
            with open(local_filename, 'wb') as f:
            f.write(r.content)

It should be:

        with requests.get(url) as r:
            with open(local_filename, 'wb') as f:
                f.write(r.content)

After fixing this, it works for me with Python 3.8.2 and requests==2.22.0 on Ubuntu 20.04.
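
For reference, here is a minimal sketch of the whole download helper with the write placed inside the `with open(...)` block. It is not the tutorial's code: the helper names `local_name` and `save_bytes`, the `directory` parameter, the 30-second timeout, and the `raise_for_status()` check are my own assumptions, added so that a failed download raises an error instead of silently saving an error page.

```python
import os


def local_name(url):
    """Use the last path segment of the URL as the local file name."""
    return url.split("/")[-1]


def save_bytes(content, path):
    """Write bytes to path; note that the write sits inside the `with` block."""
    with open(path, "wb") as f:
        f.write(content)
    return path


def download_file(url, directory="."):
    """Download url into directory and return the path of the saved file."""
    # Third-party dependency; imported here so the two helpers above
    # can be used (and tested) without requests installed.
    import requests

    response = requests.get(url, timeout=30)
    # Raise on 4xx/5xx responses instead of writing an error page to disk.
    response.raise_for_status()
    return save_bytes(response.content, os.path.join(directory, local_name(url)))
```

Calling `download_file(invoice_url)` should then return the saved path, which can be passed to `pdfplumber.open` as in the question.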

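I am not on OS X, but the code works for me. Here is how I installed pdfplumber; comparing the version shown with the one you have installed may point you in the right direction.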

✓ alirvah ~ $ pip3 install pdfplumber
Collecting pdfplumber
  Downloading pdfplumber-0.5.24.tar.gz (42 kB)
     |████████████████████████████████| 42 kB 630 kB/s 
Requirement already satisfied: Pillow>=7.0.0 in /usr/lib/python3/dist-packages (from pdfplumber) (7.0.0)
Collecting Wand
  Downloading Wand-0.6.3-py2.py3-none-any.whl (133 kB)
     |████████████████████████████████| 133 kB 1.7 MB/s 
Collecting pdfminer.six==20200517
  Downloading pdfminer.six-20200517-py3-none-any.whl (5.6 MB)
     |████████████████████████████████| 5.6 MB 2.1 MB/s 
Collecting sortedcontainers
  Downloading sortedcontainers-2.3.0-py2.py3-none-any.whl (29 kB)
Collecting pycryptodome
  Downloading pycryptodome-3.9.9-cp38-cp38-manylinux1_x86_64.whl (13.7 MB)
     |████████████████████████████████| 13.7 MB 3.8 MB/s 
Requirement already satisfied: chardet; python_version > "3.0" in /usr/lib/python3/dist-packages (from pdfminer.six==20200517->pdfplumber) (3.0.4)
Building wheels for collected packages: pdfplumber
  Building wheel for pdfplumber (setup.py) ... done
  Created wheel for pdfplumber: filename=pdfplumber-0.5.24-py3-none-any.whl size=31123 sha256=e8edc98ee33fbe2caf6161ba8b9081a0dd798c8c747d8ceedff4f248cadf8e07
  Stored in directory: /home/alirvah/.cache/pip/wheels/2b/02/eb/8e0c88d08e0675b767895d4bcf54c0d4da1b37579b00409e0e
Successfully built pdfplumber
Installing collected packages: Wand, sortedcontainers, pycryptodome, pdfminer.six, pdfplumber
Successfully installed Wand-0.6.3 pdfminer.six-20200517 pdfplumber-0.5.24 pycryptodome-3.9.9 sortedcontainers-2.3.0

Output of the script:

INVOICE
Invoice No. I1083
Account # C1006
Date 08-14-2008
Due By 08-31-2008
Demo Company
Phone : 111-222-3333 Terms None
1234 Main Street E-Mail : 333-444-4444 PO No. PO1234
Ashland, KY 41102 Web : http://www.ksoftware.net Sales Rep SalesPerson1
Bill To Ship To
Test Customer Test Customer
1234 Main Street 1234 Main Street
Ashland, KY 41101 Ashland,  41101
CCooddee DDeessccrriippttiioonn QTY Price Line Total
SKU1222 Test Import Name - Description Goes Here 1 $10.00 $10.00
Labor - Example labor item. Quantity is number of hours spent,  1.5 $100.00 $150.00
price is hourly rate. Quantity accepts decimal values.
Notes
An invoice note can go here. Multi-line and even multi-page notes are supported.
PPaayymmeenntt  DDeettaaiillss
Subtotal $160.00
Shipping$10.00 Tax $0.78
UPS Ground Total $170.78
Payments (-) $0.00
Balance Due $170.78
An invoice footer can go here

Stack Overflow user

Answered on 2020-11-22 11:14:09

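It turned out that something had gone wrong when I installed the pdfplumber package. For some reason I had ended up with an older version (0.1.2). Once I fixed that and installed the correct version (0.5.24), the script ran fine and I was able to finish the tutorial. Thanks for your contributions and help.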
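
For anyone who hits the same `AttributeError`, here is a rough sketch of how the installed version could be checked from inside the notebook. The helper names `inspect_package` and `version_tuple` are hypothetical, `importlib.metadata` requires Python 3.8 or newer, and the version comparison is deliberately naive (plain dotted numbers only), so treat it as a starting point rather than a definitive diagnostic.

```python
import importlib
import sys
from importlib import metadata


def version_tuple(version):
    """Turn a dotted version such as '0.5.24' into (0, 5, 24).

    Naive on purpose: non-numeric parts (e.g. 'rc1') are skipped.
    """
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def inspect_package(name, attribute):
    """Report the running interpreter, the installed version of `name`,
    where the module was imported from, and whether it has `attribute`."""
    report = {
        "python": sys.executable,  # the interpreter the notebook kernel uses
        "version": None,
        "location": None,
        "has_attribute": False,
    }
    try:
        report["version"] = metadata.version(name)
    except metadata.PackageNotFoundError:
        return report  # not installed for this interpreter
    try:
        module = importlib.import_module(name)
    except ImportError:
        return report
    # A local file with the same name as the package would show up here.
    report["location"] = getattr(module, "__file__", None)
    report["has_attribute"] = hasattr(module, attribute)
    return report


if __name__ == "__main__":
    info = inspect_package("pdfplumber", "open")
    print(info)
    if info["version"] and version_tuple(info["version"]) < version_tuple("0.5.24"):
        print("Older than 0.5.24; try: python -m pip install --upgrade pdfplumber")
```

If the reported interpreter is not the one pip installed into, running pip through that same interpreter (`python -m pip ...`) is one way to keep the two in sync.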

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/64911851
