
Q: Extracting images from a Word document using Python

Stack Overflow user

Asked 2019-06-03 21:31:06

3 answers, 3.7K views, 0 followers, 2 votes

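How can I use Python to extract images/logos from a Word document and store them in a folder? The code below converts the docx to HTML, but it does not extract the images from the HTML. Any comments or suggestions would be a great help.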

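    import mammoth

    profile_path = "<file path>"    # placeholder: path to the .docx file
    profile_html = "<output path>"  # placeholder: path of the HTML file to write

    # Convert the document to HTML once, then write the result as UTF-8.
    with open(profile_path, 'rb') as f:
        document = mammoth.convert_to_html(f)
    with open(profile_html, 'wb') as b:
        b.write(document.value.encode('utf8'))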
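
A note on the snippet above: as far as I know, mammoth's default image handling inlines each image into the generated HTML as a base64 data URI in the `src` attribute of an `<img>` tag, so the image bytes are already in the HTML string. The sketch below, which assumes that default and uses only the standard library, decodes those data URIs into files. The function name `save_inline_images` and the `imageN.<subtype>` naming scheme are hypothetical choices made for illustration.

```python
import base64
import re
from pathlib import Path

# Matches src="data:image/<subtype>;base64,<payload>" as found inside <img> tags.
DATA_URI = re.compile(r'src="data:image/([a-zA-Z0-9.+-]+);base64,([^"]+)"')


def save_inline_images(html, out_dir):
    """Decode base64 data-URI images found in html and write them to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, match in enumerate(DATA_URI.finditer(html), start=1):
        subtype, payload = match.groups()
        # Hypothetical naming: the MIME subtype (e.g. "png", "jpeg") is used as the extension.
        target = out / f"image{index}.{subtype}"
        target.write_bytes(base64.b64decode(payload))
        saved.append(target)
    return saved
```

With the question's variables this would be called as `save_inline_images(document.value, "images")`. MIME subtypes such as `x-wmf` produce unusual extensions, so a real script would probably map subtypes to extensions explicitly.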

python-2.7

python

python-3.x

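3 Answers

Stack Overflow user

Answered 2019-12-17 23:06:35

You can use the docx2txt library, which reads your .docx document and exports its images to a directory you specify (the directory must already exist).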

# Install first: pip install docx2txt (in a Jupyter notebook: !pip install docx2txt)
import docx2txt
text = docx2txt.process("/path/your_word_doc.docx", '/home/example/img/')

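After running this, the images will be in /home/example/img/ and the variable text will hold the document text. The images are named image1.png ... imageN.png, in order of appearance.

Note: the Word document must be in .docx format.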
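
One practical detail when reading the exported files back: a plain alphabetical sort puts image10.png before image2.png. A minimal sketch, assuming the files keep the numbered naming described above, sorts them by the number in the file name instead. The helper name `images_in_order` is hypothetical and not part of docx2txt.

```python
import re
from pathlib import Path


def images_in_order(img_dir):
    """Return the files in img_dir sorted by the first number in their name."""
    def number(path):
        found = re.search(r"(\d+)", path.stem)
        # Files without a number sort first; ties fall back to the file name.
        return int(found.group(1)) if found else 0

    files = [p for p in Path(img_dir).iterdir() if p.is_file()]
    return sorted(files, key=lambda p: (number(p), p.name))
```

For the example above this would be called as `images_in_order('/home/example/img/')`.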

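Votes: 2

Stack Overflow user

Answered 2021-11-22 18:04:44

Natively, without any library

Extract the source images from the docx (which is a variant of a zip file) without distortion or conversion.

Open a shell on your operating system and run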

tar -m -xf DocxWithImages.docx word/media

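The command extracts the word/media folder from the archive into a folder of the same name, where you will find the source images as JPEG, PNG, WMF, or other formats. These are the unaltered embedded originals, with no scaling or cropping applied.

You may be surprised to find that the visible area is larger than any cropped version used in the docx itself, so be aware that Word does not always crop images the way you expect (this is one cause of embarrassing redaction failures).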
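
Since the question asks for Python, the same idea can be sketched with the standard-library `zipfile` module, relying only on the fact stated above that a docx is a zip archive with its images under word/media. The function name `extract_media` and the flat output layout are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def extract_media(docx_path, out_dir):
    """Copy every file under word/media/ in the docx archive into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything outside word/media/.
            if name.startswith("word/media/") and not name.endswith("/"):
                target = out / Path(name).name
                # The bytes are written unchanged, so the images are not re-encoded.
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved
```

For example, `extract_media("DocxWithImages.docx", "media")` would write the embedded files into a media folder and return their paths.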

Votes: 1

Stack Overflow user

Answered 2021-11-22 12:17:40

Extract all images from a docx file using Python

1. Using docx2txt

import docx2txt
#extract text 
text = docx2txt.process(r"filepath_of_docx")
#extract text and write images in Temporary Image directory
text = docx2txt.process(r"filepath_of_docx",r"Temporary_Image_Directory")

2. Using Aspose.Words

import aspose.words as aw

# Load the Word document
doc = aw.Document(r"filepath")
# Retrieve all shapes, including nested ones
shapes = doc.get_child_nodes(aw.NodeType.SHAPE, True)
image_index = 0
# Loop through the shapes and save each one that holds an image
for shape in shapes:
    shape = shape.as_shape()
    if shape.has_image:
        # Build the image file name from the index and the image type's extension
        extension = aw.FileFormatUtil.image_type_to_extension(shape.image_data.image_type)
        image_file_name = f"Image.ExportImages.{image_index}_{extension}"
        # Save the image
        shape.image_data.save(image_file_name)
        image_index += 1

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/56428445
