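Is there a way to convert a PDF (or a text file) to a Word document in Python? I'm doing some web scraping for my professor, and the original documents are PDFs. I converted 1,611 of them to text files, and now we need to convert those to Word documents. The only thing I could find was a Word-to-txt converter, not the other way around.
Thanks!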
Posted on 2015-03-28 08:23:12
Using python-docx, I was able to convert the txt files to Word documents quite easily.
Here is what I did.
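from docx import Document
import re
import os

path = '/users/tdobbins/downloads/smithtxt'  # folder containing the .txt files
direct = os.listdir(path)

for i in direct:
    document = Document()
    document.add_heading(i, 0)  # use the file name as the document title
    # Read each file from the same folder that was listed above
    with open(os.path.join(path, i)) as f:
        myfile = f.read()
    # Replace runs of non-ASCII characters and form feeds (\x0c) with a space;
    # python-docx rejects XML-incompatible characters such as form feeds
    myfile = re.sub(r'[^\x00-\x7F]+|\x0c', ' ', myfile)
    p = document.add_paragraph(myfile)
    document.save('/path/to/write/to/' + i + '.docx')
Posted on 2015-03-28 05:58:52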
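You could take a look at python-docx. It creates Word documents from Python, so you can write the contents of your text files into Word. See the "What it can do" section of the python-docx documentation.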
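As a rough illustration of that suggestion, here is a minimal sketch that writes one text file into a .docx with one Word paragraph per blank-line-separated block. The helper names (`clean_text`, `split_paragraphs`, `txt_to_docx`) are made up for this example and are not part of python-docx. It also assumes the text files are UTF-8 and that stripping C0 control characters is enough to keep the XML valid, which may not cover every file.

```python
import re
from pathlib import Path

# C0 control characters that XML 1.0 does not allow
# (everything below 0x20 except tab, newline and carriage return).
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_text(text):
    """Replace XML-incompatible control characters (e.g. form feeds) with spaces."""
    return _XML_ILLEGAL.sub(" ", text)


def split_paragraphs(text):
    """Split text on blank lines and drop empty chunks."""
    chunks = re.split(r"\n\s*\n", text)
    return [c.strip() for c in chunks if c.strip()]


def txt_to_docx(txt_path, docx_path):
    """Write one .txt file into a .docx, one Word paragraph per text paragraph."""
    # python-docx; imported here so the text helpers above work without it installed
    from docx import Document

    # Assumption: the files are UTF-8; undecodable bytes become U+FFFD
    text = Path(txt_path).read_text(encoding="utf-8", errors="replace")
    document = Document()
    document.add_heading(Path(txt_path).stem, 0)  # file name (without extension) as title
    for paragraph in split_paragraphs(clean_text(text)):
        document.add_paragraph(paragraph)
    document.save(str(docx_path))
```

Calling `txt_to_docx('report.txt', 'report.docx')` (hypothetical file names) in a loop over a folder would handle a batch. Unlike an ASCII-only filter, this keeps accented and other non-ASCII letters, which matters if the source PDFs are not purely English.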
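Posted on 2019-11-07 23:45:58
You can use GroupDocs.Conversion Cloud. It provides a Python SDK for converting text to DOC/DOCX, and for converting between many other common file formats, without depending on any third-party tools or software.
Here is a Python code sample.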
# Import module
import groupdocs_conversion_cloud

# Get your app_sid and app_key at https://dashboard.groupdocs.cloud (free registration is required).
app_sid = "xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxx"
app_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Create instances of the API
convert_api = groupdocs_conversion_cloud.ConvertApi.from_keys(app_sid, app_key)
file_api = groupdocs_conversion_cloud.FileApi.from_keys(app_sid, app_key)

try:
    # Upload source file to storage
    filename = 'Sample.pdf'
    remote_name = 'Sample.pdf'
    output_name = 'sample.doc'
    strformat = 'doc'

    request_upload = groupdocs_conversion_cloud.UploadFileRequest(remote_name, filename)
    response_upload = file_api.upload_file(request_upload)

    # Convert PDF to Word document
    settings = groupdocs_conversion_cloud.ConvertSettings()
    settings.file_path = remote_name
    settings.format = strformat
    settings.output_path = output_name

    # Options controlling how the source PDF is loaded
    loadOptions = groupdocs_conversion_cloud.PdfLoadOptions()
    loadOptions.hide_pdf_annotations = True
    loadOptions.remove_embedded_files = False
    loadOptions.flatten_all_fields = True
    settings.load_options = loadOptions

    # Convert only one page, starting from page 1
    convertOptions = groupdocs_conversion_cloud.DocxConvertOptions()
    convertOptions.from_page = 1
    convertOptions.pages_count = 1
    settings.convert_options = convertOptions

    request = groupdocs_conversion_cloud.ConvertDocumentRequest(settings)
    response = convert_api.convert_document(request)
    print("Document converted successfully: " + str(response))
except groupdocs_conversion_cloud.ApiException as e:
    print("Exception when calling convert_document: {0}".format(e.message))
https://stackoverflow.com/questions/29310786