
How do I convert a PDF to Word using the Acrobat SDK?

Stack Overflow user
Asked 2012-07-05 17:03:24
Answers: 3 · Views: 9.7K · Followers: 0 · Votes: 7

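My .NET application needs to convert PDF documents to Word format programmatically.

I evaluated several products and found Acrobat X Pro, which offers a Save As option that can save a document in Word/Excel format. I tried using the Acrobat SDK, but could not find documentation that shows where to start.

I looked at their IAC (Interapplication Communication) samples, but could not work out how to invoke the menu item and make it execute the Save As option.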


3 Answers

Stack Overflow user

Answered 2012-12-13 04:18:30

You can do this with Acrobat X Pro, but you need to call the Acrobat JavaScript API from C#.

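 // Requires: using System; using System.Globalization; using System.Reflection; using Acrobat;
 AcroPDDoc pdfd = new AcroPDDoc();
 pdfd.Open(sourceDoc.FileFullPath);
 Object jsObj = pdfd.GetJSObject();
 // Reflect on the JavaScript object itself, since saveAs is a member of jsObj, not of AcroPDDoc
 Type jsType = jsObj.GetType();
 //have to use acrobat javascript api because, acrobat
 // Arguments: output path, conversion ID, then the remaining saveAs parameters as in the original call
 object[] saveAsParam = { "newFile.doc", "com.adobe.acrobat.doc", "", false, false };
 jsType.InvokeMember("saveAs", BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Instance, null, jsObj, saveAsParam, CultureInfo.InvariantCulture);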

Hope this helps.

Votes: 15

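Stack Overflow user

Answered 2014-10-28 12:04:41

I did something very similar with WinPython x64 2.7.6.3 and Acrobat X Pro, using the JSObject interface to convert PDFs to DOCX. It is essentially the same solution as jle's.

The following should be the complete code for converting a set of PDFs to DOCX: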

# Finds every file under ROOT_INPUT_PATH that matches INPUT_FILE_EXTENSION and tries to convert it with Acrobat,
# writing the result to ROOT_OUTPUT_PATH under the same filename with the extension replaced by OUTPUT_FILE_EXTENSION
from win32com.client import Dispatch
from win32com.client.dynamic import ERRORS_BAD_CONTEXT

import winerror

# try importing scandir and, if found, use it, as it is considerably faster than the stock os.walk
try:
    from scandir import walk
except ImportError:
    from os import walk

import fnmatch

import sys
import os

ROOT_INPUT_PATH = None
ROOT_OUTPUT_PATH = None
INPUT_FILE_EXTENSION = "*.pdf"
OUTPUT_FILE_EXTENSION = ".docx"

def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext):
    avDoc = Dispatch("AcroExch.AVDoc") # Connect to Adobe Acrobat

    # Open the input file (as a pdf)
    ret = avDoc.Open(f_path, f_path)
    assert ret # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practice?

    pdDoc = avDoc.GetPDDoc()

    dst = os.path.join(f_path_out, ''.join((f_basename, f_ext)))

    # Adobe documentation says "For that reason, you must rely on the documentation to know what functionality is available through the JSObject interface. For details, see the JavaScript for Acrobat API Reference"
    jsObject = pdDoc.GetJSObject()

    # Here you can save as many other types by using, for instance: "com.adobe.acrobat.xml"
    jsObject.SaveAs(dst, "com.adobe.acrobat.docx") # NOTE: If you want to save the file as a .doc, use "com.adobe.acrobat.doc"

    pdDoc.Close()
    avDoc.Close(True) # We want this to close Acrobat, as otherwise Acrobat is going to refuse processing any further files after a certain threshold of open files are reached (for example 50 PDFs)
    del pdDoc

if __name__ == "__main__":
    assert(5 == len(sys.argv)), sys.argv # <script name>, <script_file_input_path>, <script_file_input_extension>, <script_file_output_path>, <script_file_output_extension>

    #$ python get.docx.from.multiple.pdf.py 'C:\input' '*.pdf' 'C:\output' '.docx' # NOTE: If you want to save the file as a .doc, use '.doc' instead of '.docx' here and ensure you use "com.adobe.acrobat.doc" in the jsObject.SaveAs call

    ROOT_INPUT_PATH = sys.argv[1]
    INPUT_FILE_EXTENSION = sys.argv[2]
    ROOT_OUTPUT_PATH = sys.argv[3]
    OUTPUT_FILE_EXTENSION = sys.argv[4]

    # tuples are of schema (path_to_file, filename)
    matching_files = ((os.path.join(_root, filename), os.path.splitext(filename)[0]) for _root, _dirs, _files in walk(ROOT_INPUT_PATH) for filename in fnmatch.filter(_files, INPUT_FILE_EXTENSION))

    # patch ERRORS_BAD_CONTEXT as per https://mail.python.org/pipermail/python-win32/2002-March/000265.html
    # (no `global` statement is needed: this code already runs at module scope and only mutates the list)
    ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL)

    for filename_with_path, filename_without_extension in matching_files:
        print("Processing '{}'".format(filename_without_extension))
        acrobat_extract_text(filename_with_path, ROOT_OUTPUT_PATH, filename_without_extension, OUTPUT_FILE_EXTENSION)
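
The comments in the script note that switching to .doc output means changing both the output extension and the conversion ID passed to SaveAs. A minimal sketch of one way to keep the two in sync is shown below. The helper name `build_save_as_args` and the `CONVERSION_IDS` table are hypothetical, not part of the Acrobat SDK, and the table only lists the three conversion IDs mentioned in this answer.

```python
import os

# Conversion IDs mentioned in the answer above. This table is an
# illustrative assumption, not a complete list of what Acrobat supports.
CONVERSION_IDS = {
    ".doc": "com.adobe.acrobat.doc",
    ".docx": "com.adobe.acrobat.docx",
    ".xml": "com.adobe.acrobat.xml",
}


def build_save_as_args(input_path, output_dir, output_ext):
    """Return (destination_path, conversion_id) for a SaveAs call.

    Hypothetical helper: it derives both values from the output
    extension so they cannot drift apart.
    """
    ext = output_ext.lower()
    if not ext.startswith("."):
        ext = "." + ext
    if ext not in CONVERSION_IDS:
        raise ValueError(
            "No known conversion ID for extension: {}".format(output_ext)
        )
    # Reuse the input file's base name and swap in the new extension
    basename = os.path.splitext(os.path.basename(input_path))[0]
    destination = os.path.join(output_dir, basename + ext)
    return destination, CONVERSION_IDS[ext]
```

Inside `acrobat_extract_text`, the returned pair could then be passed along as `jsObject.SaveAs(destination, conversion_id)`, so a '.doc' command-line argument would select "com.adobe.acrobat.doc" without editing the script.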
Votes: 3

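Stack Overflow user

Answered 2012-11-07 17:50:32

Adobe does not support PDF-to-Word conversion unless you use their Acrobat PDF client. However, you cannot do it with their SDK or by calling a command line. You can only do it manually.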

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/11341073
