I am trying to extract text from a PDF with PyPDF2 and translate it with TextBlob.
import PyPDF2 as pdf
from docx import Document
from textblob import TextBlob
Arquivo = 'teste.pdf'
lgout = input('\nPara qual língua traduzir? ex: pt, en, es: ')
lgin = input('\nQual língua é o documento? ex: pt, en, es: ')
with open(Arquivo, mode='rb') as f:
    reader = pdf.PdfFileReader(f)
    npages = int(reader.numPages) - 1
    ret = 0
    while ret <= npages:
        page = reader.getPage(ret)
        pagext = str(page.extractText())
        blob = TextBlob(pagext)
        text_trans = (blob.translate(from_lang=lgin, to=lgout))
        doc = Document()
        doc.add_paragraph(str(text_trans))
        doc.save('Doc teste' + str(ret) + '.docx')
        ret += 1
    else:
        print("Documento convertido")
But when I run the script, I get this error:
Traceback (most recent call last):
  File "/Users/Pedrovhz/Desktop/Estudos/Python/Python Translator/tradutor_pdf.py", line 18, in <module>
    text_trans = (blob.translate(from_lang=lginout,to = lgoutpu))
  File "/anaconda3/lib/python3.7/site-packages/textblob/blob.py", line 547, in translate
    from_lang=from_lang, to_lang=to))
  File "/anaconda3/lib/python3.7/site-packages/textblob/translate.py", line 61, in translate
    self._validate_translation(source, result)
  File "/anaconda3/lib/python3.7/site-packages/textblob/translate.py", line 85, in _validate_translation
    raise NotTranslated('Translation API returned the input string unchanged.')
textblob.exceptions.NotTranslated: Translation API returned the input string unchanged.
I don't know what I'm doing wrong. Thanks for your help!
Posted on 2020-06-25 16:07:16
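I think you should take a look at this link: textblob.exceptions.NotTranslated: Translation API returned the input string unchanged
As a workaround, you can use a try-except block, or an if-else that detects the language before translating it. I have written a sample snippet: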
if blob.detect_language() == 'es':
    translated = blob.translate(from_lang='es', to='en')
    print(translated)
elif blob.detect_language() == 'de':
    translated = blob.translate(from_lang='de', to='en')
    print(translated)
else:
    print("not translated")
Let me know if it doesn't work for you.
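For the try-except variant mentioned above, a minimal sketch could look like the following. It assumes a TextBlob release that still provides `translate()` (as far as I know, later releases deprecated and then removed it), and `translate_or_keep` is a hypothetical helper name, not part of TextBlob. So that the sketch runs without TextBlob installed, it defines a local stand-in for `textblob.exceptions.NotTranslated` and takes the translator as a callable; with TextBlob you would pass the real exception class instead.

```python
class NotTranslated(Exception):
    """Local stand-in for textblob.exceptions.NotTranslated.

    Assumption: defined here only so the sketch runs without TextBlob.
    """


def translate_or_keep(text, translate, unchanged_error=NotTranslated):
    """Translate text, falling back to the original when nothing changed.

    translate: any callable that takes a string and returns its translation.
    unchanged_error: exception type raised when the input comes back unchanged.
    """
    if not text.strip():
        # Nothing to translate, e.g. a page with no extractable text.
        return text
    try:
        return str(translate(text))
    except unchanged_error:
        # The translator returned the input unchanged; keep the original text.
        return text


# With TextBlob this could be wired up roughly as follows (not executed here):
#   from textblob import TextBlob
#   from textblob.exceptions import NotTranslated
#   text_trans = translate_or_keep(
#       pagext,
#       lambda t: TextBlob(t).translate(from_lang=lgin, to=lgout),
#       unchanged_error=NotTranslated,
#   )
```

Because the helper returns the page text as-is when translation fails, the loop over the pages can continue instead of stopping at the first page that comes back unchanged.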
https://stackoverflow.com/questions/61872385