从Python脚本中,我需要调用PL->EN翻译服务。翻译需要三个步骤:标记化、翻译和去编织。
在Linux中,我可以通过按上述顺序执行以下命令来使用3个进程来实现这一点:
/home/nlp/opt/moses/scripts/tokenizer/tokenizer.perl -l pl < path_to_input.txt > path_to_output.tok.txt
/home/nlp/opt/moses/bin/moses -f /home/nlp/Downloads/TED/tuning/moses.tuned.ini.1 -drop-unknown -input-file path_to_output.tok.txt -th 8 > path_to_output.trans.txt
/home/nlp/opt/moses/scripts/tokenizer/detokenizer.perl -l en < path_to_output.trans.txt > path_to_output.final.txt,将文件path_to_input.txt和输出转换为path_to_output.final.txt。
我已经为组合这三个过程编写了以下脚本:
import shlex
import subprocess
from subprocess import STDOUT,PIPE
import os
import socket
class Translator:
@staticmethod
def pl_to_en(input_file, output_file):
# Tokenize
print("Tokenization started")
with open("tokenized.txt", "w+") as tokenizer_output:
with open(input_file) as tokenizer_input:
cmd = "/home/nlp/opt/moses/scripts/tokenizer/tokenizer.perl - l pl"
args = shlex.split(cmd)
p = subprocess.Popen(args, stdin=tokenizer_input, stdout=tokenizer_output)
p.wait()
print("Tokenization finished")
#Translate
print("Translation started")
with open("translated.txt", "w+") as translator_output:
cmd = "/home/nlp/opt/moses/bin/moses -f /home/nlp/Downloads/TED/tuning/moses.tuned.ini.1 -drop-unknown -input-file tokenized.txt -th 8"
args = shlex.split(cmd)
p = subprocess.Popen(args, stdout=translator_output)
p.wait()
print("Translation finished")
# Detokenize
print("Detokenization started")
with open("translated.txt") as detokenizer_input:
with open("detokenized.txt", "w+") as detokenizer_output:
cmd = "/home/nlp/opt/moses/scripts/tokenizer/detokenizer.perl -l en"
args = shlex.split(cmd)
p = subprocess.Popen(args, stdin=detokenizer_input, stdout=detokenizer_output)
p.wait()
print("Detokenization finished")
translator = Translator()
translator.pl_to_en("some_input_file.txt", "some_output_file.txt")但只有标记部分起作用。翻译程序只输出一个空文件translated.txt。当查看终端中的输出时,它看起来就像翻译器正确地加载文件tokenized.txt,并执行转换。问题在于我是如何从这个过程中收集输出的。
发布于 2015-08-01 12:12:39
我将尝试如下所示--将转换器进程的输出发送到管道,并将脱标记器的输入改为管道,而不是使用文件。
import shlex
import subprocess
from subprocess import STDOUT,PIPE
import os
import socket
class Translator:
@staticmethod
def pl_to_en(input_file, output_file):
# Tokenize
print("Tokenization started")
with open("tokenized.txt", "w+") as tokenizer_output:
with open(input_file) as tokenizer_input:
cmd = "/home/nlp/opt/moses/scripts/tokenizer/tokenizer.perl - l pl"
args = shlex.split(cmd)
p = subprocess.Popen(args, stdin=tokenizer_input, stdout=tokenizer_output)
p.wait()
print("Tokenization finished")
#Translate
print("Translation started")
cmd = "/home/nlp/opt/moses/bin/moses -f /home/nlp/Downloads/TED/tuning/moses.tuned.ini.1 -drop-unknown -input-file tokenized.txt -th 8"
args = shlex.split(cmd)
translate_p = subprocess.Popen(args, stdout=subprocess.PIPE)
translate_p.wait()
print("Translation finished")
# Detokenize
print("Detokenization started")
with open("detokenized.txt", "w+") as detokenizer_output:
cmd = "/home/nlp/opt/moses/scripts/tokenizer/detokenizer.perl -l en"
args = shlex.split(cmd)
detokenizer_p = subprocess.Popen(args, stdin=translate_p.stdout, stdout=detokenizer_output)
detokenizer_p.wait()
print("Detokenization finished")
translator = Translator()
translator.pl_to_en("some_input_file.txt", "some_output_file.txt")https://stackoverflow.com/questions/31761414
复制相似问题