I'm trying to train a chatbot, and most of my data is in text files.
I have this line in a text file:

    Matt said you have a "shit load" of dining dollars. I have almost none so if you're willing to sell, I'm willing to buy.

but when the chatterbot corpus tries to train the bot, it reads the line above as:

    'Matt said you have a "shit load" of dining dollars\\ I have almost none so if you\'re willing to sell, I\'m willing to buy\\\n'

How can I fix this?
Here is my code:
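
    def train_from_text():
        #chatbot.set_trainer(ListTrainer)
        # chatbot, basedir and find_files_in_directory are defined elsewhere in the script
        directory = basedir + "Text Trainers"
        files = find_files_in_directory(directory)
        for file in files:
            # one conversation per file, one statement per line
            conversation = []
            file_name = directory+"/"+file
            with open(file_name, 'r') as to_read:
                for line in to_read:
                    conversation.append(line)
            chatbot.train(conversation)

Please excuse the profanity; this is the data I was given.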
Edit: the full error:

    Traceback (most recent call last):
      File "E:/Jason Chatterbot/Jason Chat.py", line 102, in <module>
        control()
      File "E:/Jason Chatterbot/Jason Chat.py", line 96, in control
        train_from_text()
      File "E:/Jason Chatterbot/Jason Chat.py", line 58, in train_from_text
        chatbot.train(conversation)
      File "C:\Python27\lib\site-packages\chatterbot\trainers.py", line 119, in train
        corpora = self.corpus.load_corpus(corpus_path)
      File "C:\Python27\lib\site-packages\chatterbot_corpus\corpus.py", line 98, in load_corpus
        corpus_data = self.read_corpus(file_path)
      File "C:\Python27\lib\site-packages\chatterbot_corpus\corpus.py", line 63, in read_corpus
        with io.open(file_name, encoding='utf-8') as data_file:
    IOError: [Errno 22] Invalid argument: 'Matt said you have a "shit load" of dining dollars\\ I have almost none so if you\'re willing to sell, I\'m willing to buy\\\r\n'

Posted on 2017-11-28 04:44:44
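Without seeing a larger subset of the data, it looks like single quotes (') are being replaced with escaped single quotes (\'), actual newlines with escaped newlines (\n), and periods (.) with double backslashes (\\).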
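A simple string replace may fix it for you, depending on how badly the data has been mangled. Try changing

    conversation.append(line)

to

    conversation.append(line.replace("\\'","'").replace('\\\\','.').replace("\\n","\n"))

We are basically trying to reverse the replacements that were applied automatically.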
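To try the replacement chain outside the training loop, it can be wrapped in a small function. This is a minimal sketch: `undo_escapes` and the `mangled` sample are hypothetical names made up for illustration, and the sample assumes the backslashes are literally present in the string, which may not match the real files.

```python
def undo_escapes(line):
    """Reverse the three substitutions guessed at above (hypothetical helper)."""
    return (
        line.replace("\\'", "'")    # backslash + quote -> plain quote
            .replace("\\\\", ".")   # two backslashes   -> period
            .replace("\\n", "\n")   # backslash + 'n'   -> real newline
    )


# Raw string, so every backslash below is a literal character in the data.
mangled = r"so if you\'re willing to sell, I\'m willing to buy\\\n"
restored = undo_escapes(mangled)

print(repr(mangled))
print(repr(restored))
```

The order matters here: the double backslash is handled before the escaped newline, so the three-backslash run at the end of the sample is split into a period followed by a newline. If the data turns out to hold a single backslash where the period used to be, the middle pattern would need to change accordingly.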
https://stackoverflow.com/questions/47524090