Q: How to use my own corpus with the word embedding model BERT

Stack Overflow user

Asked 2020-12-15 18:29:15

Answers: 1, Views: 64, Followers: 0, Votes: 1

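I'm trying to build a question-answering model using Google's word embedding model BERT. I'm new to this and would really like to train it on my own corpus. To start, I used an example from the Hugging Face site, and it worked fine: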

from transformers import pipeline

qa_pipeline = pipeline(
    "question-answering",
    model="henryk/bert-base-multilingual-cased-finetuned-dutch-squad2",
    tokenizer="henryk/bert-base-multilingual-cased-finetuned-dutch-squad2"
)

qa_pipeline({
    'context': "Amsterdam is de hoofdstad en de dichtstbevolkte stad van Nederland.",
    'question': "Wat is de hoofdstad van Nederland?"})

Output:

> {'answer': 'Amsterdam', 'end': 9, 'score': 0.825619101524353, 'start': 0}

So I created a .txt file to test whether I could replace the sentence in the context parameter with the exact same sentence read from that file.

with open('test.txt') as f:
    lines = f.readlines()

qa_pipeline = pipeline(
    "question-answering",
    model="henryk/bert-base-multilingual-cased-finetuned-dutch-squad2",
    tokenizer="henryk/bert-base-multilingual-cased-finetuned-dutch-squad2"
)

qa_pipeline({
    'context': lines,
    'question': "Wat is de hoofdstad van Nederland?"})

But this gave me the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-2bae0ecad43e> in <module>()
     10 qa_pipeline({
     11     'context': lines,
---> 12     'question': "Wat is de hoofdstad van Nederland?"})

5 frames
/usr/local/lib/python3.6/dist-packages/transformers/data/processors/squad.py in _is_whitespace(c)
     84 
     85 def _is_whitespace(c):
---> 86     if c == " " or c == "\t" or c == "\r" or c == "\n" or ord(c) == 0x202F:
     87         return True
     88     return False

TypeError: ord() expected a character, but string of length 66 found

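I'm just experimenting with ways to read and use a .txt file, but I can't seem to find another solution. I did some research on the Hugging Face pipeline() function, and this is what it says about the question and context parameters: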

word-embedding

bert-language-model

huggingface-transformers

1 Answer

Stack Overflow user

Accepted answer

Posted 2020-12-15 18:50:47

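Got it! The solution was really simple. I had assumed the variable 'lines' was already a string, but it wasn't: readlines() returns a list of strings. Once I converted it to a string, the question-answering model accepted my test.txt file.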

So, from:

with open('test.txt') as f:
    lines = f.readlines()

to:

with open('test.txt') as f:
    lines = str(f.readlines())
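
A side note on the fix above: str() applied to a list returns the list's printed form, so the context ends up containing the brackets, quotes and an escaped newline. A minimal sketch of an alternative, assuming the file holds the plain text to use as context, is to read it with f.read(). The temporary test.txt below is only a stand-in created for illustration; the resulting as_text would be the value to pass as 'context'.

```python
import os
import tempfile

# Hypothetical stand-in for test.txt, holding the sentence from the question.
sentence = "Amsterdam is de hoofdstad en de dichtstbevolkte stad van Nederland."
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sentence + "\n")

with open(path, encoding="utf-8") as f:
    as_list = f.readlines()        # list of str, one element per line

with open(path, encoding="utf-8") as f:
    as_repr = str(f.readlines())   # printed form of the list, brackets included

with open(path, encoding="utf-8") as f:
    as_text = f.read().strip()     # whole file as one plain str

print(type(as_list).__name__)      # the type that triggered the TypeError
print(as_repr)                     # note the extra [ ' \n ' ] characters
print(as_text)                     # clean text, suitable as 'context'
```

Iterating over as_list yields whole lines rather than single characters, which is consistent with ord() complaining about a string longer than one character in the traceback.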

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/65304058
