
NLTK: corpus-level BLEU vs. sentence-level BLEU score

Stack Overflow user
Asked 2016-11-11 14:44:48
2 answers · 20.4K views · 0 followers · 17 votes

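I imported nltk in Python to calculate BLEU scores on Ubuntu. I understand how the sentence-level BLEU score works, but I don't understand how the corpus-level BLEU score works.

Below is my code for the corpus-level BLEU score: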

import nltk

hypothesis = ['This', 'is', 'cat'] 
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.corpus_bleu([reference], [hypothesis], weights = [1])
print(BLEUscore)

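For some reason, the BLEU score is 0 for the above code. I was expecting a corpus-level BLEU score of at least 0.5.

Here is my code for the sentence-level BLEU score: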

import nltk

hypothesis = ['This', 'is', 'cat'] 
reference = ['This', 'is', 'a', 'cat']
BLEUscore = nltk.translate.bleu_score.sentence_bleu([reference], hypothesis, weights = [1])
print(BLEUscore)

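Here the sentence-level BLEU score is 0.71, which is what I expect given the brevity penalty and the missing word "a". However, I don't understand how the corpus-level BLEU score works.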
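
As a rough check of the 0.71 figure, the unigram-only score can be recomputed by hand: every hypothesis token appears in the reference, so the clipped unigram precision is 3/3, and the brevity penalty for a 3-token hypothesis against a 4-token reference is exp(1 - 4/3). The sketch below is a plain-Python illustration of that arithmetic, not NLTK's implementation; the helper names `clipped_unigram_precision` and `brevity_penalty` are made up for this example.

```python
import math
from collections import Counter


def clipped_unigram_precision(hypothesis, references):
    """Unigram precision where each hypothesis token is credited at most
    as many times as it occurs in any single reference."""
    hyp_counts = Counter(hypothesis)
    max_ref_counts = Counter()
    for ref in references:
        for token, count in Counter(ref).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)
    clipped = sum(min(count, max_ref_counts[token])
                  for token, count in hyp_counts.items())
    return clipped / len(hypothesis)


def brevity_penalty(hyp_len, ref_len):
    """BLEU brevity penalty: 1 if the hypothesis is longer than the
    reference, otherwise exp(1 - ref_len / hyp_len)."""
    if hyp_len == 0:
        return 0.0
    if hyp_len > ref_len:
        return 1.0
    return math.exp(1 - ref_len / hyp_len)


hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']

precision = clipped_unigram_precision(hypothesis, [reference])  # 3 / 3
penalty = brevity_penalty(len(hypothesis), len(reference))      # exp(-1/3)
score = penalty * precision
print(round(score, 4))  # 0.7165
```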

Any help would be greatly appreciated.

2 Answers

Stack Overflow user

Accepted answer

Answered 2016-11-17 17:03:56

TL;DR

>>> import nltk
>>> hypothesis = ['This', 'is', 'cat'] 
>>> reference = ['This', 'is', 'a', 'cat']
>>> references = [reference] # list of references for 1 sentence.
>>> list_of_references = [references] # list of references for all sentences in corpus.
>>> list_of_hypotheses = [hypothesis] # list of hypotheses that corresponds to list of references.
>>> nltk.translate.bleu_score.corpus_bleu(list_of_references, list_of_hypotheses)
0.6025286104785453
>>> nltk.translate.bleu_score.sentence_bleu(references, hypothesis)
0.6025286104785453

(Note: you have to pull the latest version of NLTK from the develop branch to get a stable version of the BLEU score implementation.)

In Long

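Actually, if there is only one reference and one hypothesis in your whole corpus, both corpus_bleu() and sentence_bleu() should return the same value, as shown in the example above.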
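
One plausible reading of why the call in the question scores 0 is the nesting: with `corpus_bleu([reference], [hypothesis])`, the flat token list is taken as the list of references for the first sentence, so each token string is itself treated as a "reference" and iterated character by character. The sketch below only imitates that effect in plain Python, under the assumption that n-grams are built by iterating over each reference; `matched_unigrams` is a hypothetical helper, not an NLTK function.

```python
def matched_unigrams(hypothesis, references):
    """Count hypothesis tokens that occur as an item of any reference.

    Each reference is iterated item by item, so a reference that is a
    plain string contributes its individual characters."""
    ref_items = set()
    for ref in references:
        ref_items.update(ref)
    return sum(1 for token in hypothesis if token in ref_items)


hypothesis = ['This', 'is', 'cat']
reference = ['This', 'is', 'a', 'cat']

# Nesting used in the question: the "references" are the token strings.
wrong = matched_unigrams(hypothesis, reference)
# One more level of nesting: a list holding one tokenized reference.
right = matched_unigrams(hypothesis, [reference])

print(wrong, right)  # 0 3
```

With no matching unigrams at all, a precision-based score has nothing to build on, which is consistent with the 0 reported in the question.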

In the code, we can see that sentence_bleu is actually a thin wrapper around corpus_bleu (a "duck-type" of it):

def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    return corpus_bleu([references], [hypothesis], weights, smoothing_function)

And if we look at the parameters of sentence_bleu:

def sentence_bleu(references, hypothesis, weights=(0.25, 0.25, 0.25, 0.25),
                  smoothing_function=None):
    """
    :param references: reference sentences
    :type references: list(list(str))
    :param hypothesis: a hypothesis sentence
    :type hypothesis: list(str)
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The sentence-level BLEU score.
    :rtype: float
    """

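The references input of sentence_bleu is a list(list(str)): each reference is a tokenized sentence (a list of strings), and because multiple references are allowed, they are wrapped in an outer list. For example, with a second reference "This is a feline":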

references = [ ["This", "is", "a", "cat"], ["This", "is", "a", "feline"] ]
hypothesis = ["This", "is", "cat"]
sentence_bleu(references, hypothesis)

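For corpus_bleu, the list_of_references parameter is a list of whatever sentence_bleu takes as references, one entry per hypothesis:
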
def corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25),
                smoothing_function=None):
    """
    :param references: a corpus of lists of reference sentences, w.r.t. hypotheses
    :type references: list(list(list(str)))
    :param hypotheses: a list of hypothesis sentences
    :type hypotheses: list(list(str))
    :param weights: weights for unigrams, bigrams, trigrams and so on
    :type weights: list(float)
    :return: The corpus-level BLEU score.
    :rtype: float
    """

Besides the doctests in nltk/translate/bleu_score.py, you can also look at the unit tests in nltk/test/unit/translate/test_bleu_score.py to see how to use each component of bleu_score.py.

By the way, since sentence_bleu is imported as bleu in nltk/translate/__init__.py (https://github.com/nltk/nltk/blob/develop/nltk/translate/__init__.py#L21), using

from nltk.translate import bleu 

would be the same as:

from nltk.translate.bleu_score import sentence_bleu

In code:

>>> from nltk.translate import bleu
>>> from nltk.translate.bleu_score import sentence_bleu
>>> from nltk.translate.bleu_score import corpus_bleu
>>> bleu == sentence_bleu
True
>>> bleu == corpus_bleu
False
31 votes

Stack Overflow user

Answered 2016-11-12 04:54:03

Let's take a look:

>>> help(nltk.translate.bleu_score.corpus_bleu)
Help on function corpus_bleu in module nltk.translate.bleu_score:

corpus_bleu(list_of_references, hypotheses, weights=(0.25, 0.25, 0.25, 0.25), smoothing_function=None)
    Calculate a single corpus-level BLEU score (aka. system-level BLEU) for all 
    the hypotheses and their respective references.  

    Instead of averaging the sentence level BLEU scores (i.e. macro-average 
    precision), the original BLEU metric (Papineni et al. 2002) accounts for 
    the micro-average precision (i.e. summing the numerators and denominators
    for each hypothesis-reference(s) pairs before the division).
    ...
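
To make the macro/micro distinction in that docstring concrete, here is a small plain-Python sketch restricted to unigram precision (no brevity penalty, no higher-order n-grams). The second sentence pair is invented for illustration, and `clipped_matches` is a hypothetical helper, not NLTK code.

```python
from collections import Counter


def clipped_matches(hypothesis, references):
    """Return (clipped unigram matches, hypothesis length) for one pair."""
    hyp_counts = Counter(hypothesis)
    max_ref_counts = Counter()
    for ref in references:
        for token, count in Counter(ref).items():
            max_ref_counts[token] = max(max_ref_counts[token], count)
    matches = sum(min(count, max_ref_counts[token])
                  for token, count in hyp_counts.items())
    return matches, len(hypothesis)


corpus = [
    (['This', 'is', 'cat'], [['This', 'is', 'a', 'cat']]),              # 3 / 3
    (['A', 'dog', 'barks', 'loudly'], [['The', 'dog', 'is', 'barking']]),  # 1 / 4
]
stats = [clipped_matches(hyp, refs) for hyp, refs in corpus]

# Macro-average: compute each sentence's precision, then average them.
macro = sum(m / n for m, n in stats) / len(stats)

# Micro-average: sum numerators and denominators first, then divide once.
micro = sum(m for m, _ in stats) / sum(n for _, n in stats)

print(round(macro, 4), round(micro, 4))  # 0.625 0.5714
```

The two numbers differ because the micro-average weights every token equally, so the longer, poorly matched hypothesis pulls the corpus-level figure down more than it does in a per-sentence average.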

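You're in a better position than I am to understand the description of the algorithm, so I won't try to "explain" it to you. If the docstring does not clear things up enough, take a look at the source itself. Or find it locally: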

>>> nltk.translate.bleu_score.__file__
'.../lib/python3.4/site-packages/nltk/translate/bleu_score.py'
6 votes

Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/40542523
