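I am trying to implement the macro F1 score (F-measure) natively in PyTorch, instead of using the widely used sklearn.metrics.f1_score, so that the metric can be computed directly on the GPU.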
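From what I understand, to compute the macro F1 score I need to compute, for every label, the F1 score from sensitivity (recall) and precision, and then take the average of all these per-label scores.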
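Written out, with C classes and per-class counts TP_c, FP_c and FN_c taken over the samples being scored (sensitivity is the same thing as recall R_c), the definition above is:

$$P_c = \frac{TP_c}{TP_c + FP_c}, \qquad R_c = \frac{TP_c}{TP_c + FN_c}$$

$$F1_c = \frac{2\,P_c R_c}{P_c + R_c} = \frac{2\,TP_c}{2\,TP_c + FP_c + FN_c}, \qquad F1_{\text{macro}} = \frac{1}{C}\sum_{c=1}^{C} F1_c$$

The right-hand form of F1_c is symmetric in FP_c and FN_c and evaluates to 0 when TP_c = 0, which matches how sklearn scores such a class by default.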
My attempt
My current implementation looks like this:
import torch


def confusion_matrix(y_pred: torch.Tensor, y_true: torch.Tensor, n_classes: int):
    conf_matrix = torch.zeros([n_classes, n_classes], dtype=torch.int)
    y_pred = torch.argmax(y_pred, 1)
    for t, p in zip(y_true.view(-1), y_pred.view(-1)):
        conf_matrix[t.long(), p.long()] += 1
    return conf_matrix


# Method of the metric class (the class definition is not shown here)
def forward(self, y_pred: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
    conf_matrix = confusion_matrix(y_pred, y_true, self.classes)
    TP = conf_matrix.diag()
    f1_scores = torch.zeros(self.classes, dtype=torch.float)
    for c in range(self.classes):
        idx = torch.ones(self.classes, dtype=torch.long)
        idx[c] = 0
        FP = conf_matrix[c, idx].sum()
        FN = conf_matrix[idx, c].sum()
        sensitivity = TP[c] / (TP[c] + FN + self.epsilon)
        precision = TP[c] / (TP[c] + FP + self.epsilon)
        f1_scores[c] += 2.0 * ((precision * sensitivity) / (precision + sensitivity + self.epsilon))
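    return f1_scores.mean()

`self.classes` is the number of labels, and `self.epsilon` is a very small value (10e-12) that prevents division by zero (with tensors, dividing by zero yields NaN or inf rather than raising an error).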
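One concrete problem in the code above: `idx` is created with `dtype=torch.long`, so PyTorch treats it as a list of positions (advanced indexing), not as a mask. For class `c`, `conf_matrix[c, idx]` therefore gathers column 0 and column 1 (repeated) instead of "every column except `c`", so FP and FN are wrong for most classes. Creating `idx` with `dtype=torch.bool` and setting `idx[c] = False` gives the intended mask. (Separately, because the matrix is filled as `conf_matrix[t, p]` with rows = true labels, `conf_matrix[c, mask]` actually holds the false negatives and `conf_matrix[mask, c]` the false positives; that swap alone does not change F1, which is symmetric in FP and FN.) The toy matrix below is made up purely to show the indexing behaviour:

```python
import torch

# Made-up 3-class confusion matrix: rows = true class, columns = predicted class
conf_matrix = torch.tensor([[5, 1, 2],
                            [0, 7, 3],
                            [4, 0, 6]])
c = 0

# As in the question: the long tensor [0, 1, 1] is read as a list of positions,
# so this gathers column 0 once and column 1 twice (including the diagonal 5)
idx_long = torch.ones(3, dtype=torch.long)
idx_long[c] = 0
print(conf_matrix[c, idx_long])  # tensor([5, 1, 1])

# A bool tensor is a mask: it keeps every column (or row) except c
idx_bool = torch.ones(3, dtype=torch.bool)
idx_bool[c] = False
print(conf_matrix[c, idx_bool])  # tensor([1, 2])  true c, predicted other
print(conf_matrix[idx_bool, c])  # tensor([0, 4])  predicted c, true other
```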
During training, I compute the metric for each batch and use the mean over all batches as the final score.
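Independently of any indexing bug, the mean of per-batch F1 scores is generally not equal to the F1 score computed over all samples at once, which is what sklearn returns if it is given the full set of predictions: F1 is not additive across batches. A common approach is to accumulate the confusion matrix over batches and compute F1 once at the end. The sketch below is one way to do that, under these assumptions: predictions are already class ids (argmax applied), the confusion matrix is built with `torch.bincount` instead of a Python loop, and F1 is averaged only over classes that occur in the targets or predictions, mirroring sklearn's default `labels=None`. The function names and the two toy batches are hypothetical.

```python
import torch


def confusion_matrix_bincount(preds: torch.Tensor, targets: torch.Tensor,
                              n_classes: int) -> torch.Tensor:
    # Rows = true class, columns = predicted class.
    # One bincount replaces the Python loop and stays on the inputs' device.
    flat = targets.reshape(-1) * n_classes + preds.reshape(-1)
    counts = torch.bincount(flat, minlength=n_classes * n_classes)
    return counts.view(n_classes, n_classes)


def macro_f1_from_confusion(cm: torch.Tensor) -> torch.Tensor:
    cm = cm.float()
    tp = cm.diag()
    fp = cm.sum(dim=0) - tp  # predicted as c, true class is something else
    fn = cm.sum(dim=1) - tp  # true class c, predicted as something else
    denom = 2 * tp + fp + fn
    # Average only over classes that occur in targets or predictions
    present = denom > 0
    f1 = 2 * tp[present] / denom[present]  # equals 2PR/(P+R); 0 when TP == 0
    return f1.mean()


n_classes = 3
# Two made-up batches of (predicted class ids, true class ids)
batches = [
    (torch.tensor([0, 1, 1, 2]), torch.tensor([0, 0, 1, 2])),
    (torch.tensor([2, 0, 1, 0]), torch.tensor([2, 2, 1, 0])),
]

total_cm = torch.zeros(n_classes, n_classes, dtype=torch.long)
batch_scores = []
for preds, targets in batches:
    cm = confusion_matrix_bincount(preds, targets, n_classes)
    total_cm += cm  # accumulate counts, not scores
    batch_scores.append(macro_f1_from_confusion(cm))

mean_of_batches = torch.stack(batch_scores).mean()
whole_set = macro_f1_from_confusion(total_cm)
print(f"mean of per-batch macro F1: {mean_of_batches.item():.4f}")
print(f"macro F1 over all samples:  {whole_set.item():.4f}")
```

On this toy data both batches score 7/9 ≈ 0.778, so their mean is 0.778, while the macro F1 over all eight samples is 34/45 ≈ 0.756.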
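Problem
The problem is that when I compare my custom F1 score with sklearn's macro F1 score, the two are rarely equal (`f1` is my implementation, `f1_sci` is sklearn's):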
# example 1
eval_cce 0.5203, eval_f1 0.8068, eval_acc 81.5455, eval_f1_sci 0.8023,
test_cce 0.4784, test_f1 0.7975, test_acc 82.6732, test_f1_sci 0.8097
# example 2
eval_cce 0.3304, eval_f1 0.8211, eval_acc 87.4955, eval_f1_sci 0.8626,
test_cce 0.3734, test_f1 0.8183, test_acc 85.4996, test_f1_sci 0.8424
# example 3
eval_cce 0.4792, eval_f1 0.7982, eval_acc 81.8482, eval_f1_sci 0.8001,
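test_cce 0.4722, test_f1 0.7905, test_acc 82.6533, test_f1_sci 0.8139

I tried searching the internet, but most of the examples I found deal with binary classification, and I have yet to find one that does what I am trying to do.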
My question
Is there anything obviously wrong with my attempt?
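Update (10.06.2020)
I still haven't found my mistake. Due to time constraints, I decided to simply use the macro F1 score provided by sklearn. Although it doesn't work directly with GPU tensors, it is fast enough for my use case.
However, it would be great if someone could figure this out, so that anyone else who runs into this problem can resolve it.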
Answer, posted on 2020-08-11 20:43:50
A while ago I wrote my own implementation in PyTorch:
from typing import Tuple

import torch


class F1Score:
    """
    Class for f1 calculation in Pytorch.
    """

    def __init__(self, average: str = 'weighted'):
        """
        Init.

        Args:
            average: averaging method
        """
        self.average = average
        if average not in [None, 'micro', 'macro', 'weighted']:
            raise ValueError('Wrong value of average parameter')

    @staticmethod
    def calc_f1_micro(predictions: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        """
        Calculate f1 micro.

        Args:
            predictions: tensor with predictions
            labels: tensor with original labels

        Returns:
            f1 score
        """
        true_positive = torch.eq(labels, predictions).sum().float()
        f1_score = torch.div(true_positive, len(labels))
        return f1_score

    @staticmethod
    def calc_f1_count_for_label(predictions: torch.Tensor,
                                labels: torch.Tensor, label_id: int) -> Tuple[torch.Tensor, torch.Tensor]:
        """
        Calculate f1 and true count for the label

        Args:
            predictions: tensor with predictions
            labels: tensor with original labels
            label_id: id of current label

        Returns:
            f1 score and true count for label
        """
        # label count
        true_count = torch.eq(labels, label_id).sum()

        # true positives: labels equal to prediction and to label_id
        true_positive = torch.logical_and(torch.eq(labels, predictions),
                                          torch.eq(labels, label_id)).sum().float()
        # precision for label
        precision = torch.div(true_positive, torch.eq(predictions, label_id).sum().float())
        # replace nan values with 0
        precision = torch.where(torch.isnan(precision),
                                torch.zeros_like(precision).type_as(true_positive),
                                precision)

        # recall for label
        recall = torch.div(true_positive, true_count)
        # f1
        f1 = 2 * precision * recall / (precision + recall)
        # replace nan values with 0
        f1 = torch.where(torch.isnan(f1), torch.zeros_like(f1).type_as(true_positive), f1)
        return f1, true_count
    def __call__(self, predictions: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        """
        Calculate f1 score based on averaging method defined in init.

        Args:
            predictions: tensor with predicted class ids (not logits)
            labels: tensor with original labels

        Returns:
            f1 score (a per-label tensor if average is None)
        """
        # simpler calculation for micro
        if self.average == 'micro':
            return self.calc_f1_micro(predictions, labels)

        # like sklearn, score every label that occurs in labels or predictions,
        # so 0-based or non-contiguous label ids are handled correctly
        label_ids = torch.unique(torch.cat([labels, predictions]))
        f1_per_label = []
        true_counts = []
        for label_id in label_ids.tolist():
            f1, true_count = self.calc_f1_count_for_label(predictions, labels, label_id)
            f1_per_label.append(f1)
            true_counts.append(true_count)
        f1_per_label = torch.stack(f1_per_label)

        if self.average is None:
            return f1_per_label
        if self.average == 'weighted':
            return torch.div((f1_per_label * torch.stack(true_counts)).sum(), len(labels))
        # macro
        return f1_per_label.mean()

You can test it as follows:
import numpy as np
import torch
from sklearn.metrics import f1_score

errors = 0
for _ in range(10):
    labels = torch.randint(1, 10, (4096, 100)).flatten()
    predictions = torch.randint(1, 10, (4096, 100)).flatten()
    labels1 = labels.numpy()
    predictions1 = predictions.numpy()

    for av in ['micro', 'macro', 'weighted']:
        f1_metric = F1Score(av)
        my_pred = f1_metric(predictions, labels)

        f1_pred = f1_score(labels1, predictions1, average=av)

        if not np.isclose(my_pred.item(), float(f1_pred)):
            print('!' * 50)
            print(f1_pred, my_pred, av)
            errors += 1

if errors == 0:
    print('No errors!')

https://stackoverflow.com/questions/62265351