How do I convert BinaryRelevance.predict results into label names?

Stack Overflow user
Asked 2021-07-26 04:24:51
1 answer, 46 views, 0 followers, 2 votes

I have put together a small example that uses skmultilearn to attempt multi-label text classification:

import skmultilearn
from sklearn.feature_extraction.text import TfidfVectorizer
import pandas as pd
from scipy.sparse import csr_matrix
from pandas.core.common import flatten
from sklearn.naive_bayes import MultinomialNB
from skmultilearn.problem_transform import BinaryRelevance

TRAIN_DATA = [
     ['How to connect to MySQL using PHP ?', ['development','database']],
     ['What are the best VPN clients these days?', ['networks']],
     ['What is the equivalent of the boolean type in Oracle?', ['database']],
     ['How to remove unwanted entity from Hibernate session?', ['development']],
     ['How to implement TCP connection pooling in java?', ['development','networks']],
     ['How can I connect to PostgreSQL database remotely from another network?', ['database','networks']],
     ['What is the python function to remove accents in a string?', ['development']],
     ['How to remove indexes in SQL Server?', ['database']],
     ['How to configure firewall with DMZ?', ['networks']]
] 
data_frame = pd.DataFrame(TRAIN_DATA, columns=['text','labels'])
corpus = data_frame['text']
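# Collect labels in first-seen order; a list (unlike a set) is a valid, stable column indexer
unique_labels = list(dict.fromkeys(flatten(data_frame['labels'])))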
for u in unique_labels:
    data_frame[u] = 0
    data_frame[u] = pd.to_numeric(data_frame[u])
for i, row in data_frame.iterrows():
    for u in unique_labels:
        if u in row.labels:
            data_frame.at[i,u] = 1
tfidf = TfidfVectorizer()
Xfeatures = tfidf.fit_transform(corpus).toarray()
y = data_frame[unique_labels]
binary_rel_clf = BinaryRelevance(MultinomialNB())
binary_rel_clf.fit(Xfeatures,y)
predict_text = ['SQL Server and PHP?']
X_predict = tfidf.transform(predict_text)
br_prediction = binary_rel_clf.predict(X_predict)
print(br_prediction)

However, the result looks like this:

(0, 1)  1.

Is there a way to convert this result into label names, such as ['development','database']?


1 Answer

Stack Overflow user

Accepted answer

Answered 2021-07-26 05:45:57

The BinaryRelevance estimator returns a scipy csc_matrix. You can convert it as follows:

First, convert the csc_matrix to a dense numpy array of type bool:

br_prediction = br_prediction.toarray().astype(bool)
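
To make this step concrete, here is a minimal sketch on a hand-built matrix. The variable `sparse_pred` is a hypothetical stand-in for the predict() result, not output from the actual classifier. A printout such as `(0, 1)  1` lists the stored entries of the sparse matrix: row 0 (the first sample), column 1 (the second label), value 1.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Hypothetical stand-in for the predict() result: one sample, three labels,
# with only the label in column 1 predicted.
sparse_pred = csc_matrix(np.array([[0, 1, 0]]))

# Row and column indices of the stored non-zero entries.
rows, cols = sparse_pred.nonzero()
print(rows, cols)

# Dense boolean array: one row per sample, one column per label.
dense_mask = sparse_pred.toarray().astype(bool)
print(dense_mask)
```

Densifying is cheap here because the matrix has only as many columns as there are labels.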

Then, use each converted prediction as a mask over the label names, which are the columns of y:

predictions = [y.columns.values[prediction].tolist() for prediction in br_prediction]
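
The same masking idea can be wrapped in a small helper. This is only a sketch: `mask_to_labels`, `label_names`, and the sample mask are hypothetical names and data introduced for illustration, and the label order is assumed to match the column order of y.

```python
import numpy as np

def mask_to_labels(label_names, bool_rows):
    """Return one list of label names per row of a boolean mask."""
    names = np.asarray(label_names)
    # Boolean indexing keeps only the names whose mask entry is True.
    return [names[row].tolist() for row in bool_rows]

# Hypothetical column order and predictions for three samples.
label_names = ['development', 'database', 'networks']
mask = np.array([
    [False, True, False],   # only 'database'
    [True, False, True],    # 'development' and 'networks'
    [False, False, False],  # no label predicted
])

result = mask_to_labels(label_names, mask)
print(result)
```

A sample with no predicted label yields an empty list, so the result always has one entry per input row.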

Together, these two steps map each prediction to its corresponding labels. For example:

print(y.columns.values)
# output: ['development' 'database' 'networks']

print(br_prediction)
# output: (0, 1)    1

br_prediction = br_prediction.toarray().astype(bool)
print(br_prediction)
# output: [[False True False]]

predictions = [y.columns.values[prediction].tolist() for prediction in br_prediction]
print(predictions)
# output: [['database']]
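
As an alternative that is not part of the original answer, scikit-learn's MultiLabelBinarizer can build the indicator columns and later decode predictions back to names. The sketch below uses a hypothetical `label_lists` sample and a hand-written `pred` array in place of real classifier output. Note that it sorts the classes alphabetically, so its column order differs from the one shown above.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical label lists, one per training sample.
label_lists = [['development', 'database'], ['networks'], ['database']]

mlb = MultiLabelBinarizer()
# One indicator column per class; classes are sorted alphabetically.
y = mlb.fit_transform(label_lists)
print(mlb.classes_)

# Hand-written stand-in for predictions on two samples.
pred = np.array([[1, 0, 0], [0, 1, 1]])

# inverse_transform returns one tuple of label names per row.
decoded = mlb.inverse_transform(pred)
print(decoded)
```

If y is built this way, the same `mlb` object should be kept around for decoding, since it owns the column-to-label mapping.
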
Votes: 2
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/68522255
