I am training a support vector machine on the UCI Bank Marketing dataset (the bank-additional variant). Since the data is imbalanced, I also care about recall. My accuracy is about 87.95%, but my recall is only about 51%. I would like to know how to improve recall without reducing accuracy, using only an SVM.
我的代码:
from sklearn.svm import SVC
svm_clf = SVC(gamma="auto",class_weight={1: 2.6})
svm_clf.fit(X_transformed, y_train_binary.ravel())

More information:
I have not engineered any new (combined) features, and I treated "unknown" as its own category. I also dropped the duration attribute, as the dataset's attribute information recommends.
I have tried different class_weight values; I can raise recall to as much as 75.32%, but then my accuracy drops to 68%. How can I improve recall without sacrificing accuracy?
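Besides class_weight, one SVM-only lever for this trade-off is the decision threshold: SVC's decision_function returns a continuous margin score, and lowering the cutoff below the default of 0 predicts more positives, raising recall at some cost in precision. A minimal sketch on synthetic imbalanced data (the dataset, split, and class ratio below are stand-ins, not the actual bank-marketing preprocessing):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, recall_score

# Synthetic ~11%-positive data standing in for the bank-marketing set
X, y = make_classification(n_samples=2000, weights=[0.89, 0.11],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

clf = SVC(gamma="auto").fit(X_tr, y_tr)
scores = clf.decision_function(X_te)

# The default threshold is 0; lowering it flips borderline samples to
# the positive class, so recall can only go up as the cutoff drops.
for thr in (0.0, -0.25, -0.5):
    pred = (scores > thr).astype(int)
    print(thr, accuracy_score(y_te, pred), recall_score(y_te, pred))
```

Sweeping the threshold on a validation set lets you pick the recall/accuracy balance explicitly instead of retraining with new class weights each time.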
Posted on 2020-04-26 14:50:10
Duplicating samples (RandomOverSampler) did not help much.
I quickly tried a RandomUnderSampler. The scores look like a decent baseline to improve on.
I have not done anything yet to improve the model.
The code from my Google Colab -
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00222/bank-additional.zip'
from urllib.request import urlretrieve
urlretrieve(url, "data.zip")
from zipfile import ZipFile
file_name = "/content/data.zip"
with ZipFile(file_name, 'r') as zip:
    zip.extractall()
import numpy as np,pandas as pd
data = pd.read_csv("/content/bank-additional/bank-additional-full.csv",delimiter=";")
X = data.iloc[:,:-1]
y = data.iloc[:,-1]
y.value_counts()
X_cat = X.select_dtypes(include='object')
from sklearn.preprocessing import LabelEncoder
lbe = LabelEncoder()
for colname in X_cat.columns:
    X_cat[colname] = lbe.fit_transform(X_cat[colname])
    X[colname] = X_cat[colname]
y = lbe.fit_transform(y)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X,y,test_size=0.20,random_state=201, stratify=y)
from imblearn.under_sampling import RandomUnderSampler
rand = RandomUnderSampler(sampling_strategy=.6)
x_train, y_train = rand.fit_resample(x_train, y_train)
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=200,max_samples=0.05)
model.fit(x_train, y_train)
from sklearn.metrics import accuracy_score
y_pred_train = model.predict(x_train)
####Metrics on train
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(y_train, y_pred_train).ravel() # (55148, 1715, 8,90)
print("Training",fp/(tn+fp),fn/(fn+tp), accuracy_score(y_train, y_pred_train), tn, fp, fn, tp)
####Metrics on test
y_pred = model.predict(x_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel() # (55148, 1715, 8,90)
print("Test",fp/(tn+fp),fn/(fn+tp), accuracy_score(y_test, y_pred), tn, fp, fn, tp)
from sklearn.metrics import recall_score
print("Test-recall",recall_score(y_test, y_pred))
Posted on 2023-02-18 22:03:13
One option is hyperparameter tuning. The main hyperparameters affecting the performance of scikit-learn's SVM are:

- C - regularization strength
- kernel - how the data is projected into feature space
- class_weight - if the data is imbalanced, class_weight='balanced' is recommended

https://datascience.stackexchange.com/questions/72845
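The hyperparameters above can be searched jointly with GridSearchCV; a minimal sketch on synthetic data (the grid values and the choice of "balanced_accuracy" as the metric are illustrative assumptions, not from the original answer):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small synthetic imbalanced problem standing in for the real data
X, y = make_classification(n_samples=600, weights=[0.85, 0.15],
                           random_state=0)

param_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["rbf", "linear"],
    "class_weight": [None, "balanced"],
}
# "balanced_accuracy" averages recall over both classes, so the search
# cannot trade all accuracy away for recall; scoring="recall" would
# maximize recall on the positive class alone.
search = GridSearchCV(SVC(gamma="auto"), param_grid,
                      scoring="balanced_accuracy", cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```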