  • From the column Natural Language Processing

    Kaggle Series: Reproducing the 1st-Place IEEE-CIS Fraud Detection Solution

    Magic Feature https://www.kaggle.com/cdeotte/xgb-fraud-with-magic-0-9600 # frequency encode def encode_FE …

    86930 · Published on 2021-01-21
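The `encode_FE` helper named in this snippet is truncated; a minimal sketch of frequency encoding in the usual pattern of the linked kernel (the column name `card1` is made up for illustration):

```python
import pandas as pd

def encode_FE(df, cols):
    """Frequency-encode each column: add a companion column holding
    how often each value occurs in the data (relative frequency)."""
    for col in cols:
        freq = df[col].value_counts(normalize=True)  # value -> share of rows
        df[col + '_FE'] = df[col].map(freq)
    return df

df = encode_FE(pd.DataFrame({'card1': ['a', 'a', 'b', 'c', 'a']}), ['card1'])
# 'a' occurs in 3 of 5 rows, so its encoded value is 0.6
```

Rare values get small encodings, so a tree model can use "how common is this card?" as a feature without one-hot exploding the cardinality.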
  • From the column AI Agents from Beginner to Practice

    Building an AI Agent: A Financial Risk-Control System: Detecting Anomalous Transactions with Information Entropy and KL Divergence

    … * np.log2(fraud_prob + 1e-10) + (1 - fraud_prob) * np.log2(1 - fraud_prob + 1e-10) …
    fraud_corr = self.risk_system.df[self.risk_system.df['is_fraud'] == 1].corr()['is_fraud'].drop('is_fraud')
    normal_corr = self.risk_system.df[self.risk_system.df['is_fraud'] == 0].corr()['is_fraud'].drop('is_fraud')
    tp = np.sum(predicted_fraud & actual_fraud)    # true positives
    fp = np.sum(predicted_fraud & ~actual_fraud)   # false positives
    tn = np.sum(~predicted_fraud & ~actual_fraud)  # true negatives
    fn = np.sum(~predicted_fraud & actual_fraud)   # false negatives

    43210 · Edited on 2025-12-23
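The `np.log2` expression in this snippet is the binary Shannon entropy of the fraud probability; a self-contained sketch (the function name `fraud_entropy` is mine, and the leading minus sign is restored from the standard definition):

```python
import numpy as np

def fraud_entropy(fraud_prob):
    """Binary Shannon entropy H(p) = -p*log2(p) - (1-p)*log2(1-p);
    the 1e-10 epsilon guards against log2(0), as in the snippet."""
    return -(fraud_prob * np.log2(fraud_prob + 1e-10)
             + (1 - fraud_prob) * np.log2(1 - fraud_prob + 1e-10))

# Entropy peaks at p = 0.5 (maximum uncertainty) and approaches 0 as
# p nears 0 or 1, which is how it flags "unusually uncertain" segments.
```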
  • From the column VoiceVista Voice Intelligence

    Buy Now Pay Later, But At What Price? A Case for Face Biometrics

    With the rise of shopping comes a rise in fraud. Experian's 2022 Future of Fraud Forecast highlights BNPL, romance, and cryptocurrency schemes on the … Without the right identity verification and fraud mitigation tools in place, fraudsters will take advantage. … Experian predicts BNPL lenders will see an uptick in two types of fraud: identity theft and synthetic identity fraud, where a fraudster uses a combination of real and fake information to create an entirely …

    37420 · Edited on 2022-05-17
  • From the column Big Data Study Notes

    Credit Card Anti-Fraud

    data['Amount_max_fraud'] = 1
    data.loc[data.Amount <= 2125.87, 'Amount_max_fraud'] = 0
    f, (ax1, ax2) = plt.subplots(…
    # 0.172% of transactions were fraud.
    Fraud = data[data.Fraud == 1]
    Normal = data[data.Normal == 1]
    # Set X_train equal to 80% of the fraudulent …
    X_train = X_train.drop(['Fraud', 'Normal'], axis=1)
    X_test = X_test.drop(['Fraud', 'Normal'], axis=1)
    y_train.Fraud *= ratio
    y_test.Fraud *= ratio
    print('Number of training samples:\n', y_train.Fraud)
    print('Number of test samples …

    2K30 · Published on 2019-07-02
  • From the column Must-Reads for Full-Stack Programmers

    How AHI Fintech and DataVisor are Securing Data through AI and Big Data

    However, we can only produce tags after we have suffered a fraud attack. Additionally, it can track new fraud methods and constantly adapt to an ever-changing environment to create anti-fraud machine learning models. Therefore, to a certain extent, you can say that risk-control and anti-fraud work are universal. As a result, the cost of committing fraud is comparatively low.

    50810 · Edited on 2022-07-21
  • From the column Machine Learning / Data Visualization

    Credit Card Fraud Analysis on Extremely Imbalanced Samples

    Steps: first measure the imbalance with value_counts(), checking each class's count and share. Once the number of fraud samples is known, undersample the non-fraud class down to the same size for a 50:50 split:
    fraud_df = df[df["Class"] == 1]                      # the fraud samples
    no_fraud_df = df[df["Class"] == 0][:len(fraud_df)]   # the same number len(fraud_df) of non-fraud samples
    normal_distributed_df = pd.concat([fraud_df, no_fraud_df])  # 492 + 492
    Positive correlation: the larger the value, the more likely the result is fraud. Negative correlation: features V17, V14, V12 and V10 are negatively correlated; the smaller the value, the more likely the result is fraud. Box plots of the negatively correlated features:
    v14_fraud = new_df["V14"].loc[new_df["Class"] == 1]
    q1, q3 = v14_fraud.quantile(0.25), v14_fraud.quantile(0.75)

    74530 · Edited on 2023-08-25
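The quantile lines in this entry lead into Tukey's IQR rule for trimming outliers from the fraud class; a sketch with made-up numbers standing in for the V14 values:

```python
import pandas as pd

# Hypothetical stand-in for new_df["V14"].loc[new_df["Class"] == 1]
v14_fraud = pd.Series([-5.0, -4.5, -4.2, -3.9, -3.8, -19.0, 2.0])

q1, q3 = v14_fraud.quantile(0.25), v14_fraud.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey's fences
v14_trimmed = v14_fraud[(v14_fraud >= lower) & (v14_fraud <= upper)]
# -19.0 and 2.0 fall outside the fences and are dropped
```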
  • From the column Events

    [AI Study Notes] Feature-Engineering Practice with DeepSeek in Financial Risk Control

    … description='Features for credit card fraud detection', tags=['fraud', 'credit_card'])
    # query registered features
    registered_features = fr.get_features('fraud_detection_features', version='1.0')
    # feature lineage graph
    lineage_graph = fr.get_lineage('fraud_detection_features')
    lineage_graph.visualize()
    … 100, 'max_depth': 5})
    # bind features to the model
    mt.bind_features(features)
    # train the model
    training_metrics = mt.train(target='is_fraud', validation_split=0.2)
    # save the model
    mt.save('fraud_detection_model', version='1.0')

    1.3K22 · Edited on 2025-04-01
  • From the column AI SPPECH

    Large-Model Practice in Intelligent Financial Risk Control _02

    … = '低风险'  # low risk
    action = '通过'  # approve
    return {'fraud_score': fraud_score, 'fraud_risk_level': fraud_risk_level, 'action': action, 'z_score': …
    plt.figure(figsize=(10, 6))
    plt.bar(['欺诈风险评分'], [fraud_detection['fraud_score']],  # fraud risk score
            color='red' if fraud_detection['fraud_risk_level'] == '高风险'      # high risk
            else 'orange' if fraud_detection['fraud_risk_level'] == '中风险'    # medium risk
            else 'green')
    plt.ylim(0, 1)
    plt.title(f"交易欺诈检测结果 - {fraud_detection['fraud_risk_level']}")  # transaction fraud detection result

    73510 · Edited on 2025-11-13
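The snippet's score-to-level-to-action mapping is truncated; a minimal sketch with assumed thresholds (the 0.8/0.5 cutoffs and the English labels are illustrative, not the article's):

```python
def assess_fraud(fraud_score):
    """Map a fraud score in [0, 1] to a risk level and an action.
    Thresholds are assumptions for illustration."""
    if fraud_score >= 0.8:
        level, action = 'high risk', 'reject'
    elif fraud_score >= 0.5:
        level, action = 'medium risk', 'manual review'
    else:
        level, action = 'low risk', 'approve'
    return {'fraud_score': fraud_score,
            'fraud_risk_level': level,
            'action': action}
```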
  • From the column DeepHub IMBA

    Using Class Weights to Handle Data Imbalance

    performance_df['fraud_amount'] = performance_df['Amount'] * performance_df['Actual']
    performance_df['fraud_prevented'] = performance_df['fraud_amount'] * performance_df['Pred']
    performance_df['fraud_realized'] = performance_df['fraud_amount'] - performance_df['fraud_prevented']
    financial_recall = (performance_df['fraud_prevented'].sum()
                        / (performance_df['fraud_prevented'].sum()
                           + performance_df['fraud_realized'].sum())) * 100
    In this case, we can pass a dict to class_weight like this: fraud_class_weights = {0: 1, 1: 10}. But the sklearn API actually makes the process even easier.

    73610 · Edited on 2022-11-11
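The class_weight idea can be sketched end to end with scikit-learn on toy data (none of this is the article's dataset; `{0: 1, 1: 10}` mirrors the dict shown above):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Imbalanced toy data: roughly 5% positives, like a fraud problem.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=0)

# Manual weights: each fraud sample counts 10x in the loss ...
clf = LogisticRegression(class_weight={0: 1, 1: 10}, max_iter=1000).fit(X, y)

# ... or let sklearn pick weights inversely proportional to class frequency.
clf_bal = LogisticRegression(class_weight='balanced', max_iter=1000).fit(X, y)
```

Up-weighting the fraud class pushes the decision boundary toward flagging more transactions, trading precision for recall.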
  • From the column Translating the scikit-learn Cookbook

    Using linear methods for classification – logistic regression

    The canonical example is fraud detection, where most transactions aren't fraud, but the cost associated … cases than non-fraud cases; this could be due to a business rule, so we might alter how we weigh the … However, because we care more about fraud cases, let's oversample the fraud relative to non-fraud cases. Put in the context of the problem, if the estimated cost associated with fraud is sufficiently large, it can eclipse the cost associated with tracking fraud.

    50210 · Published on 2019-11-18
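The oversampling this entry describes can be sketched with `sklearn.utils.resample` (a toy frame, not the book's data):

```python
import pandas as pd
from sklearn.utils import resample

# Toy imbalanced frame: 8 normal rows, 2 fraud rows.
df = pd.DataFrame({'x': range(10), 'y': [0] * 8 + [1] * 2})
fraud, non_fraud = df[df['y'] == 1], df[df['y'] == 0]

# Sample the fraud rows with replacement until the classes are even.
fraud_up = resample(fraud, replace=True, n_samples=len(non_fraud),
                    random_state=0)
balanced = pd.concat([non_fraud, fraud_up])
# balanced now holds 8 rows of each class
```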
  • From the column Advanced Algorithms

    A Quick Tour of Anomaly-Detection Algorithms (with Python Code)

    num_fraud = np.sum(d['Class'] == 1)
    plt.bar(['Fraud', 'non-fraud'], [num_fraud, num_nonfraud], color='dodgerblue')
    mse_fraud = np.mean(np.power(X_fraud - pred_fraud, 2), axis=1)
    mae_test = np.mean(np.abs(X_test - pred_test), axis=1)
    mae_fraud = np.mean(np.abs(X_fraud - pred_fraud), axis=1)
    mse_df['MAE'] = np.hstack([mae_test, mae_fraud])
    mse_df = mse_df.sample(frac=1).reset_index()
    # scatter plots of MSE and MAE
    markers = ['o', '^']
    colors = ['dodgerblue', 'coral']
    labels = ['Non-fraud', 'Fraud']

    1.5K30 · Edited on 2022-06-01
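The MSE/MAE lines in this entry compute per-sample autoencoder reconstruction errors; a self-contained sketch with mock reconstructions standing in for a trained model's output:

```python
import numpy as np

rng = np.random.default_rng(0)
X_test = rng.normal(size=(100, 8))                         # stand-in features
pred_test = X_test + rng.normal(scale=0.1, size=(100, 8))  # mock reconstructions

# Per-sample errors: a sample the autoencoder reconstructs badly
# (large MSE/MAE) is treated as anomalous.
mse_test = np.mean(np.power(X_test - pred_test, 2), axis=1)
mae_test = np.mean(np.abs(X_test - pred_test), axis=1)

threshold = np.quantile(mse_test, 0.95)  # flag the worst 5%
anomalies = mse_test > threshold
```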
  • From the column Computers and AI

    How to Fix an Imbalanced Dataset

    The code below shows a simple approach:
    # Shuffle the dataset.
    shuffled_df = credit_df.sample(frac=1, random_state=4)
    # Put all the fraud class in a separate dataset.
    fraud_df = shuffled_df.loc[shuffled_df['Class'] == 1]
    # Randomly select 492 observations from the non-fraud (majority) class.
    non_fraud_df = shuffled_df.loc[shuffled_df['Class'] == 0].sample(n=492, random_state=42)
    # Concatenate both dataframes again.
    normalized_df = pd.concat([fraud_df, non_fraud_df])
    # Plot the dataset after the undersampling.
    plt.figure(figsize=(8, 8))
    sns.countplot(…

    1.7K10 · Published on 2020-11-19
  • From the column 拓端tecdat

    Association-Rule Learning in Python: "Market-Basket" Analysis of Drug Listings with the FP-Growth Algorithm

    … 'Cannabis', 'Stimulants', 'Hash']
    Packstation24: ['Accounts', 'Benzos', 'IDs & Passports', 'SIM Cards', 'Fraud', … 'Stimulants', 'Prescription', 'Sildenafil Citrate']
    OzVendor: ['Software', 'Erotica', 'Dumps', 'E-Books', 'Fraud', … 'Stimulants']
    [… 'Stimulants'] -> ['MDMA']  310  0.768
    ['Speed', 'Weed', 'Stimulants'] -> ['Cannabis', 'Ecstasy']  68  0.623
    ['Fraud', 'Hacking'] -> ['Accounts']  53  0.623
    ['Fraud', 'CC & CVV', 'Accounts'] -> ['Paypal']  43  0.492
    ['Documents' …

    86710 · Published on 2020-12-30
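FP-Growth mines frequent itemsets and rules like the ones listed above; the support and confidence arithmetic behind those numbers can be illustrated with brute-force counting (this is not the FP-tree algorithm itself, and the baskets are made up):

```python
from collections import Counter
from itertools import combinations

baskets = [
    ['Fraud', 'Hacking', 'Accounts'],
    ['Fraud', 'Accounts'],
    ['Fraud', 'CC & CVV', 'Accounts', 'Paypal'],
    ['Software', 'E-Books'],
]

# Count every itemset's support (share of baskets containing it).
counts = Counter()
for basket in baskets:
    items = sorted(set(basket))
    for r in range(1, len(items) + 1):
        counts.update(combinations(items, r))
support = {iset: c / len(baskets) for iset, c in counts.items()}

# Confidence of the rule {Fraud} -> {Accounts}: P(Accounts | Fraud).
conf = support[('Accounts', 'Fraud')] / support[('Fraud',)]
```

FP-Growth gets the same supports without enumerating every combination, which is what makes it practical on real basket data.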
  • From the column CDA Data Analysts

    Industry Case | Fraud Detection as a Data-Analytics Application in Banking

    creditcard_data['flag_as_fraud'] = np.where(… (creditcard_data['V3'] < -5), 1, 0)
    print(pd.crosstab(creditcard_data['Class'], creditcard_data['flag_as_fraud'],
                      rownames=['Actual Fraud'], colnames=['Flagged Fraud']))
    predictions = lr.predict(X_test)
    print(pd.crosstab(y_test, predictions,
                      rownames=['Actual Fraud'], colnames=['Flagged Fraud']))
    # output (truncated): Flagged Fraud 0.0 / 1.0 vs. Actual Fraud, with 1504 and 1496 in the top-left cell of the two runs

    1.2K20 · Edited on 2022-04-15
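The rule-based flag plus crosstab pattern from this entry can be reproduced on a tiny hypothetical frame (the V1 condition and all the values below are my assumptions; the snippet only shows `V3 < -5`):

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame in the shape of creditcard_data.
creditcard_data = pd.DataFrame({
    'V1': [-4.0, 0.5, -3.5, 1.2],
    'V3': [-6.0, 0.1, -5.5, 0.3],
    'Class': [1, 0, 1, 0],
})

# Flag transactions whose V1 and V3 are both extreme.
creditcard_data['flag_as_fraud'] = np.where(
    (creditcard_data['V1'] < -3) & (creditcard_data['V3'] < -5), 1, 0)

# 2x2 table of actual vs. flagged fraud, i.e. a confusion matrix.
ct = pd.crosstab(creditcard_data['Class'], creditcard_data['flag_as_fraud'],
                 rownames=['Actual Fraud'], colnames=['Flagged Fraud'])
```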
  • From the column 数据派THU

    Exclusive | How to Handle Imbalanced Datasets in One Article (with Code)

    Notebook: https://github.com/wmlba/innovate2019/blob/master/Credit_Card_Fraud_Detection.ipynb
    1. Resampling (oversampling and undersampling) … A simple implementation:
    # Shuffle the dataset.
    shuffled_df = credit_df.sample(frac=1, random_state=4)
    # Put all the fraud class in a separate dataset.
    fraud_df = shuffled_df.loc[shuffled_df['Class'] == 1]
    # Randomly select 492 observations from the non-fraud (majority) class.
    non_fraud_df = shuffled_df.loc[shuffled_df['Class'] == 0].sample(n=492, random_state=42)
    # Concatenate and plot the dataset after the undersampling.
    normalized_df = pd.concat([fraud_df, non_fraud_df])
    plt.figure(figsize=(8, 8))
    sns.countplot(…

    1.3K20 · Published on 2019-05-31
  • From the column Data Science & Artificial Intelligence

    Featured Teaching Case | Analysis and Prediction of Financial-Fraud Data

    df_total['type'].unique()
    df_fraud = df_total[df_total['isFraud'] == 1]
    df_fraud['type'].unique()
    # Count normal and fraudulent transactions separately by transaction type
    df_fraudTransfer = df_fraud[df_fraud['type'] == 'TRANSFER']
    df_fraudCashout = df_fraud[df_fraud['type' …
    (X_fraud['oldbalanceDest'] == 0) & (X_fraud['newbalanceDest'] == 0) & (X_fraud['amount'] !…
    print('Where both origin balances are 0 before and after, but the transaction amount itself is non-zero')
    print('Rate among fraudulent transactions:\t {}'.format(len(X_fraud.loc[(X_fraud['oldbalanceOrg'] == 0) & (X_fraud['newbalanceOrig'] == 0) & (X_fraud['amount'] !…

    2.6K30 · Published on 2020-05-19
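The last print in this entry computes the share of frauds where both origin balances are zero yet the amount is not; a sketch on a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical rows in the shape of the X_fraud frame above.
X_fraud = pd.DataFrame({
    'oldbalanceOrg':  [0.0,   0.0, 100.0],
    'newbalanceOrig': [0.0,   0.0,   0.0],
    'amount':         [500.0, 0.0, 100.0],
})

# Zero origin balance before and after, yet a non-zero amount moved:
# the books don't add up, a strong fraud signal in this dataset.
mask = ((X_fraud['oldbalanceOrg'] == 0)
        & (X_fraud['newbalanceOrig'] == 0)
        & (X_fraud['amount'] != 0))
rate = mask.sum() / len(X_fraud)
```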
  • From the column Public Internet Anti-Phishing (APCN)

    The Digital Evolution of the "Dirty Dozen" Tax Scams and Building an Intelligent Defense System

    fraud_indices = np.random.choice(n_samples, 50, replace=False)
    data['refund_amount'][fraud_indices] = np.random.uniform(…
    data['…'][fraud_indices] = np.random.choice([0, 1, 2, 3, 23], 50)
    data['device_entropy'][fraud_indices] = np.random.uniform(0.01, 0.1, 50)
    data['ip_distance'][fraud_indices] = np.random.uniform(500, 2000, 50)
    data['typing_variance'][fraud_indices] = np.random.uniform(0.01, 0.05, 50)
    df = pd.DataFrame(data)
    # 1. …
    for ring in fraud_rings:
        for node_idx in ring:
            df.loc[node_idx, 'in_fraud_ring'] = True
    # composite risk score

    31120 · Edited on 2026-03-11
  • From a column whose creator cannot be displayed

    As Consumer Fraud Rises, Online Commercial Fraud Declines

    … 11 to May 18) to the reopening phase (May 19 to July 25), noted the quarterly report on global online fraud. … "Almost overnight, fraudsters tried to take advantage," said Shai Cohen, senior vice president of global fraud … As those businesses ramped up their digital fraud-prevention solutions, he continued, the fraudsters …

    47100 · Published on 2021-01-07
  • From the column Machine Learning / Data Visualization

    A Collection of Commonly Used Machine-Learning Code

    sns.distplot(v12_fraud, ax=ax2, fit=norm, color="#56F9BB")
    ax2.set_title("V12", fontsize=14)
    v10_fraud = new_df["V10"].loc[new_df["Class"] == 1].values
    sns.distplot(v10_fraud, ax=ax3, fit=norm, color="#C5B3F9")
    ax3.set_title(…
    fraud_df = df[df["Class"] == 1]                      # the small class
    no_fraud_df = df[df["Class"] == 0][:len(fraud_df)]   # the same number len(fraud_df) of non-fraud rows
    normal_distributed_df = pd.concat([fraud_df, no_fraud_df])  # combine
    new_df = normal_distributed_df.sample(…              # shuffle the data

    74520 · Edited on 2023-08-25
  • From the column Data Science and Artificial Intelligence

    Learning | How to Handle Imbalanced Datasets

    A simple approach is shown in the code below:
    # Shuffle the dataset.
    shuffled_df = credit_df.sample(frac=1, random_state=4)
    # Put all the fraud class in a separate dataset.
    fraud_df = shuffled_df.loc[shuffled_df['Class'] == 1]
    # Randomly select 492 observations from the non-fraud (majority) class.
    non_fraud_df = shuffled_df.loc[shuffled_df['Class'] == 0].sample(n=492, random_state=42)
    # Concatenate both dataframes again.
    normalized_df = pd.concat([fraud_df, non_fraud_df])
    # Plot the dataset after the undersampling.
    plt.figure(figsize=(8, 8))
    sns.countplot(…

    2.4K40 · Published on 2019-05-16