My scenario:
(fraudster).
Comparison
Table:

My code:
import mysql.connector
from mysql.connector import Error
import pandas as pd

try:
    connection = mysql.connector.connect(host='localhost',
                                         database='database',
                                         user='root',
                                         password='')
    cursor = connection.cursor()
    df = pd.read_sql("select * from tableuser", con=connection)

    ##
    def expand_fraud(no_fraud, fraud, col_name):
        t = pd.merge(no_fraud, fraud, on=col_name)
        if len(t):
            df.loc[df.ID.isin(t.ID_x), "IsFraudsterStatus"] = 1
            return True
        return False

    while True:
        added_fraud = False
        fraud = df[df.IsFraudsterStatus == 1]
        no_fraud = df[df.IsFraudsterStatus == 0]
        added_fraud |= expand_fraud(no_fraud, fraud, "DeviceId")
        added_fraud |= expand_fraud(no_fraud, fraud, "Email")
        added_fraud |= expand_fraud(no_fraud, fraud, "MobileNo")
        if not added_fraud:
            break
    print(df)
except Error as e:
    print("Error reading data from MySQL table", e)
finally:
    if (connection.is_connected()):
        connection.close()
        cursor.close()
        print("MySQL connection is closed")

This works with 1,000 rows. Now I have added 500k rows, and an error message pops up.
Error:

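I also tried a recursive CTE for this, but it only handles a small amount of data. Can you help me? Please suggest another way to achieve this. The expected size is 4 million users. The check runs when a user clicks the "Pay" button, so execution time matters as well.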
Posted on 2020-07-20 07:33:46
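You can use the pandas function isin to select the users who share a characteristic with a fraudster.
For example, to get all users who have the same phone number as a fraudster: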
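fraudsters = df[df["IsFraudsterStatus"] == 1]
# True for every row whose MobileNo also appears on a fraudster row
mask = df["MobileNo"].isin(fraudsters["MobileNo"])
# Exclude the fraudsters themselves so only unflagged users remain
non_fraud_with_same_mobile = df[mask & (df["IsFraudsterStatus"] == 0)]
https://stackoverflow.com/questions/62990301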
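The same isin idea can be repeated over all three columns from the question until no new user gets flagged. The following is only a minimal sketch under stated assumptions: the column names (DeviceId, Email, MobileNo, IsFraudsterStatus) are taken from the question, while `propagate_fraud`, `KEYS`, and the sample data are hypothetical names made up for illustration. Unlike pd.merge on a column with repeated values, isin returns one boolean per row of df, so it does not build a many-to-many join result; whether that is fast enough for 4 million rows at payment time would still need to be measured.

```python
import pandas as pd

# Column names come from the question; KEYS itself is a hypothetical helper.
KEYS = ["DeviceId", "Email", "MobileNo"]


def propagate_fraud(df, keys=KEYS, flag="IsFraudsterStatus"):
    """Flag users sharing any key value with a flagged user, until stable."""
    df = df.copy()
    while True:
        fraud = df[df[flag] == 1]
        hit = pd.Series(False, index=df.index)
        for key in keys:
            # Drop missing values so that two empty fields never count as a match.
            values = fraud[key].dropna().unique()
            hit |= df[key].isin(values)
        newly_flagged = hit & (df[flag] == 0)
        if not newly_flagged.any():
            return df
        df.loc[newly_flagged, flag] = 1


# Made-up sample: user 2 shares a device with fraudster 1,
# and user 3 shares an email with user 2.
sample = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "DeviceId": ["D1", "D1", "D3", "D4", "D5"],
    "Email": [None, "b@x", "b@x", "d@x", None],
    "MobileNo": ["111", "222", "333", "444", "555"],
    "IsFraudsterStatus": [1, 0, 0, 0, 0],
})
result = propagate_fraud(sample)
print(result[["ID", "IsFraudsterStatus"]])
```

In the sample, users 2 and 3 end up flagged through the chain 1 → 2 → 3, while users 4 and 5 stay unflagged. User 5 has a missing Email just like fraudster 1, which is why the missing values are dropped before matching.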