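What I want to do:
If user A is a fraudster, every user who shares a DeviceId, Email, or MobileNo with A should also be marked as a fraudster (this is what the code below attempts).
Table:

The table has 500K rows.
My code: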
import mysql.connector
from mysql.connector import Error
import pandas as pd

try:
    connection = mysql.connector.connect(host='localhost',
                                         database='database',
                                         user='root',
                                         password='')
    cursor = connection.cursor()
    df_chunk = pd.read_sql("select * from tableuser", con=connection, chunksize=1000000)
    chunk_list = []
    for chunk in df_chunk:
        chunk_list.append(chunk)
    df = pd.concat(chunk_list)

    def expand_fraud(no_fraud, fraud, col_name):
        t = pd.merge(no_fraud, fraud, on=col_name)
        if len(t):
            df.loc[df.ID.isin(t.ID_x), "IsFraudsterStatus"] = 1
            return True
        return False

    while True:
        added_fraud = False
        fraud = df[df.IsFraudsterStatus == 1]
        no_fraud = df[df.IsFraudsterStatus == 0]
        added_fraud |= expand_fraud(no_fraud, fraud, "DeviceId")
        added_fraud |= expand_fraud(no_fraud, fraud, "Email")
        added_fraud |= expand_fraud(no_fraud, fraud, "MobileNo")
        if not added_fraud:
            break
    print(df)
except Error as e:
    print("Error reading data from MySQL table", e)
finally:
    if (connection.is_connected()):
        connection.close()
        cursor.close()
        print("MySQL connection is closed")

Last time I ran into the same problem when using read_sql, and chunksize solved it. How can I use a chunk size with a DataFrame?
Posted on 2020-07-24 03:11:52
I'm not sure why you are dealing with chunks at all. Here is the code I would propose:
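df['same_device_id'] = 0
fraud_devices = df[df.IsFraudsterStatus == 1]['DeviceId']
for device_id in fraud_devices:
    # Use .loc so the assignment writes to df itself rather than to a temporary copy
    df.loc[df.DeviceId == device_id, 'same_device_id'] = 1

Add one such column for each additional shared field (Email, MobileNo). Once that is done, your fraudsters can be identified by combining IsFraudsterStatus and the new columns with an OR operator.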
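As a rough sketch of the same idea without the Python loop or the merge: `Series.isin` can test every row against the set of fraudster values in one pass, and repeating that until nothing changes also catches users who are linked only indirectly. The function name `flag_connected_fraudsters` and the sample rows are made up for illustration; the column names are taken from the question, and the repeat-until-stable behaviour is assumed from the question's `while` loop.

```python
import pandas as pd

def flag_connected_fraudsters(df, keys=("DeviceId", "Email", "MobileNo")):
    """Return a copy of df with IsFraudsterStatus spread through shared key values."""
    out = df.copy()
    is_fraud = out["IsFraudsterStatus"].eq(1)
    while True:
        linked = pd.Series(False, index=out.index)
        for key in keys:
            # Values of this field seen on any currently flagged row (NaN ignored).
            fraud_values = out.loc[is_fraud, key].dropna().unique()
            # Boolean mask, one entry per row; no row-multiplying join is built.
            linked |= out[key].isin(fraud_values)
        new_fraud = is_fraud | linked
        if new_fraud.equals(is_fraud):
            break  # nothing new was flagged, so the result is stable
        is_fraud = new_fraud
    out["IsFraudsterStatus"] = is_fraud.astype(int)
    return out

# Tiny made-up table, for illustration only.
sample = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "DeviceId": ["d1", "d1", "d2", "d3", "d4"],
    "Email": ["a@x", "b@x", "b@x", "c@x", "d@x"],
    "MobileNo": ["100", "200", "300", "300", "400"],
    "IsFraudsterStatus": [1, 0, 0, 0, 0],
})
result = flag_connected_fraudsters(sample)
print(result[["ID", "IsFraudsterStatus"]])
```

In this sample, user 2 shares a device with user 1, user 3 shares an email with user 2, and user 4 shares a mobile number with user 3, so users 1 to 4 end up flagged and user 5 does not. Because each pass only builds boolean masks of the table's length, memory use should stay close to the size of the DataFrame itself, whereas a merge on a column with many repeated values can produce far more rows than the original table.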
https://stackoverflow.com/questions/63066085