
我的场景:
递归CTE (当我将数据增加到1000行时出错)
代码:
with recursive cte as (
select ID, Email, MobileNo, DeviceId, IPAddress, id as tracking
from tableuser
where isfraudsterstatus = 1
union all
select u.id, u.email, u.mobileno, u.deviceid, u.ipaddress , concat_ws(',', cte.tracking, u.id)
from cte join
tableuser u
on u.email = cte.email or
u.mobileno = cte.mobileno or
u.deviceid = cte.deviceid or
u.ipaddress = cte.ipaddress
where find_in_set(u.id, cte.tracking) = 0
)
select *
from cte;错误:

使用Pandas的:(当我将数据从1000行增加到50万行时出错)
代码:
import mysql.connector
from mysql.connector import Error
import pandas as pd
#DATABASE CONNECTION
##
try:
connection = mysql.connector.connect(host='localhost',
database='database',
user='root',
password='')
cursor = connection.cursor()
#Create Dataframe (temporary data)
#df = pd.read_sql("select * from MOCK_DATA",con=connection)
df = pd.read_sql("select * from tableuser",con=connection)
##
def expand_fraud(no_fraud, fraud, col_name):
t = pd.merge(no_fraud, fraud, on=col_name)
if len(t):
df.loc[df.ID.isin(t.ID_x), "IsFraudsterStatus"] = 1
return True
return False
while True:
added_fraud = False
fraud = df[df.IsFraudsterStatus == 1]
no_fraud = df[df.IsFraudsterStatus == 0]
added_fraud |= expand_fraud(no_fraud, fraud, "DeviceId")
added_fraud |= expand_fraud(no_fraud, fraud, "Email")
added_fraud |= expand_fraud(no_fraud, fraud, "MobileNo")
if not added_fraud:
break
print(df)
Id_list = df.values.tolist()
except Error as e:
print("Error reading data from MySQL table", e)
finally:
if (connection.is_connected()):
connection.close()
cursor.close()
print("MySQL connection is closed")误差

怎么处理这事?有什么别的办法吗?
发布于 2020-07-17 10:50:16
这里的问题(对于MySQL部分)似乎是您的停止条件。您可以跟踪ids列表以防止无限循环(例如,A,B,C,D)。不幸的是,该列的数据类型为"id",可能是varchar(10),这实际上意味着您的轨道列表有一个有限的长度。
如果达到此深度,您应该会得到一条错误消息:
Error Code: 1406. Data too long for column 'tracking' at row 1同样不幸的是,您可能通过禁用严格模式 (例如使用sql_mode='' )来抑制该错误,这是解决某些问题的常见方法(最臭名昭著的是-errors),而不是修复代码,但其副作用是您可能会得到无效的数据。
在您的示例中,这会导致tracking值停止跟踪(而不会抛出错误),例如,varchar(10)可能以A,B,C,D,E,结束,无法将F添加到列表中,因此它一直将F添加到结果集,从而导致无限循环。
MySQL实际上有一个防止无限循环的保护,所以您可能会
Error Code: 3636. Recursive query aborted after 1001 iterations.
Try increasing @@cte_max_recursion_depth to a larger value.但是它只在特定情况下保护您,就像每次迭代添加多于一行,然后每次迭代都添加多行一样,您将在结果集中的2^1000行之前达到资源限制(或超时)。
怎么修呢?
如果您实际上不需要来自跟踪器的信息(而且您的熊猫代码没有这样做,似乎您添加它只是为了防止循环),您可以让MySQL使用union distinct来处理重复的信息。
with recursive cte as (
select ID, Email, MobileNo, DeviceId, IPAddress
from tableuser
where isfraudsterstatus = 1
union distinct -- distinct!
select u.id, u.email, u.mobileno, u.deviceid, u.ipaddress
from cte join tableuser u
on u.email = cte.email or u.mobileno = cte.mobileno
or u.deviceid = cte.deviceid or u.ipaddress = cte.ipaddress
)
select * from cte;如果你愿意的话,你也可以把它扩展到追踪“原始欺诈者”。如果每个链中有几个欺诈者(例如A和B都标记为欺诈者,而A具有与B相同的MobileNo ),这可能会导致重复,但您可以使用group by再次消除这些欺诈者。
with recursive cte as (
select ID, Email, MobileNo, DeviceId, IPAddress, id as original_fraudster
from tableuser
where isfraudsterstatus = 1
union distinct
select u.id, u.email, u.mobileno, u.deviceid, u.ipaddress,
cte.original_fraudster
from cte join tableuser u
on u.email = cte.email or u.mobileno = cte.mobileno
or u.deviceid = cte.deviceid or u.ipaddress = cte.ipaddress
)
select ID, Email, MobileNo, DeviceId, IPAddress,
min(original_fraudster) as original_fraudster
from cte
group by ID, Email, MobileNo, DeviceId, IPAddress;从技术上讲,您还可以通过显式定义自己的长度来避免原来的问题( "id“列的有限长度)。
with recursive cte as (
select ID, Email, MobileNo, DeviceId, IPAddress,
cast(id as char(1000)) as tracking 虽然这只是将问题转移到将来的某一时刻,而这段时间也可能不够长,但你可以判断这是否是一个潜在的问题。
https://stackoverflow.com/questions/62948946
复制相似问题