I have a table with 5 million rows in MSSQL, and fetching all of the rows takes 2~3 minutes. I would like to optimize this, if possible.

Here is my code:

cursor.execute("SELECT * FROM MyTable")
rows = cursor.fetchall()  # that takes 2~3 minutes
# some code to set up the output, which only takes a few seconds

I have already tried using:
while True:
    rows = cursor.fetchmany(500000)
    if not rows:
        break
    # Do some stuff

I also tried fetchone().
But again it takes 2~3 minutes :/ How can I optimize it? Maybe with threading, but I don't know how.

Thanks for your help.
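For reference, the batched fetchmany loop above can be sketched as a generator, so rows are processed chunk by chunk instead of being accumulated first. This is a minimal, runnable illustration using the standard-library sqlite3 module in place of MSSQL; the table contents and batch size here are made up:

```python
import sqlite3

def fetch_in_batches(cursor, batch_size):
    """Yield rows one at a time, pulling them from the DB in chunks."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            return
        yield from rows

# Demo with an in-memory SQLite table standing in for the MSSQL table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE MyTable (id INTEGER)")
cur.executemany("INSERT INTO MyTable (id) VALUES (?)",
                [(i,) for i in range(10)])
cur.execute("SELECT id FROM MyTable")
total = sum(1 for _ in fetch_in_batches(cur, 3))  # 10 rows, batches of 3
```

Note that batching alone does not reduce the total transfer time; it mainly bounds memory use and lets processing overlap with fetching.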
Posted on 2022-05-23 15:19:29
I think you can limit the number of rows the query returns, even if that means making several calls to the database.

As for threading, you have several options.

In any case, you will need a ThreadedConnectionPool. Here is a small example showing one way to do it:
import psycopg2
from psycopg2 import pool
from threading import Thread
from time import sleep

threaded_connection_pool = None
thread_table = list()

def get_new_connection():
    global threaded_connection_pool
    connection = None
    while not isinstance(connection, psycopg2.extensions.connection):
        try:
            connection = threaded_connection_pool.getconn()
        except pool.PoolError:
            sleep(10)  # Wait for a free connection
    return connection, connection.cursor()

def thread_target():
    connection, cursor = get_new_connection()
    with connection, cursor:
        # Do some stuff
        pass

threaded_connection_pool = psycopg2.pool.ThreadedConnectionPool(
    # YOUR PARAM
)

for counter_thread in range(10):
    thread = Thread(
        target=thread_target,
        name=f"Thread n°{counter_thread}"
    )
    thread_table.append(thread)
    thread.start()

#
# Do many more stuff
#

for thread in thread_table:
    thread.join()
# End

Posted on 2022-05-23 17:33:30
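For each thread in the pool above to do useful fetching work, it needs its own slice of the table. A small helper to compute per-thread (offset, limit) ranges might look like this; the 5,000,000-row total and 10 workers are just the figures from this thread:

```python
def partition_ranges(total_rows, n_workers):
    """Split [0, total_rows) into n_workers contiguous (offset, limit) chunks."""
    base, extra = divmod(total_rows, n_workers)
    ranges, offset = [], 0
    for i in range(n_workers):
        # Spread any remainder over the first `extra` workers.
        limit = base + (1 if i < extra else 0)
        ranges.append((offset, limit))
        offset += limit
    return ranges

ranges = partition_ranges(5_000_000, 10)  # [(0, 500000), (500000, 500000), ...]
```

Each thread would then run its own paginated SELECT over its assigned range using its own connection from the pool.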
I would prefer to use the first solution: "a single connection, but a different cursor per thread".

For that, should I do something like this?
result = []
cursor = connection.cursor()

def fetch_cursor(cursor):
    global result
    rows = cursor.fetchall()
    if rows:
        result += beautify_output(rows)

######### THIS CODE BELOW IS INSIDE A FUNCTION ######
thread_table = []
limit = 1000000
offset = 0
sql = "SELECT * FROM myTABLE"
while True:
    try:
        cursor.execute(f"{sql} LIMIT {limit} OFFSET {offset}")
    except Exception as e:
        break
    offset += limit
    thread = Thread(target=fetch_cursor, args=(cursor,))
    thread_table.append(thread)
    thread.start()

for thread in thread_table:
    thread.join()

print(result)

So should something like this work? (I'll try it tomorrow.)
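Two caveats about the snippet above. First, `LIMIT ... OFFSET ...` is MySQL/PostgreSQL syntax; SQL Server (2012+) uses `OFFSET ... ROWS FETCH NEXT ... ROWS ONLY`, which also requires an `ORDER BY` clause. Second, all threads share a single cursor, and re-executing on a cursor discards its previous result set, so each thread really needs its own cursor. A small hypothetical helper for building the T-SQL pagination clause (the table and column names are made up):

```python
def tsql_page(sql, order_by, offset, limit):
    """Append SQL Server pagination to a SELECT; ORDER BY is mandatory for OFFSET/FETCH."""
    return f"{sql} ORDER BY {order_by} OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY"

query = tsql_page("SELECT * FROM MyTable", "id", 0, 1000000)
```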
https://stackoverflow.com/questions/72349978