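Below is a snippet of code I wrote a few months ago but only need now. I believe the main part of it is code I adapted from an SO post, but I have lost the URL. Either way, I had forgotten how slow it is when many thousands of files are involved, so I am looking at ways to make it faster.
I have tried moving parts of the code around and removing certain sections, but performance either stays the same or gets worse, which leads me to believe the problem lies in the os.listdir call. From what I have read, os.listdir is the fastest option here because it does not make as many system calls as scandir or walk, but its performance is still poor for folder trees exceeding 100,000 files, as shown below.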
14387 files in 2794 folders processed in 5.88s
14387 files in 2794 folders processed in 3.224s
14387 files in 2794 folders processed in 5.847s
110016 files in 21440 folders processed in 22.732s
110016 files in 21440 folders processed in 22.603s
110016 files in 21440 folders processed in 41.055s
249714 files in 35707 folders processed in 66.452s
249714 files in 35707 folders processed in 49.154s
249714 files in 35707 folders processed in 88.43s
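249714 files in 35707 folders processed in 48.942s
I am currently looking at an alternative approach that uses a static text file to index the file/folder locations, pre-populated every hour on the server with the latest folder contents. Before abandoning the code below, though, I wanted to ask for help to see whether it can be made faster or whether it is already at its limit.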
import tkinter as tk
import tkinter.ttk as ttk
from ttkwidgets import CheckboxTreeview
import os
import time
time_start = time.time()
iid = 1 # IID of tree item. 0 is top level parent
count_folders = 0 # Number of folders in parent
count_files = 0 # Number of files in parent
compare_check = {} # Build the dictionary with IID key and folder/file paths in list
root = tk.Tk()
root.geometry('850x450')
style = ttk.Style(root)
v_scrollbar = tk.Scrollbar(root, orient='vertical')
v_scrollbar.place(x=830, y=20, width=20, height=415)
tree = CheckboxTreeview(root, show='tree', yscrollcommand=v_scrollbar.set)
tree.place(x=10, y=20, anchor="nw", width=815, height=415)
v_scrollbar.config(command=tree.yview)
style.configure('Treeview', indent=15)
def new_folder(parent_path, directory_entries, parent_iid):
    global iid, count_folders, count_files
    for name in directory_entries:
        item_path = parent_path + os.sep + name
        if os.path.isdir(item_path):
            subdir_iid = tree.insert(parent=parent_iid, index='end', text=f'[F] {name}')
            try:
                subdir_entries = os.listdir(item_path)
                new_folder(parent_path=item_path, directory_entries=subdir_entries, parent_iid=subdir_iid)
                count_folders += 1 # for testing
            except PermissionError:
                pass
        else:
            tree.insert(parent=parent_iid, index='end', text=f'[f] {name}')
            count_files += 1 # for testing
        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper() # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file
        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)
        iid = int(hex_iid, 16) # Convert back to decimal to continue the iid increment count
        compare_check.update({hex_compare: [parent_path, parent_path[14:], name]}) # Update dictionary with current item
parent_iid = tree.insert(parent='', index='0', text='All Documents', open=True)
start_path = os.path.expanduser(r"K:/DMC Processed - 02072017") # Path for test
start_dir_entries = os.listdir(start_path)
new_folder(parent_path=start_path, directory_entries=start_dir_entries, parent_iid=parent_iid)
time_end = time.time()
time_total = round(time_end - time_start, 3) # for testing. Simple start to end timer result
ttk.Label(root, text=f"Files: {count_files} || Folders: {count_folders} || Time: {time_total}s", font='arial 10 bold').place(x=300, y=0) # for testing
print(f"{count_files} files in {count_folders} folders processed in {time_total}s") # for testing
root.mainloop()
Posted on 2022-08-02 13:12:17
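Since you had the timing set up so nicely, I thought it would be a fun challenge to have a go.
I tried rewriting it to use os.walk, but then I figured your os.path.isdir() call would be very slow, so I swapped it out for scandir. That turned out to be the fastest approach I could find.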
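Why scandir helps can be sketched in isolation, without any tkinter code. The snippet below is only a minimal illustration, not the code that was benchmarked: the function names count_with_listdir and count_with_scandir are made up for this example. os.listdir returns bare names, so every entry needs a separate os.path.isdir call, whereas os.scandir yields DirEntry objects whose is_dir() can usually answer from information the directory listing already returned.

```python
import os


def count_with_listdir(path):
    """Count (files, folders) below path using os.listdir + os.path.isdir."""
    files = folders = 0
    for name in os.listdir(path):
        item_path = os.path.join(path, name)
        if os.path.isdir(item_path):  # extra file system query per entry
            folders += 1
            try:
                sub_files, sub_folders = count_with_listdir(item_path)
            except PermissionError:
                continue  # unreadable folder: counted, but not descended into
            files += sub_files
            folders += sub_folders
        else:
            files += 1
    return files, folders


def count_with_scandir(path):
    """Count (files, folders) below path using os.scandir and DirEntry.is_dir."""
    files = folders = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():  # usually answered from cached listing data
                folders += 1
                try:
                    sub_files, sub_folders = count_with_scandir(entry.path)
                except PermissionError:
                    continue  # unreadable folder: counted, but not descended into
                files += sub_files
                folders += sub_folders
            else:
                files += 1
    return files, folders
```

Both functions should return the same counts for the same tree, so running each on a test folder and timing the calls gives a rough idea of how much the per-entry isdir check costs on a given disk.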
Benchmarks:
original: 697665 files in 76729 folders processed in 106.079s
os.scandir: 697665 files in 76729 folders processed in 23.152s
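os.walk: 697665 files in 76731 folders processed in 32.869s
Using the scandir module did not seem to make any difference; os itself appears to be well optimized these days.
Here is the code with the other functions included: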
import tkinter as tk
import tkinter.ttk as ttk
from ttkwidgets import CheckboxTreeview
import os
import scandir
import time
time_start = time.time()
iid = 1 # IID of tree item. 0 is top level parent
count_folders = 0 # Number of folders in parent
count_files = 0 # Number of files in parent
compare_check = {} # Build the dictionary with IID key and folder/file paths in list
root = tk.Tk()
root.geometry('850x450')
style = ttk.Style(root)
v_scrollbar = tk.Scrollbar(root, orient='vertical')
v_scrollbar.place(x=830, y=20, width=20, height=415)
tree = CheckboxTreeview(root, show='tree', yscrollcommand=v_scrollbar.set)
tree.place(x=10, y=20, anchor="nw", width=815, height=415)
v_scrollbar.config(command=tree.yview)
style.configure('Treeview', indent=15)
def new_folder(parent_path, directory_entries, parent_iid):
    global iid, count_folders, count_files
    for name in directory_entries:
        item_path = parent_path + os.sep + name
        if os.path.isdir(item_path):
            subdir_iid = tree.insert(parent=parent_iid, index='end', text=f'[F] {name}')
            try:
                subdir_entries = os.listdir(item_path)
                new_folder(parent_path=item_path, directory_entries=subdir_entries, parent_iid=subdir_iid)
                count_folders += 1 # for testing
            except PermissionError:
                pass
        else:
            tree.insert(parent=parent_iid, index='end', text=f'[f] {name}')
            count_files += 1 # for testing
        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper() # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file
        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)
        iid = int(hex_iid, 16) # Convert back to decimal to continue the iid increment count
        compare_check.update({hex_compare: [parent_path, parent_path[14:], name]}) # Update dictionary with current item
def new_folder_scandir(parent_path, parent_iid):
    global iid, count_folders, count_files
    # os.scandir yields DirEntry objects, so use entry.name for the display text
    for entry in os.scandir(parent_path):
        if entry.is_dir():
            subdir_iid = tree.insert(parent=parent_iid, index='end', text=f'[F] {entry.name}')
            try:
                new_folder_scandir(parent_path=entry.path, parent_iid=subdir_iid)
                count_folders += 1 # for testing
            except PermissionError:
                pass
        else:
            tree.insert(parent=parent_iid, index='end', text=f'[f] {entry.name}')
            count_files += 1 # for testing
        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper() # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file
        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)
        iid = int(hex_iid, 16) # Convert back to decimal to continue the iid increment count
        compare_check.update({hex_compare: [parent_path, parent_path[14:], entry.name]}) # Update dictionary with current item
def new_folder_walk(path):
    global count_folders, count_files
    def hex_thing(parent_path, name):
        global iid
        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper() # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file
        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)
        iid = int(hex_iid, 16) # Convert back to decimal to continue the iid increment count
        compare_check.update({hex_compare: [parent_path, parent_path[14:], name]}) # Update dictionary with current item
    tree_items = {path: tree.insert(parent='', index='0', text='All Documents', open=True)}
    for root, dirs, files in scandir.walk(path):
        for dir in dirs:
            path = os.path.join(root, dir)
            count_folders += 1
            tree_items[path] = tree.insert(parent=tree_items[root], index='end', text=f'[F] {dir}')
            hex_thing(root, dir)
        for file in files:
            path = os.path.join(root, file)
            count_files += 1
            tree.insert(parent=tree_items[root], index='end', text=f'[f] {file}')
            hex_thing(root, file)
start_path = os.path.expanduser(r"C:/Program Files") # Path for test
# 0 = original, 1 = scandir, 2 = walk
run = 1
if run == 0:
    parent_iid = tree.insert(parent='', index='0', text='All Documents', open=True)
    start_dir_entries = os.listdir(start_path)
    new_folder(parent_path=start_path, directory_entries=start_dir_entries, parent_iid=parent_iid)
elif run == 1:
    parent_iid = tree.insert(parent='', index='0', text='All Documents', open=True)
    new_folder_scandir(parent_path=start_path, parent_iid=parent_iid)
elif run == 2:
    new_folder_walk(start_path)
time_end = time.time()
time_total = round(time_end - time_start, 3) # for testing. Simple start to end timer result
ttk.Label(root, text=f"Files: {count_files} || Folders: {count_folders} || Time: {time_total}s", font='arial 10 bold').place(x=300, y=0) # for testing
print(f"{count_files} files in {count_folders} folders processed in {time_total}s") # for testing
root.mainloop()
For the record, I am actually surprised that os.walk is slower than os.scandir, even when iterating over every file.
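One way to probe the os.walk versus os.scandir gap outside the GUI is to time both traversals on the same tree with the widget inserts removed. The following is a rough sketch under that assumption; count_walk, count_scandir and timed are hypothetical helper names, and whatever timings it gives will depend on the disk and the OS cache. Since Python 3.5 os.walk is itself built on os.scandir, so any remaining difference likely comes from the per-directory lists and path handling rather than from the directory reads themselves.

```python
import os
import time


def count_walk(path):
    """Count (files, folders) below path with os.walk."""
    files = folders = 0
    for _root, dirnames, filenames in os.walk(path):
        folders += len(dirnames)
        files += len(filenames)
    return files, folders


def count_scandir(path):
    """Count (files, folders) below path with an explicit stack of os.scandir calls."""
    files = folders = 0
    stack = [path]
    while stack:
        try:
            with os.scandir(stack.pop()) as entries:
                for entry in entries:
                    if entry.is_dir():
                        folders += 1
                        # Like os.walk's default, do not descend into symlinked folders
                        if not entry.is_symlink():
                            stack.append(entry.path)
                    else:
                        files += 1
        except PermissionError:
            continue  # skip unreadable folders
    return files, folders


def timed(func, path):
    """Return (result, elapsed_seconds) for a single call of func(path)."""
    start = time.perf_counter()
    result = func(path)
    return result, time.perf_counter() - start
```

Calling timed(count_walk, start_path) and timed(count_scandir, start_path) a few times each, and discarding the first run so that both benefit from a warm cache, should give a fairer comparison than a single pass.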
https://stackoverflow.com/questions/73204781