Code is slow when creating a tree view with 100,000+ items

Stack Overflow user
Asked 2022-08-02 09:03:51
Answers: 1 | Views: 104 | Followers: 0 | Votes: 2

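Below is a snippet of code I was working on a few months ago but have only now needed. I believe the main part of it came from an SO post, but I've lost the URL. Either way, I had forgotten how slow it is when many thousands of files are involved, so I'm looking at ways to make it faster.

I have tried moving parts of the code around and removing certain sections, but performance either stayed the same or got worse, which leads me to believe the problem lies in the os.listdir call. From what I've read, os.listdir is the fastest option here because it doesn't make as many system calls as scandir or walk, but its performance is still poor on folders containing more than 100,000 files, as shown below.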

14387 files in 2794 folders processed in 5.88s
14387 files in 2794 folders processed in 3.224s
14387 files in 2794 folders processed in 5.847s


110016 files in 21440 folders processed in 22.732s
110016 files in 21440 folders processed in 22.603s
110016 files in 21440 folders processed in 41.055s


249714 files in 35707 folders processed in 66.452s
249714 files in 35707 folders processed in 49.154s
249714 files in 35707 folders processed in 88.43s
249714 files in 35707 folders processed in 48.942s

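I'm currently looking into an alternative approach: indexing the file/folder locations in a static text file that is pre-populated on the server every hour with the latest folder contents. Before abandoning the code below, though, I wanted to ask for help to see whether it can be made any faster, or whether it is already at its limit.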

import tkinter as tk
import tkinter.ttk as ttk
from ttkwidgets import CheckboxTreeview
import os
import time

time_start = time.time()

iid = 1  # IID of tree item. 0 is top level parent
count_folders = 0  # Number of folders in parent
count_files = 0  # Number of files in parent
compare_check = {}  # Build the dictionary with IID key and folder/file paths in list

root = tk.Tk()
root.geometry('850x450')

style = ttk.Style(root)

v_scrollbar = tk.Scrollbar(root, orient='vertical')
v_scrollbar.place(x=830, y=20, width=20, height=415)
tree = CheckboxTreeview(root, show='tree', yscrollcommand=v_scrollbar.set)
tree.place(x=10, y=20, anchor="nw", width=815, height=415)
v_scrollbar.config(command=tree.yview)
style.configure('Treeview', indent=15)


def new_folder(parent_path, directory_entries, parent_iid):
    global iid, count_folders, count_files
    for name in directory_entries:
        item_path = parent_path + os.sep + name
        if os.path.isdir(item_path):
            subdir_iid = tree.insert(parent=parent_iid, index='end', text=f'[F] {name}')
            try:
                subdir_entries = os.listdir(item_path)
                new_folder(parent_path=item_path, directory_entries=subdir_entries, parent_iid=subdir_iid)
                count_folders += 1  # for testing
            except PermissionError:
                pass
        else:
            tree.insert(parent=parent_iid, index='end', text=f'[f] {name}')
            count_files += 1  # for testing

        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper()  # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file

        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)

        iid = int(hex_iid, 16)  # Convert back to decimal to continue the iid increment count

        compare_check.update({hex_compare: [parent_path, parent_path[14:], name]})  # Update dictionary with current item


parent_iid = tree.insert(parent='', index='0', text='All Documents', open=True)
start_path = os.path.expanduser(r"K:/DMC Processed - 02072017")  # Path for test
start_dir_entries = os.listdir(start_path)
new_folder(parent_path=start_path, directory_entries=start_dir_entries, parent_iid=parent_iid)

time_end = time.time()
time_total = round(time_end - time_start, 3)  # for testing. Simple start to end timer result

ttk.Label(root, text=f"Files: {count_files} || Folders: {count_folders} || Time: {time_total}s", font='arial 10 bold').place(x=300, y=0)  # for testing

print(f"{count_files} files in {count_folders} folders processed in {time_total}s")  # for testing

root.mainloop()
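
The static-index alternative mentioned in the question could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names `write_index` and `read_index`, the tab-separated `kind<TAB>relative path` layout, and the index file name are all hypothetical choices, not something from the original post. It also assumes file names contain no tab or newline characters.

```python
import os
import tempfile


def write_index(start_path, index_file):
    """Walk start_path once and write one line per entry: kind, TAB, relative path.

    'D' marks a folder and 'F' marks a file (hypothetical format).
    Returns the number of entries written.
    """
    count = 0
    with open(index_file, "w", encoding="utf-8") as out:
        for folder, dirs, files in os.walk(start_path):
            rel_folder = os.path.relpath(folder, start_path)
            for name in dirs:
                rel = os.path.normpath(os.path.join(rel_folder, name))
                out.write(f"D\t{rel}\n")
                count += 1
            for name in files:
                rel = os.path.normpath(os.path.join(rel_folder, name))
                out.write(f"F\t{rel}\n")
                count += 1
    return count


def read_index(index_file):
    """Load the index back as a list of (kind, relative_path) tuples."""
    with open(index_file, encoding="utf-8") as src:
        return [tuple(line.rstrip("\n").split("\t", 1)) for line in src]


if __name__ == "__main__":
    # Small self-contained demo on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "data")
        os.makedirs(os.path.join(data, "sub"))
        open(os.path.join(data, "sub", "a.txt"), "w").close()
        index_path = os.path.join(tmp, "index.tsv")
        print(write_index(data, index_path), "entries indexed")
        for kind, rel in read_index(index_path):
            print(kind, rel)
```

With an index like this, the GUI would only need to read one file at start-up instead of touching every folder on the share. Inserting the items into the tree view would still take time, so this addresses the directory-scanning cost only.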

1 Answer

Stack Overflow user

Accepted answer

Answered 2022-08-02 13:12:17

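Since you'd set it up nicely with timing, I thought it would be a fun challenge to have a go.

I tried rewriting it to use os.walk, but then I figured your os.path.isdir() call would be very slow, so I swapped it out for scandir. That turned out to be the fastest approach I could find.

Benchmarks: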

original: 697665 files in 76729 folders processed in 106.079s
os.scandir: 697665 files in 76729 folders processed in 23.152s
os.walk: 697665 files in 76731 folders processed in 32.869s

Using the separate scandir module didn't seem to make any difference; it looks like this is now well optimised in os itself.
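
To illustrate where the gap between the original and the os.scandir version likely comes from, here is a minimal sketch that counts the same tree both ways without any GUI. The function names are hypothetical. The first variant asks the file system about every entry through `os.path.isdir()`; the second relies on `DirEntry.is_dir()`, which on most platforms can answer from information already returned by the directory listing. Actual timings depend heavily on the drive (a network share in particular), so no numbers are claimed here.

```python
import os
import tempfile


def count_with_listdir(path):
    """Return (files, folders) using os.listdir plus one os.path.isdir call per entry."""
    files = folders = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full):  # separate file-system query for each entry
            folders += 1
            sub_files, sub_folders = count_with_listdir(full)
            files += sub_files
            folders += sub_folders
        else:
            files += 1
    return files, folders


def count_with_scandir(path):
    """Return (files, folders) using os.scandir and the cached DirEntry type info."""
    files = folders = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():  # usually answered without an extra system call
                folders += 1
                sub_files, sub_folders = count_with_scandir(entry.path)
                files += sub_files
                folders += sub_folders
            else:
                files += 1
    return files, folders


if __name__ == "__main__":
    # Tiny throwaway tree: one folder containing one file, plus one top-level file.
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "sub"))
        open(os.path.join(tmp, "sub", "a.txt"), "w").close()
        open(os.path.join(tmp, "b.txt"), "w").close()
        print(count_with_listdir(tmp), count_with_scandir(tmp))
```

Wrapping each call with `time.perf_counter()` on a real folder is a quick way to check whether directory scanning or tree insertion dominates the total time.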

Here's the code, with the other functions included:

import tkinter as tk
import tkinter.ttk as ttk
from ttkwidgets import CheckboxTreeview
import os
import scandir
import time

time_start = time.time()

iid = 1  # IID of tree item. 0 is top level parent
count_folders = 0  # Number of folders in parent
count_files = 0  # Number of files in parent
compare_check = {}  # Build the dictionary with IID key and folder/file paths in list

root = tk.Tk()
root.geometry('850x450')

style = ttk.Style(root)

v_scrollbar = tk.Scrollbar(root, orient='vertical')
v_scrollbar.place(x=830, y=20, width=20, height=415)
tree = CheckboxTreeview(root, show='tree', yscrollcommand=v_scrollbar.set)
tree.place(x=10, y=20, anchor="nw", width=815, height=415)
v_scrollbar.config(command=tree.yview)
style.configure('Treeview', indent=15)


def new_folder(parent_path, directory_entries, parent_iid):
    global iid, count_folders, count_files
    for name in directory_entries:
        item_path = parent_path + os.sep + name
        if os.path.isdir(item_path):
            subdir_iid = tree.insert(parent=parent_iid, index='end', text=f'[F] {name}')
            try:
                subdir_entries = os.listdir(item_path)
                new_folder(parent_path=item_path, directory_entries=subdir_entries, parent_iid=subdir_iid)
                count_folders += 1  # for testing
            except PermissionError:
                pass
        else:
            tree.insert(parent=parent_iid, index='end', text=f'[f] {name}')
            count_files += 1  # for testing

        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper()  # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file

        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)

        iid = int(hex_iid, 16)  # Convert back to decimal to continue the iid increment count

        compare_check.update({hex_compare: [parent_path, parent_path[14:], name]})  # Update dictionary with current item


def new_folder_scandir(parent_path, parent_iid):
    global iid, count_folders, count_files
    for name in os.scandir(parent_path):
        if name.is_dir():
            subdir_iid = tree.insert(parent=parent_iid, index='end', text=f'[F] {name.name}')  # name is a DirEntry; .name is its file name
            try:
                new_folder_scandir(parent_path=name.path, parent_iid=subdir_iid)
                count_folders += 1  # for testing
            except PermissionError:
                pass
        else:
            tree.insert(parent=parent_iid, index='end', text=f'[f] {name.name}')  # name is a DirEntry; .name is its file name
            count_files += 1  # for testing

        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper()  # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file

        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)

        iid = int(hex_iid, 16)  # Convert back to decimal to continue the iid increment count

        compare_check.update({hex_compare: [parent_path, parent_path[14:], name.name]})  # Update dictionary with current item (store the name string, not the DirEntry)



def new_folder_walk(path):
    global count_folders, count_files

    def hex_thing(parent_path, name):
        global iid

        # The iid of the tree item is returned as hex value
        iid += 1
        hex_iid = hex(iid)
        hex_of_folder_file = str(hex_iid)[2:].upper()  # Omit the 0x of the hex value
        hex_compare = hex_of_folder_file

        # For the external app searching function we need to prefix the given iid hex value with an 'I'
        if len(hex_compare) >= 3:
            hex_compare = 'I' + str(hex_of_folder_file)
        elif len(hex_compare) == 2:
            hex_compare = 'I0' + str(hex_of_folder_file)
        elif len(hex_compare) == 1:
            hex_compare = 'I00' + str(hex_of_folder_file)

        iid = int(hex_iid, 16)  # Convert back to decimal to continue the iid increment count

        compare_check.update({hex_compare: [parent_path, parent_path[14:], name]})  # Update dictionary with current item

    tree_items = {path: tree.insert(parent='', index='0', text='All Documents', open=True)}
    for root, dirs, files in scandir.walk(path):
        for dir in dirs:
            path = os.path.join(root, dir)
            count_folders += 1
            tree_items[path] = tree.insert(parent=tree_items[root], index='end', text=f'[F] {dir}')
            hex_thing(root, dir)

        for file in files:
            path = os.path.join(root, file)
            count_files += 1
            tree.insert(parent=tree_items[root], index='end', text=f'[f] {file}')
            hex_thing(root, file)


start_path = os.path.expanduser(r"C:/Program Files")  # Path for test

# 0 = original, 1 = scandir, 2 = walk
run = 1

if run == 0:
    parent_iid = tree.insert(parent='', index='0', text='All Documents', open=True)
    start_dir_entries = os.listdir(start_path)
    new_folder(parent_path=start_path, directory_entries=start_dir_entries, parent_iid=parent_iid)
elif run == 1:
    parent_iid = tree.insert(parent='', index='0', text='All Documents', open=True)
    new_folder_scandir(parent_path=start_path, parent_iid=parent_iid)
elif run == 2:
    new_folder_walk(start_path)

time_end = time.time()
time_total = round(time_end - time_start, 3)  # for testing. Simple start to end timer result

ttk.Label(root, text=f"Files: {count_files} || Folders: {count_folders} || Time: {time_total}s", font='arial 10 bold').place(x=300, y=0)  # for testing

print(f"{count_files} files in {count_folders} folders processed in {time_total}s")  # for testing

root.mainloop()

For the record, I'm actually surprised that os.walk is slower than os.scandir even when iterating over every file.
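
As a side note on the code above, the hex-key block that is repeated in each function can be collapsed into a single format expression. The sketch below uses a hypothetical helper, `iid_key`, and compares it against a copy of the posted logic to show that both produce the same keys. It also shows that the `iid = int(hex_iid, 16)` step simply returns the value it started from. This is a readability change and is not expected to affect the timings much.

```python
def iid_key(iid):
    """Build the 'I'-prefixed, zero-padded, upper-case hex key in one step."""
    return f"I{iid:03X}"


def iid_key_original(iid):
    """Copy of the key-building logic from the posted code, kept for comparison."""
    hex_iid = hex(iid)
    hex_of_folder_file = str(hex_iid)[2:].upper()  # Omit the 0x of the hex value
    hex_compare = hex_of_folder_file
    if len(hex_compare) >= 3:
        hex_compare = 'I' + str(hex_of_folder_file)
    elif len(hex_compare) == 2:
        hex_compare = 'I0' + str(hex_of_folder_file)
    elif len(hex_compare) == 1:
        hex_compare = 'I00' + str(hex_of_folder_file)
    return hex_compare


if __name__ == "__main__":
    for value in (1, 16, 255, 4096):
        print(value, iid_key(value), iid_key_original(value))
    # Converting to hex and back leaves the counter unchanged.
    print(int(hex(1234), 16) == 1234)
```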

Votes: 2

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/73204781
