Q: Python - Multiprocessing multiple folders

Stack Overflow user

Asked 2021-09-21 14:03:06

2 answers, 649 views, 0 followers, 0 votes

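As a newcomer to programming, I am still exploring the concepts of multithreading and multiprocessing.

I have written a small script that reads a file, copies files into multiple temporary folders, and performs the following operations on each folder:

1. Build a label.
2. Generate a package.
3. Push it to Nexus.

There are about 500 folders, and they are processed sequentially. How can I use multiprocessing here so that 100 folders (or more) are processed in parallel at a time? Also, is it possible to track these processes and fail the build if even one child process fails?

I have read several articles on multiprocessing, but I cannot get my head around it.

Any guidance would help me a lot, thanks.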

folder1
   -- war file
   -- metadata

folder 2
   -- war file
   -- metadata
....
....

folder 500
   -- war file
   -- metadata

Code snippet

import re, shutil, os
from pathlib import Path

target = "/home/work"
file_path = target + "/file.txt"

dict = {}
count = 1

def commands_to_run_on_each_folder(filetype, tmp_folder):
    target_folder = tmp_folder+'/tmp'+str(count)

    os.system(<1st command to build the label>)
    os.system(<2nd command to build the package>)
    <multiple file manipulations, where `filetype` is used and get the required file with right extension>
    <curl command to upload it to the Nexus>

#Read the text file and assemble it in a dictionary.
with open(file_path, 'r') as f:
    lines = f.read().splitlines()
    for i, line in enumerate(lines):
        match = re.match(r".*.war", line)
        if match:
            j = i-1 if i > 1 else 0
            for k in range(j, i):
                dict[match.string] = lines[k]
#Iterate the dictionary and copy the folder to the temporary folders.
for key, value in dict.items():
    os.mkdir(target+'/tmp'+str(count))
    shutil.copy(key, target+'/tmp'+str(count))
    shutil.copy(value, target+'/tmp'+str(count))
    commands_to_run_on_each_folder("war", target)
    count += 1

OS: Ubuntu 18.04, Memory: 22 GB, Container

python

python-3.x

multiprocessing

2 Answers

Stack Overflow user

Accepted answer

Answered 2021-09-21 21:18:01

This is easy with concurrent.futures. I have modified your script as follows:

#!/usr/bin/env python3
import itertools
import concurrent.futures
import logging
import pathlib
import re
import shutil


logging.basicConfig(
    level=logging.DEBUG,
    format="%(levelname)s:%(processName)s:%(message)s"
)


def worker(path1, path2, src, target, logger):
    logger.debug("Create dir %s", target)
    target.mkdir(exist_ok=True)

    logger.debug("Copy files")
    shutil.copy(src / path1, target / path1)
    shutil.copy(src / path2, target / path2)

    logger.debug("Additional commands to run on %s", target)
    # TODO: Add actions here
    # commands_to_run_on_each_folder(...)


def main():
    #Read the text file and assemble it in a dictionary.
    tasks = {}
    with open("file.txt", 'r') as f:
        lines = f.read().splitlines()
        for i, line in enumerate(lines):
            match = re.match(r".*.war", line)
            if match:
                j = i-1 if i > 1 else 0
                for k in range(j, i):
                    tasks[match.string] = lines[k]

    logger = logging.getLogger()
    # src: The directory where this script is
    src = pathlib.Path(__file__).parent
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for taskid, (path2, path1) in enumerate(tasks.items(), 1):
            target = pathlib.Path(f"/tmp/dir{taskid}")

            # Calls `worker` function with parameters path1, path2, ...
            # concurrently
            executor.submit(worker, path1, path2, src, target, logger)


if __name__ == "__main__":
    main()

Here is a sample output:

DEBUG:ForkProcess-1:Create dir /tmp/dir1
DEBUG:ForkProcess-1:Copy files
DEBUG:ForkProcess-1:Additional commands to run on /tmp/dir1
DEBUG:ForkProcess-1:Create dir /tmp/dir2
DEBUG:ForkProcess-1:Copy files
DEBUG:ForkProcess-1:Additional commands to run on /tmp/dir2
DEBUG:ForkProcess-1:Create dir /tmp/dir3
DEBUG:ForkProcess-1:Copy files
DEBUG:ForkProcess-1:Additional commands to run on /tmp/dir3
DEBUG:ForkProcess-1:Create dir /tmp/dir4
DEBUG:ForkProcess-1:Copy files
DEBUG:ForkProcess-1:Additional commands to run on /tmp/dir4

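Notes

- I use logging instead of print because logging works better in a multi-process environment.
- To turn off the logging, change the level to logging.WARN.
- I use pathlib instead of os.path because it is more convenient.
- The submit call does not wait. This means that even if the worker function takes a long time to run, submit returns immediately.
- With the with construct, the executor waits for all concurrent tasks to finish before exiting. This is what you want.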
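The question also asks how to track the tasks and fail the build if any one of them fails. The script above discards the Future objects returned by submit, so an exception raised inside worker would go unnoticed. The following is a minimal sketch of one way to handle this, not the answerer's own code. The names process_folder and run_all, the deliberately failing folder 3, and the max_workers value are all hypothetical and chosen for illustration. max_workers caps how many tasks run at once.

```python
import concurrent.futures


def process_folder(folder_id):
    """Hypothetical stand-in for `worker`; raises to signal a failed folder."""
    if folder_id == 3:
        raise RuntimeError(f"build failed for folder {folder_id}")
    return folder_id


def run_all(folder_ids, max_workers=4,
            executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Run every task and return a dict mapping failed folder ids to errors."""
    failures = {}
    with executor_cls(max_workers=max_workers) as executor:
        # Keep each Future so that its outcome can be inspected afterwards.
        futures = {executor.submit(process_folder, fid): fid
                   for fid in folder_ids}
        for future in concurrent.futures.as_completed(futures):
            fid = futures[future]
            try:
                future.result()  # re-raises any exception from the task
            except Exception as exc:
                failures[fid] = exc
    return failures


if __name__ == "__main__":
    failed = run_all(range(1, 6))
    for fid, exc in sorted(failed.items()):
        print(f"folder {fid} failed: {exc}")
    # In a real build script, finish with sys.exit(1 if failed else 0)
    # so that a single failed folder fails the whole build.
    print("exit status would be", 1 if failed else 0)
```

Because the per-folder work mostly waits on external commands, passing executor_cls=concurrent.futures.ThreadPoolExecutor is probably sufficient as well; both executors share the same interface.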

Votes: 1

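Stack Overflow user

Answered 2021-09-21 14:06:59

This is not a good target for multiprocessing, but it is a good target for GNU parallel.

The build work happens outside the Python process: Python is only calling system commands. You could of course launch several backgrounded os.system calls in parallel from Python, but this script is better run with a find | parallel pattern.

What I would do is rewrite the script so that it handles a single folder. Then I would run: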

find /path/to/root/folder -type d | parallel --bar -I{} python3 script.py {} \;

Since you are on Ubuntu, you already have find and parallel. Note that this is bash, run in the shell, not Python.

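Arguments against doing this in Python

- No redesign is needed.
- It is easy to customise: you can change the number of processes by adding --jobs N.
- You are only calling other processes: you are using Python the way you would use a scripting language such as bash (which is fine), so it makes more sense to treat it as your per-folder build script.
- You get a progress bar and other things for free!

On the other hand, if you do want to do this in Python, it is possible.

Note that current wisdom recommends subprocess rather than os.system.

Votes: 1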
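As a minimal sketch of that advice: os.system returns an exit status that is easy to ignore, whereas subprocess.run with check=True raises CalledProcessError when a command exits with a non-zero status. The helper name run_step and the commands below are placeholders for illustration, not the asker's real build commands.

```python
import subprocess
import sys


def run_step(args):
    """Run one command; raise CalledProcessError if it exits non-zero."""
    completed = subprocess.run(
        args,
        check=True,               # a non-zero exit status raises an exception
        stdout=subprocess.PIPE,   # collect stdout instead of printing it
        stderr=subprocess.PIPE,   # collect stderr as well
        universal_newlines=True,  # decode output as str (text=True from 3.7)
    )
    return completed.stdout


if __name__ == "__main__":
    # Placeholder commands: the current Python interpreter stands in for the
    # real label, package and upload commands.
    print(run_step([sys.executable, "-c", "print('label built')"]), end="")
    try:
        run_step([sys.executable, "-c", "import sys; sys.exit(3)"])
    except subprocess.CalledProcessError as exc:
        print(f"step failed with exit status {exc.returncode}")
```

Passing the command as a list of arguments avoids shell quoting problems with folder names, and the raised exception gives the caller a natural place to stop the build.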

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/69270271
