
Python Repo Cleanup

Stack Overflow user
Asked 2019-05-29 20:35:15
2 answers, 435 views, 0 followers, 0 votes

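I have a script that is almost 100% complete, but there is one step I cannot figure out. My script currently checks whether the destination file already exists and, if it does, the file in the source location is not moved. The problem I am running into is that the code does not check all of the subdirectories, only the root directory.

I am using os.walk to go through all of the files in the source folder, but I am not sure how to os.walk the destination folder and the source folder together.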

import time
import sys
import logging
import logging.config


def main():
    purge_files

def move_files(src_file):

    try:
        #Attempt to move files to dest
        shutil.move(src_file, dest)
        #making use of the OSError exception instead of FileExistsError due to older version of python not contaning that exception 
    except OSError as e:
        #Log the files that have not been moved to the console
        logging.info(f'Files File already exists: {src_file}')
        print(f'File already exists: {src_file}')
        #os.remove to delete files that are already in dest repo
        os.remove(src_file)
        logging.warning(f'Deleting: {src_file}')

def file_loop(files, root):

    for file in files:
        #src_file is used to get the full path of everyfile
        src_file = os.path.join(root,file)

        #The two variables below are used to get the files creation date
        t = os.stat(src_file)
        c = t.st_ctime
        #If the file is older then cutoff code within the if statement executes

        if c<cutoff:

            move_files(src_file)
        #Log the file names that are not older then the cutoff and continue loop
        else:
            logging.info(f'File is not older than 14 days: {src_file}')
            continue

def purge_files():

    logging.info('invoke purge_files method')
    #Walk through root directory and all subdirectories
       for root, subdirs, files in os.walk(source):
          dst_dir = root.replace(source, dest)

           #Loop through files to grab every file
           file_loop(files, root)

       return files, root, subdirs


files, root, subdirs = purge_files()

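My expected output is that all of the files in source are moved to dest. Before a file is moved, I want to check all of the files in the dest location, including the subdirectories of dest; if any of them is the same as a file in source, that file should not be moved to dest. I do not want the folders from source. I just want all of the files moved to the root directory.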

2 Answers

Stack Overflow user

Answered 2019-05-30 22:20:10

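I can see that you have already written a large part of the code, but as currently posted it contains quite a few errors:

  • The code is not indented correctly, which makes it invalid.
  • Some import statements are missing (e.g., for shutil).
  • You reference undefined variables (e.g., source).

If I copy and paste your code into my IDE, I get 26 errors from pep8 and pylint, and after fixing the indentation errors I get 49 errors. This makes me wonder whether this is your actual code or whether you made a copy-paste mistake. Either way, using an IDE will definitely help you validate your code and catch errors earlier. Give it a try!

Since I cannot run your code, I cannot say exactly why it does not work, but I can give you some pointers.

One thing that raises a lot of questions is the following line: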

dst_dir = root.replace(source, dest)

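Apart from the bad indentation, the variable dst_dir is not used anywhere. So what is the point of this statement? Also note that it replaces every occurrence of source in root. For trivial cases this is not a problem, but it is not robust in all circumstances. So use the path operations from the standard library wherever possible and try to avoid manual string manipulation on paths. The pathlib module was introduced in Python 3.4. I recommend using it.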
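To illustrate the point about string replacement, here is a minimal sketch comparing str.replace with pathlib's relative_to. The directory names are hypothetical and were chosen so that the source name occurs twice in the path; PurePosixPath is used only so that the example behaves the same on every operating system.

```python
from pathlib import PurePosixPath

# Hypothetical paths: the source directory name also occurs deeper in the tree.
source = "/data"
dest = "/archive"
root = "/data/projects/data/2019"

# str.replace rewrites every occurrence, not only the leading prefix.
naive = root.replace(source, dest)

# relative_to strips only the leading prefix and raises ValueError
# if root is not actually located under source.
relative = PurePosixPath(root).relative_to(source)
robust = PurePosixPath(dest) / relative

print(naive)   # /archive/projects/archive/2019
print(robust)  # /archive/projects/data/2019
```

The second result keeps the inner "data" directory intact, which is what one would normally expect when mapping a source tree onto a destination tree.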

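In some situations os.walk() is very convenient, but it may not be the best solution for your use case. Using os.listdir() recursively may well be a lot easier, especially since the destination directory will be flat (i.e., a single directory without subdirectories).

A possible implementation (using pathlib and os.listdir()) could look like this: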

import logging
import os
import pathlib
import shutil
import time

SOURCE_DIR_PATH = pathlib.Path('C:\\Temp')
DESTINATION_DIR_PATH = pathlib.Path('D:\\archive')

CUTOFF_DAYS = 14
CUTOFF_TIME = time.time() - CUTOFF_DAYS * 24 * 3600  # two weeks


def move_file(src_file_path, dst_dir_path):
    logging.debug('Moving file %s to directory %s', src_file_path,
                  dst_dir_path)
    return  # REMOVE THIS LINE TO ACTUALLY PERFORM FILE OPERATIONS
    try:
        shutil.move(str(src_file_path), str(dst_dir_path))
    except OSError:
        logging.info('File already exists in destination directory: %s',
                     src_file_path)
        logging.warning('Deleting file %s', src_file_path)
        src_file_path.unlink()


def move_files(src_file_paths, dst_dir_path):
    for src_file_path in src_file_paths:
        if src_file_path.stat().st_ctime < CUTOFF_TIME:
            logging.info('Moving file older than %d days: %s', CUTOFF_DAYS,
                         src_file_path)
            move_file(src_file_path, dst_dir_path)
        else:
            logging.info('Not moving file less than %d days old: %s',
                         CUTOFF_DAYS, src_file_path)


def purge_files(src_dir_path, dst_dir_path):
    logging.info('Scanning directory %s', src_dir_path)
    names = os.listdir(src_dir_path)
    paths = [src_dir_path.joinpath(name) for name in names]
    file_paths = [path for path in paths if path.is_file()]
    dir_paths = [path for path in paths if path.is_dir()]
    # Cleanup files
    move_files(file_paths, dst_dir_path)
    # Cleanup directories, recursively.
    for dir_path in dir_paths:
        purge_files(dir_path, dst_dir_path)


def main():
    logging.basicConfig(format='%(message)s', level=logging.DEBUG)
    purge_files(SOURCE_DIR_PATH, DESTINATION_DIR_PATH)


if __name__ == '__main__':
    main()

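I tested this code and it works.

Note that I used the same error handling in move_file as in your example. However, I do not think it is very robust. What happens if two files with the same name exist in the source directory (in different subdirectories, or at different times)? Then the second file is deleted without being backed up. Also, if some other error occurs (such as "disk full" or "network error"), the code simply assumes that the file has already been backed up and deletes the original. I do not know your use case, but I would seriously consider rewriting this function.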
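One way such a rewrite could look is sketched below. It is only an illustration under assumptions: move_file_safely is a hypothetical helper that is not part of the code above, and renaming on a collision (for example report_1.txt) is just one possible policy. The check-then-move sequence is also not safe against another process creating the same file in between.

```python
import logging
import shutil
from pathlib import Path


def move_file_safely(src_file_path, dst_dir_path):
    """Move a file into dst_dir_path without overwriting or deleting anything.

    Returns the final destination path. Hypothetical helper for illustration.
    """
    src_file_path = Path(src_file_path)
    dst_dir_path = Path(dst_dir_path)
    target = dst_dir_path / src_file_path.name
    counter = 1
    # On a name collision, pick a new name such as "report_1.txt"
    # instead of deleting the source file.
    while target.exists():
        new_name = f"{src_file_path.stem}_{counter}{src_file_path.suffix}"
        target = dst_dir_path / new_name
        counter += 1
    try:
        shutil.move(str(src_file_path), str(target))
    except OSError:
        # Disk full, network error, permissions: keep the source and report.
        logging.exception('Could not move %s; source left in place',
                          src_file_path)
        raise
    return target
```

With this approach a genuine failure leaves the source file where it was, and two files that share a name both end up in the destination.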

Nevertheless, I hope these suggestions and the example code put you on the right track.

Votes: 1

Stack Overflow user

Answered 2019-05-30 04:33:30

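You may want to clean up your code; it is full of bugs. For example, "purge_files" instead of "purge_files()" in main, the indentation errors in purge_files, and so on. Also, the seemingly random blank lines between the code make it a bit awkward to read (for me at least) :)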
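The difference between the two spellings can be shown with a small, self-contained sketch. The calls list and the two main_* function names are made up for this illustration:

```python
calls = []


def purge_files():
    calls.append('purged')


def main_without_call():
    purge_files      # evaluates the name, then discards it; nothing runs


def main_with_call():
    purge_files()    # the parentheses actually invoke the function


main_without_call()
print(calls)  # []
main_with_call()
print(calls)  # ['purged']
```

Python raises no error for the first form, which is why a main() written that way silently does nothing.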

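Edit: I quickly went through your code and changed a few things, mainly variable names. I noticed several variables with non-descriptive names ('i', 't', etc.), each with a comment describing what the variable means. If you simply change the variable names to something more descriptive, you do not need the comments and your code becomes even easier to read. Note that I did not test this code and, to be honest, it may not even run (that was not my goal; I only wanted to show some of the style changes I suggest) :)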

import os
import shutil
import time
import errno
import sys
import logging
import logging.config


# NOTE: It is a convention to write constants in all caps
SOURCE = r'C:\Users\Desktop\BetaSource'
DEST = r'C:\Users\Desktop\BetaDest'
#Gets the current time from the time module
now = time.time()
#Timer of when to purge files
cutoff = now - (14 * 86400)
all_sources = []
all_dest_dirty = []
logging.basicConfig(level = logging.INFO,
                    filename = time.strftime("main-%Y-%m-%d.log"))


def main():
    # NOTE: Why is this function called / does it exist? It only sets a global
    # 'dest_files' which is never used...
    dest_files()
    purge_files()


# I used the dest_files function to get all of the destination files
def dest_files():
    for root, subdirs, files in os.walk(DEST):
        for file in files:
            # NOTE: Is it really necessary to use a global here?
            # Appending mutates the module-level list in place, so no
            # `global` statement is needed.
            all_dest_dirty.append(file)


def purge_files():
    logging.info('invoke purge_files method')
    # I removed all duplicates from dest because cleaning up duplicates in
    # dest is out of the scope
    # NOTE: This is the perfect usecase for a set
    all_dest_clean = set(all_dest_dirty)
    # os.walk used to get all files in the source location 
    for source_root, source_subdirs, source_files in os.walk(SOURCE):
        # looped through every file in source_files
        for file in source_files:
            # appending all_sources to get the application name from the
            # file path
            all_sources.append(os.path.abspath(file).split('\\')[-1]) 
            # looping through each element of all_source
            for source in all_sources:
                # logical check to see if file in the source folder exists
                # in the destination folder
                if source not in all_dest_clean:
                    # src is used to get the path of the source file this
                    # will be needed to move the file in shutil.move
                    src =  os.path.abspath(os.path.join(source_root, source))
                    # the two variables used below are to get the creation
                    # time of the files
                    metadata = os.stat(src)
                    creation_time = metadata.st_ctime
                    # logical check to see if the file is older than the cutoff
                    if creation_time < cutoff:
                        logging.info(f'File has been succesfully moved: {source}')
                        print(f'File has been succesfully moved: {source}')
                        shutil.move(src, DEST)
                        # removing the already checked source files for the
                        # list this is also used in other spots within the loop
                        all_sources.remove(source)
                    else:
                        logging.info(f'File is not older than 14 days: {source}')
                        print(f'File is not older than 14 days: {source}')
                        all_sources.remove(source)
                else:
                    all_sources.remove(source)
                    logging.info(f'File: {source} already exists in the destination')
                    print(f'File: {source} already exists in the destination')


if __name__ == '__main__':
    main()
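The set-based lookup mentioned in the NOTE comments can also be written more compactly. The following is only a sketch under assumptions: the two function names are hypothetical, and "the same file" is taken to mean "the same file name", which is how the question's check appears to work.

```python
import os


def collect_file_names(top_dir):
    """Return the set of file names found anywhere below top_dir."""
    names = set()
    for _root, _subdirs, files in os.walk(top_dir):
        names.update(files)
    return names


def files_missing_from_dest(source_dir, dest_dir):
    """Return full paths of source files whose names are absent in dest_dir."""
    dest_names = collect_file_names(dest_dir)
    missing = []
    for root, _subdirs, files in os.walk(source_dir):
        for name in files:
            # Set membership is a constant-time check on average.
            if name not in dest_names:
                missing.append(os.path.join(root, name))
    return missing
```

Walking the destination once and keeping only the names avoids nesting the two os.walk loops; the returned paths could then be passed to the age check and to shutil.move.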
Votes: -1
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/56360794
