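We have a large number of projects at work. It's very important that we keep track of this information, and we regularly need to run a full backup (daily, weekly, monthly, etc.) that is independent of the "regular" backups run by IT. Recently, we performed a large-scale migration of all our older projects. To make sure nothing got irreparably damaged during the migration (basically a version upgrade), everything was backed up twice before the upgrade.
We're dealing with network drives and slow computers (so everything is IO-bound), which makes CPU usage largely irrelevant.
Current speed:
We're talking about 60+ GB of source data, spread over 7k folders and 180k files. If we could read/write in a way that's easier on the disks and the network, that would be great.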
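Since everything is IO-bound, one place that may save round-trips to the network share is the size calculation. `os.walk` already uses `os.scandir` internally, but `os.path.getsize` then queries the file system a second time for every one of the 180k files. A minimal sketch, assuming symbolic links should be skipped, that reuses the `os.DirEntry` data instead; on Windows, `DirEntry.stat(follow_symlinks=False)` is documented to work without an extra system call. The name mirrors the original `get_directory_size`, but this body is an illustration, not the author's code.

```python
import os


def get_directory_size(directory):
    """Return the total size in bytes of all regular files below *directory*.

    Walks the tree with os.scandir and an explicit stack, reusing the
    DirEntry information instead of calling os.path.getsize per file.
    Symbolic links are neither followed nor counted.
    """
    total = 0
    stack = [os.fspath(directory)]  # accepts str or pathlib paths
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    total += entry.stat(follow_symlinks=False).st_size
    return total
```

Whether this is measurably faster over this particular network is untested; timing both versions on the real share would settle it.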
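I'm working on an application that will ultimately do 3 things:
This question is purely about the first part of that process; the rest of the story is only there to explain the context and motivation. That part works fine on its own, so it's ready for review.
The program performs some basic sanity checks on the directories involved, verifies that the required space is available, and backs up to as many locations as desired. At the end, it prints the number of bytes written, as a quick visual check of whether anything went wrong. It is important that metadata such as the creation date and the last-modified date stay intact.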
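The code's own WARNING notes that checking each destination separately is insufficient when several backups share a partition. A hedged sketch that groups destinations by drive (`Path.anchor`) and compares the summed requirement against that drive's free space. It uses the standard library's `shutil.disk_usage` in place of `psutil.disk_usage`; the function name `check_space_per_drive` is hypothetical, and different drive letters that point to the same volume (for example via `subst`, or two mappings of one share) are not detected.

```python
import os
import shutil
from collections import defaultdict
from pathlib import Path


def check_space_per_drive(source_size, backup_directories):
    """Return (anchor, required, free) for every drive that is too small.

    Each destination needs a full copy of the source, so destinations that
    share a drive (same Path.anchor) have their requirements summed.
    """
    required = defaultdict(int)
    example_dir = {}
    for directory in backup_directories.values():
        anchor = Path(os.path.abspath(directory)).anchor
        required[anchor] += source_size
        # Any directory on the drive will do for querying its free space.
        example_dir.setdefault(anchor, directory)
    too_small = []
    for anchor, needed in required.items():
        free = shutil.disk_usage(str(example_dir[anchor])).free
        if needed > free:
            too_small.append((anchor, needed, free))
    return too_small
```

In `verify_available_backup_space`, a non-empty result could then be printed in one go before calling `terminate_program()`.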
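I've split the code into reasonably sensible functions. Some do the heavy lifting, others exist only so I don't repeat myself. I'm not happy with how the directory validation is implemented, the reporting and exception handling are mediocre at best, and I'm sure the other functions could be made more reusable.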
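On the exception handling: with `check=True`, `subprocess.run` reports a failing xcopy by raising `subprocess.CalledProcessError` (non-zero exit code), so the `except PermissionError` in `backup_projects` only fires if the process cannot be started at all. A sketch under that reading, with hypothetical helper names `build_xcopy_command` and `run_copy`, that passes the command as a list (so paths containing spaces need no manual quoting) and returns an error message instead of exiting deep inside the copy loop. The default flags mirror `XCOPY_ARGS`.

```python
import subprocess


def build_xcopy_command(source, destination,
                        args=("/e", "/h", "/i", "/q", "/s", "/z")):
    """Build the xcopy call as an argument list; str() handles pathlib paths."""
    return ["xcopy", str(source), str(destination), *args]


def run_copy(command):
    """Run *command*; return None on success, or a short error message."""
    try:
        subprocess.run(command, check=True)
    except FileNotFoundError:
        return f"Executable not found: {command[0]}"
    except subprocess.CalledProcessError as error:
        # xcopy signals failure through its exit code, not a Python exception.
        return f"{command[0]} exited with code {error.returncode}"
    except OSError as error:
        # Covers PermissionError and other failures to start the process.
        return f"Could not start {command[0]}: {error}"
    return None
```

The caller then decides whether a failed destination should abort the whole run or just be reported at the end.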
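I tried to do something fancy in the directory check, so I could recursively determine validity regardless of whether a string, a WindowsPath or a dictionary is passed. That failed miserably. I feel I'm not making the best use of pathlib, but given that I'm stuck with legacy tools (I ran into very strange PermissionError problems with shutil, while xcopy "just worked"), I'm not entirely sure that can be helped. I think all the reporting and at least part of the validation could be done with decorators (like here), but that's still pretty much magic to me.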
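One way to get the "string, WindowsPath or dict" behaviour without hand-written type checks is `functools.singledispatch`, which picks an implementation by the type of the first argument (subclasses such as `WindowsPath` match the `Path` registration). A minimal sketch; the name `invalid_directories` is hypothetical, and it returns the offending paths instead of raising, so all problems can be reported at once.

```python
from functools import singledispatch
from pathlib import Path


@singledispatch
def invalid_directories(target):
    """Return a list of paths in *target* that are not existing directories."""
    raise TypeError(f"Unsupported type: {type(target).__name__}")


@invalid_directories.register(str)
@invalid_directories.register(Path)
def _(target):
    path = Path(target)
    return [] if path.is_dir() else [path]


@invalid_directories.register(dict)
def _(target):
    # Recurse into the values, so nested dicts of paths work as well.
    problems = []
    for value in target.values():
        problems.extend(invalid_directories(value))
    return problems
```

For example, `invalid_directories(BACKUP_DIRECTORIES)` and `invalid_directories(PROJECTS_SOURCE)` would then go through the same function.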
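I'm sure it shouldn't all be in one file either, but since everything except main would end up in a utils.py, I haven't gone down that road yet. I'm open to all kinds of ideas. Honestly, I'm surprised the code still works; it doesn't look like it should. I've tried (not shown) to prettify it with wrappers and decorators, but it looks like that would need a rewrite to work. As long as the data is copied cleanly, the rest of the code can be changed in any way. The specs aren't set in stone.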
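As an illustration of the decorator idea for the reporting, a sketch that moves the "Started / Finished / Execution time" prints out of `backup_projects` into a reusable wrapper. `report_duration` and `copy_one` are hypothetical names; `copy_one` only stands in for the xcopy call.

```python
import functools
from datetime import datetime


def report_duration(func):
    """Print start, finish and elapsed time around each call to *func*."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = datetime.now()
        print(f"Started {func.__name__}: {start}")
        try:
            return func(*args, **kwargs)
        finally:
            # Runs on success and on exceptions, so timing is always reported.
            end = datetime.now()
            print(f"Finished: {end}\nExecution time {end - start}")
    return wrapper


@report_duration
def copy_one(source, destination):
    # Hypothetical stand-in for the xcopy call in backup_projects.
    return f"{source} -> {destination}"
```

The copy logic then only copies, and the timing output stays identical for every function it decorates.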
# -*- coding: utf-8 -*-
"""
Created on Wed May 5 13:10:37 2021

@author: Mast

Notes:
# xcopy & psutil don't seem to handle pathlib.WindowsPath too well,
# converting to string conveniently turns / into \\
# shutil kept running into PermissionError where xcopy has no trouble
"""

import os
import sys
import subprocess
import psutil
import time
from datetime import datetime
from pathlib import Path

PROJECTS_SOURCE = Path('Z:/EPLAN_DATA/Gegevens/Projecten/')
BACKUP_DIRECTORIES = {
    'local': Path('C:/backups/eplanprojects'),
    'network': Path('N:/BackupsEplan')
}
XCOPY_ARGS = "/e /h /i /q /s /z"
MAX_BACKUPS = 3
SLEEP_TIME = 60
FORCE = True


def to_megabytes(size):
    """
    Parameters
    ----------
    size : int

    Returns
    -------
    str
        Turn Bytes into rounded MegaBytes.
    """
    return "{0} MB".format(round(size / 1000000.0, 2))


def get_directory_size(directory):
    """
    Parameters
    ----------
    directory : pathlib.WindowsPath

    Returns
    -------
    int
        Size of the contents of the directory in Bytes.
    """
    print("Calculating size of {0}".format(directory))
    size = 0
    for path, _, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(path, file)
            size += os.path.getsize(file_path)
    print(to_megabytes(size))
    return size


def free_space(directory):
    """
    Parameters
    ----------
    directory : pathlib.WindowsPath

    Returns
    -------
    int
        Free space available on partition the directory belongs to.
    """
    return psutil.disk_usage(str(directory)).free


def terminate_program(additional_info=""):
    """
    Parameters
    ----------
    additional_info : str (OPTIONAL)

    Print additional_info, wait & exit.
    """
    print("Program terminated before completion.")
    if additional_info:
        print(additional_info)
    time.sleep(SLEEP_TIME)
    sys.exit()


def validate_directory(directory):
    """
    Parameters
    ----------
    directory : pathlib.WindowsPath

    Raise FileNotFoundError if directory does not exist.
    """
    if not directory.exists():
        print("Invalid directory: {0}".format(directory))
        raise FileNotFoundError


def verify_available_backup_space(source_size, backup_directories):
    """
    Parameters
    ----------
    source_size : int
    backup_directories : dict of pathlib.WindowsPath

    Validate backup directories exist and are big enough to hold source.
    Terminate on failure.
    """
    for backup_directory in backup_directories.values():
        validate_directory(backup_directory)
        backup_space_available = free_space(backup_directory)
        if source_size > backup_space_available:
            # WARNING: If multiple back-up locations are on the SAME partition,
            # this check is insufficient
            print("That's not going to fit.\nTarget: {0} available.\nSource: {1}".format(
                backup_space_available, source_size))
            terminate_program()


def backup_projects(source, backup_directories):
    """
    Parameters
    ----------
    source : pathlib.WindowsPath
    backup_directories : dict of pathlib.WindowsPath

    Returns
    -------
    list of int : bytes_written

    Perform backup of source directory to (multiple) backup directories.
    """
    bytes_written = []
    for backup_directory in backup_directories:
        if len([f.path for f in os.scandir(backup_directories.get(backup_directory)) if f.is_dir()]) >= MAX_BACKUPS:
            print("Amount of immediate subdirectories in ({0}) is higher or equal to maximum amount of backups ({1}) configured.".format(
                backup_directory, MAX_BACKUPS))
            if not FORCE:
                terminate_program()
            else:
                print("Backup forced. Continuing.")
        print(backup_directories[backup_directory])
        start_time = datetime.now()
        print("Start copy {0}".format(start_time))
        try:
            subfolder = "_{}".format(start_time).replace(
                ':', '-').replace(' ', '_').split('.')[0]
            print(subfolder)
            syscall = "xcopy {source} {destination}\\{subfolder} {args}".format(
                source=str(source),
                destination=str(backup_directories[backup_directory]),
                subfolder=subfolder,
                args=XCOPY_ARGS)
            print(syscall)
            subprocess.run(syscall, check=True)
        except PermissionError:
            print("Permission denied: {0}".format(syscall))
            terminate_program()
        end_time = datetime.now()
        print("Started: {0}\nFinished: {1}\nExecution time {2}".format(
            start_time,
            end_time,
            end_time - start_time)
        )
        bytes_written.append(get_directory_size(
            str(backup_directories[backup_directory]) + '\\' + subfolder))
    for value in bytes_written:
        print(to_megabytes(value))


def main():
    validate_directory(PROJECTS_SOURCE)
    projects_source_size = get_directory_size(PROJECTS_SOURCE)
    verify_available_backup_space(projects_source_size, BACKUP_DIRECTORIES)
    backup_projects(PROJECTS_SOURCE, BACKUP_DIRECTORIES)


if __name__ == "__main__":
    main()
    print("Press any key...")
    input()

Actual output:
Calculating size of Z:\EPLAN_DATA\Gegevens\Projecten
60585.67 MB
C:\backups\eplanprojects
Start copy 2021-05-07 17:21:57.007150
_2021-05-07_17-21-57
xcopy Z:\EPLAN_DATA\Gegevens\Projecten C:\backups\eplanprojects\_2021-05-07_17-21-57 /e /h /i /q /s /z
178642 File(s) copied
Started: 2021-05-07 17:21:57.007150
Finished: 2021-05-07 21:41:52.366467
Execution time 4:19:55.359317
Calculating size of C:\backups\eplanprojects\_2021-05-07_17-21-57
60585.67 MB
N:\BackupsEplan
Start copy 2021-05-07 21:43:31.600948
_2021-05-07_21-43-31
xcopy Z:\EPLAN_DATA\Gegevens\Projecten N:\BackupsEplan\_2021-05-07_21-43-31 /e /h /i /q /s /z
178642 File(s) copied
Started: 2021-05-07 21:43:31.600948
Finished: 2021-05-07 23:31:07.970629
Execution time 1:47:36.369681
Calculating size of N:\BackupsEplan\_2021-05-07_21-43-31
60585.67 MB
60585.67 MB
60585.67 MB
Press any key...

Anything and everything is open to review. Nitpick away.
Python 3.8.5 on Windows 10 x64. No library restrictions this time, and it doesn't need to be cross-platform.
Posted on 2021-05-08 15:28:42
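You can use f-strings instead of .format(). They make the code more readable, and they are also faster (see here; on my machine f-strings were 4.4 times faster than .format() natively, and 3.5 times faster under Ubuntu 20.04 LTS on WSL2), although in most cases the speed difference is negligible.
For example, in get_directory_size you could write print(f"Calculating size of {directory}").
Similarly, in to_megabytes you could write return f"{round(size / 1_000_000.0, 2)} MB".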
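A self-contained version of that suggestion, plus a hypothetical variant `to_megabytes_fixed` that uses a format specifier inside the f-string instead of `round()`, in case a fixed two decimals are preferred in the report:

```python
def to_megabytes(size):
    """Turn bytes into rounded megabytes, keeping round()'s output."""
    return f"{round(size / 1_000_000, 2)} MB"


def to_megabytes_fixed(size):
    """Hypothetical variant: always print exactly two decimals."""
    return f"{size / 1_000_000:.2f} MB"


print(to_megabytes(1_500_000))        # 1.5 MB
print(to_megabytes_fixed(1_500_000))  # 1.50 MB
```

The difference only shows up for values with fewer than two significant decimals; for the 60585.67 MB in the output above, both produce the same text.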
https://codereview.stackexchange.com/questions/260491