Q: Backup script for a Linux server/NAS

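Code Review user

Asked on 2019-12-09 13:47:27

Answers: 1 · Views: 117 · Followers: 0 · Votes: 6

This is my script. I want it to run weekly on my Linux server, which runs on a Raspberry Pi 4, and back up all files that may have changed. The GPIO pins only have LEDs attached. It requires a program called notify-run and a file named "BackupSettings.ini" in its own directory, which looks like this: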

[Sources]
Folder1=/home/pi/Desktop/Scripts/test/BackupTestEnvironment/Original Drive

[Destinations]
Folder1=/home/pi/Desktop/Scripts/test/BackupTestEnvironment/Backup Drive

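...and there can be multiple pairs of folders. I would like to hear some suggestions for improvement. I am a bloody beginner, so please be patient with me :)

Here is the main code: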

"""Import"""
import RPi.GPIO as GPIO
import shutil, os, hashlib, json, subprocess
from configparser import ConfigParser
from datetime import date
from operator import itemgetter


"""Init"""
config = ConfigParser()
config.read('BackupSettings.ini')
Sources = dict(config.items('Sources'))
Destinations = dict(config.items('Destinations'))
Indexnew = []
Indexold = {}
DeletedFiles = []
today = date.today().strftime("%Y_%m_%d")
GPIO.setmode(GPIO.BCM)
GPIO.setup(20, GPIO.OUT)
GPIO.setup(21, GPIO.OUT)



"""Def"""
def CreateNewIndex():
    global Indexnew
    for path, dirs, files in os.walk(Source):
        for file in files:
            filepath = path+"/"+file
            sha512_hash = hashlib.sha512()
            with open(filepath,"rb") as f:
                for byte_block in iter(lambda: f.read(4096),b""):
                    sha512_hash.update(byte_block)
                hashsum = sha512_hash.hexdigest()
            x, filepath = path.split(Source, 1)
            filepath = filepath+"/"
            data = {'Name': file, 'Path': filepath, 'Hashsum': hashsum}
            Indexnew.append(data)
    with open(Destination+"/"+today+".json", 'w+') as jsonout:
        json.dump(Indexnew,jsonout)

def ImportOldIndexes():
    global Indexold
    files = [f for f in os.listdir(Destination)
             if os.path.isfile(os.path.join(Destination, f))]
    if today+".json" in files:
        files.remove(today+".json")
    files.sort()
    for file in files:
        filepath = Destination+"/"+file
        Indexold[file] = json.load(open(filepath, "r"))

def Compare():
    keys = list(Indexold.keys())
    keys.sort()
    global Indexnew
    global DeletedFiles
    if Indexold:
        for x in Indexold[keys[-1]]:
            counter = 0
            for y in Indexnew:
                if itemgetter('Name', 'Path', 'Hashsum')(x) == itemgetter('Name', 'Path', 'Hashsum')(y):
                    y['Change'] = 'unchanged'
                elif itemgetter('Name', 'Hashsum')(x) == itemgetter('Name', 'Hashsum')(y):
                    y['Change'] = 'moved'
                elif itemgetter('Path', 'Hashsum')(x) == itemgetter('Path', 'Hashsum')(y):
                    y['Change'] = 'renamed'
                elif itemgetter('Name', 'Path')(x) == itemgetter('Name', 'Path')(y):
                    y['Change'] = 'newversion'
                else:
                    counter = counter + 1
                    if counter == len(Indexnew):
                        DeletedFiles.append(x)
        with open(Destination+"/DeletedFiles/"+today+".json", 'w+') as jsonout:
            json.dump(DeletedFiles,jsonout)
        for x in Indexnew:
            counter = 0
            for y in Indexold[keys[-1]]:
                if not itemgetter('Name', 'Path', 'Hashsum')(x) == itemgetter('Name', 'Path', 'Hashsum')(y):
                    if not itemgetter('Name', 'Hashsum')(x) == itemgetter('Name', 'Hashsum')(y):
                        if not itemgetter('Path', 'Hashsum')(x) == itemgetter('Path', 'Hashsum')(y):
                            if not itemgetter('Name', 'Path')(x) == itemgetter('Name', 'Path')(y):
                                counter = counter + 1
                                if counter == len(Indexold[keys[-1]]):
                                    x['Change'] = 'new'
        with open(Destination+"/"+today+".json", 'w+') as jsonout:
            json.dump(Indexnew,jsonout)

def Execute():
    error = 0
    for x in Indexnew:
        if x['Change'] == 'new' or x['Change'] == 'moved' or x['Change'] == 'renamed' or x['Change'] == 'newversion':
            Copyfrom = Source+x['Path']+x['Name']
            Copyto = Destination+"/"+today+x['Path']
            if not os.path.exists(Copyto):
                os.makedirs(Copyto)
            shutil.copy(Copyfrom, Copyto)
            sha512_hash = hashlib.sha512()
            with open(Copyto+x['Name'],"rb") as f:
                for byte_block in iter(lambda: f.read(4096),b""):
                    sha512_hash.update(byte_block)
                hashsum = sha512_hash.hexdigest()
            if not hashsum == x['Hashsum']:
                error = error + 1
                print("Error")
    if error == 0:
        print("Success")
    else:
        print("Error")
        GPIO.output(21, True)
        notify = subprocess.Popen(["notify-run", "send", '"Error during Backup"'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)


"""Run"""
count = 0
for amount in Sources.values():
    count = count + 1
    Source = Sources["folder"+str(count)]
    Destination = Destinations["folder"+str(count)]
    GPIO.output(20, True)
    CreateNewIndex()
    ImportOldIndexes()
    Compare()
    Execute()
    GPIO.output(21, False)

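I really hope I did not introduce any errors when pasting it here.

Edit: I forgot to mention that it is invoked by crontab.

Edit: It is supposed to run on critical data and should be as fail-safe as possible, which is why I used sha512 instead of md5. It should also be able to handle any kind of file it is hit with. If you have ideas for other safety mechanisms, please let me know. The input will consist of roughly 2 TB of files, ranging from 1 kB to 200 TB. I am the only one using it.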

python

python-3.x

linux

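1 Answer

Code Review user

Accepted answer

Posted on 2019-12-09 20:46:32

Style

Python has an "official" Style Guide for Python Code (PEP 8), which most programmers tend to follow, although it was originally written strictly for the standard library. It is worth a read.

The easiest first step towards making your code conform to the style guide is to change the function and variable names to the usual lowercase_with_underscores.

Constant values, which today is for the purposes of this program, are usually named in ALL_UPPERCASE_WITH_UNDERSCORES.

Fortunately, a variety of tools is readily available that can help you check and (auto-)fix some or most of these issues.

Global variables

Global variables are usually best avoided, since they make it hard(er) to track which part of the program's state is changed where. To get rid of them, you will have to rewrite your functions to accept the relevant inputs as parameters and to actually return the (modified) values that are going to be used. An example: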

def create_new_index(source):
    index_new = []
    # _ is commonly used for "don't care" values
    for path, _, files in os.walk(source):
        ...

    return index_new

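I chose to pass only source as a parameter to the function, and I changed the function's name, because I would also recommend not writing the data to disk inside that function. This is not strictly necessary, but it helps to keep your functions manageable, since each of them then has a single, limited responsibility.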
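Filled in, the skeleton above might look like the following sketch. The helper names hash_file and write_index are not from the question; they are assumptions made for illustration. The use of os.path.relpath is also a swapped-in replacement for the original path.split(Source, 1), so the stored 'Path' values come out as relative paths such as "sub" rather than "/sub/".

```python
import hashlib
import json
import os


def hash_file(filepath, block_size=4096):
    """Return the SHA-512 hex digest of a file, read block by block."""
    sha512_hash = hashlib.sha512()
    with open(filepath, "rb") as f:
        for byte_block in iter(lambda: f.read(block_size), b""):
            sha512_hash.update(byte_block)
    return sha512_hash.hexdigest()


def create_new_index(source):
    """Walk source and return a list of index entries; nothing is written."""
    index_new = []
    # _ is commonly used for "don't care" values
    for path, _, files in os.walk(source):
        # Directory relative to source ("." for the top-level folder)
        relative_path = os.path.relpath(path, source)
        for file in files:
            hashsum = hash_file(os.path.join(path, file))
            index_new.append(
                {"Name": file, "Path": relative_path, "Hashsum": hashsum}
            )
    return index_new


def write_index(index, filepath):
    """Persist an index as JSON; kept separate from building the index."""
    with open(filepath, "w") as jsonout:
        json.dump(index, jsonout)
```

With this split, the caller decides where (and whether) the index is stored, for example write_index(create_new_index(source), some_path), and create_new_index can be tested without touching the backup destination.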

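Loop like a native

Instead of

count = 0
for amount in Sources.values():
    count = count + 1
    Source = Sources["folder"+str(count)]
    Destination = Destinations["folder"+str(count)]

where amount is never used, you could use enumerate(...) like so: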

for count, _ in enumerate(Sources.values(), 1):
    source = Sources[f"folder{count}"]
    destination = Destinations[f"folder{count}"]
    new_index = create_new_index(source)

Or, if you do not care about the specific order of the folders, simply use

for key, source in Sources.items():
    destination = Destinations[key]
    new_index = create_new_index(source)

As a bonus, this last version also frees you from the requirement that the key/folder names in the config file follow a strict FolderX pattern.
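As a rough illustration of the key-based pairing, the sketch below reads a config with freely chosen key names. The function name load_folder_pairs, the key names and all paths are hypothetical, and the check for missing destinations is an addition that is not part of the original script. Note that ConfigParser lowercases option names by default, so "Media" is stored as "media" in both sections and the lookup still matches.

```python
from configparser import ConfigParser

# Hypothetical config content; key names no longer need to be Folder1, Folder2, ...
EXAMPLE_INI = """
[Sources]
Media=/srv/media
Documents=/srv/documents

[Destinations]
Media=/mnt/backup/media
Documents=/mnt/backup/documents
"""


def load_folder_pairs(config):
    """Return a list of (source, destination) tuples matched by key."""
    sources = dict(config.items("Sources"))
    destinations = dict(config.items("Destinations"))
    # Fail early if a source has no destination with the same key
    missing = sources.keys() - destinations.keys()
    if missing:
        raise KeyError(f"No destination configured for: {sorted(missing)}")
    return [(source, destinations[key]) for key, source in sources.items()]


config = ConfigParser()
config.read_string(EXAMPLE_INI)
for source, destination in load_folder_pairs(config):
    print(source, "->", destination)
```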

`itemgetter` and matching

itemgetters can be reused.

get_nph = itemgetter('Name', 'Path', 'Hashsum')
get_nh = itemgetter('Name', 'Hashsum')
...
if get_nph(x) == get_nph(y):
    # ... do something
elif get_nh(x) == get_nh(y):
    # ... do something
# and so on

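Defining one reusable itemgetter function for each combination of keys you use is the most straightforward transformation of the code shown in your question. In your original code, you create a new itemgetter function every time one is needed. As the example above shows, this is not necessary.

But the code can do without itemgetters altogether, since all you do is check whether three attributes of the files are equal and then act accordingly. An alternative implementation taking the same approach might look like this: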

names_match = x['Name'] == y['Name']
paths_match = x['Path'] == y['Path']
hashsums_match = x['Hashsum'] == y['Hashsum']
if names_match and paths_match and hashsums_match:
    y['Change'] = 'unchanged'
elif names_match and hashsums_match:
    y['Change'] = 'moved'
elif paths_match and hashsums_match:
    y['Change'] = 'renamed'
elif names_match and paths_match:
    y['Change'] = 'newversion'
else:
    # ...

I tend to say this is more readable, but that may be a matter of taste.
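One way to make the comparison above reusable and testable is to wrap it in a small function. This is only a sketch: the name classify_change and the choice to return None for unrelated entries are assumptions, not part of the original code, which instead counts non-matching entries in the else branch.

```python
def classify_change(old, new):
    """Compare two index entries; return a change label, or None if unrelated."""
    names_match = old["Name"] == new["Name"]
    paths_match = old["Path"] == new["Path"]
    hashsums_match = old["Hashsum"] == new["Hashsum"]
    if names_match and paths_match and hashsums_match:
        return "unchanged"
    if names_match and hashsums_match:
        return "moved"
    if paths_match and hashsums_match:
        return "renamed"
    if names_match and paths_match:
        return "newversion"
    return None


# Made-up entries: same name and hash, different path
old_entry = {"Name": "a.txt", "Path": "/docs/", "Hashsum": "abc"}
new_entry = {"Name": "a.txt", "Path": "/old/", "Hashsum": "abc"}
print(classify_change(old_entry, new_entry))  # prints "moved"
```

The caller can then treat a None result as "no relation found" and keep the counting logic, or replace it with something simpler, without touching the comparison itself.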

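Handling paths

Instead of concatenating paths manually, as in Destination + "/" + TODAY + x['Path'], you can use os.path.join(...), as in os.path.join(destination, TODAY, x['Path']). A further advantage of this function is that it takes care to use the "correct" OS-specific separator (i.e. \ on Windows, / on Linux), although that is not strictly necessary here since the target is Linux only.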

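Python 3 also offers the pathlib module, which makes working with paths and their parts even more convenient. If you plan to rework your script, or for future projects, it may be worth a look.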
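A short sketch of both approaches follows, with one caveat worth knowing: os.path.join discards all earlier components when a later one is absolute. In the question's code, x['Path'] appears to start with "/" (it is built from path.split(Source, 1) plus a trailing "/"), so it would need its leading separator stripped before joining. The folder names below are made up, and the results in the comments assume Linux.

```python
import os.path
from pathlib import Path

destination = "/mnt/backup"  # hypothetical destination folder
today = "2019_12_09"

# A relative component is appended as expected
relative = os.path.join(destination, today, "photos/")
print(relative)  # /mnt/backup/2019_12_09/photos/

# An absolute component makes os.path.join drop everything before it
absolute = os.path.join(destination, today, "/photos/")
print(absolute)  # /photos/

# Stripping the leading separator first avoids that
stripped = os.path.join(destination, today, "/photos/".lstrip("/"))
print(stripped)  # /mnt/backup/2019_12_09/photos/

# pathlib: the / operator joins parts, and attributes expose the pieces
copy_to = Path(destination) / today / "photos" / "img.jpg"
print(copy_to.name)    # img.jpg
print(copy_to.parent)  # /mnt/backup/2019_12_09/photos
```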

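Running the script

You have marked the part of the script that is supposed to run on execution with the block comment """Run""". That may work for people looking at your code, but the interpreter does not care about it. If you ever wanted to reuse a function from your script elsewhere, importing the file would trigger the backup routine.

Instead, if __name__ == "__main__": should be used to (also) tell the interpreter which parts of the file are supposed to run as a script. There is also a good explanation of this on Stack Overflow.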

if __name__ == "__main__":
    config = ConfigParser()
    config.read('BackupSettings.ini')
    sources = dict(config.items('Sources'))
    destinations = dict(config.items('Destinations'))
    for key, source in sources.items():
        destination = destinations[key]
        new_index = create_new_index(source)
        ...

Votes: 3

The original page content is provided by Code Review.

Original link:

https://codereview.stackexchange.com/questions/233689
