I want to encrypt only the sensitive information in a JSON file. Suppose the file looks like this:
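
{
  "entities": [
    {
      "name": "john doe",
      "personalInformation" : {
        "email": "john.doe@email.com",
        "password": "sensitiveinformation123@"
      }
    }
    , ...
  ]
}

What I want is to encrypt only the sensitive fields of the file. For example, if I declare the fields email and password as sensitive, only the values of those fields get encrypted, and later I decrypt everything.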
The encrypted JSON should look like this:
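
{
  "entities": [
    {
      "name": "john doe",
      "personalInformation" : {
        "email": "gAAAAABgXeNr95vq78gambIlZGabAZeQRptNqzXfS_qmNB5O6KBneARNpnqX6OP4Q0s4NeRtkQZRNQRizHKLB2Ydj5Uso-OngD4=",
        "password": "xUFGAgXeNr95vq78gambIlZGabAZeQRptNqzXfSIlZGaBneARNpnqX6OP4Q0s4NeRtkQZRNQRizHKLB2Ydj5Uso-OngD4="
      }
    }
    , ...
  ]
}

I tried encrypting the fields with Fernet, locating them with regular expressions, but the performance could not be worse. Also, when I try to decrypt the file, I have to decrypt field by field, just as I encrypted. Otherwise I get a cryptography.fernet.InvalidToken error, because the library tries to decrypt the whole file, which contains more than just encrypted data.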
The code I have now:

import json, html, copy
from pydash import get
from cryptography.fernet import Fernet
import re

def encrypt_json_fields(data: str, fields: list, encryption_key: str) -> str:
    encrypted = copy.copy(data)
    pii = []
    fer = Fernet(encryption_key)
    for f in fields:
        pii += set(re.findall(r'(?<="{0}": ").*?(?=",)'.format(f),data))
    for value in set(pii):
        encrypted = encrypted.replace(value,(fer.encrypt(bytes(value, encoding='utf-8'))).decode('utf-8'))
    return encrypted

Is there a simple way to encrypt these specific fields? Is there a way to automatically decrypt only the encrypted data, or do I need to write the same function with the reverse logic to decrypt only those fields?

Posted on 2021-03-27 06:43:14
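I am not sure whether this solution is fast enough, so I would appreciate it if you could report the timings on your dataset.
Also, I do not have your original data, but from your example I assume it consists only of nested dictionaries (if that is not the case, please provide a complete example).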

Helper files

data.json: contains the unencrypted JSON data.

{
  "entities": [
    {
      "name": "john doe",
      "personalInformation": {
        "email": "john.doe@email.com",
        "password": "sensitiveinformation123@"
      }
    },
    {
      "name": "jane doe",
      "personalInformation": {
        "email": "jane.doe@email.com",
        "password": "sensitiveinformation123@"
      }
    }
  ]
}

settings.json: contains the path to your Fernet key (which should not be public), plus a dictionary structure describing which fields must be encrypted/decrypted. It can contain any number of nested dictionaries.

{
  "pathKey": "secret.key",
  "fields": {
    "name": false,
    "personalInformation": {
      "email": true,
      "password": true
    }
  }
}

crypto_fernet.py: contains the basic functions of the encryption protocol. It currently uses Fernet; if that is not fast enough, a new file using a faster encryption protocol can be created without changing the interface.

from cryptography.fernet import Fernet

def generate_key(path='secret.key'):
    """ Generate a key and save it to a file. """
    key = Fernet.generate_key()
    with open(path, "wb") as file:
        file.write(key)
    return Fernet(key)

def load_key(path='secret.key'):
    """ Load a previously generated key. """
    with open(path, 'rb') as file:
        key = file.read()
    return Fernet(key)

def encrypt(message: str, f: Fernet):
    # Returns the Fernet token as bytes.
    return f.encrypt(str(message).encode())

def decrypt(message: bytes, f: Fernet):
    # Returns the decrypted plaintext as str.
    return f.decrypt(message).decode()

Solution
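
A function for loading a JSON file:

def load_json(path):
    """ Loads a json file. """
    with open(path, 'r') as file:
        data = json.load(file)
    return data

A function that encrypts or decrypts the fields, depending on the encode argument. If it is True, the fields are encrypted; if it is False, they are decrypted.

def protect(data: Union[dict, str, bytes], fields: Union[dict, bool], f: Fernet, encode=True):
    # A field marked True is a leaf that has to be encrypted or decrypted.
    if isinstance(fields, bool) and fields is True:
        return encrypt(data, f) if encode else decrypt(data, f)
    # A dict describes nested fields: recurse into each listed key.
    if isinstance(fields, dict):
        for key, value in fields.items():
            data[key] = protect(data[key], value, f, encode=encode)
    # Fields marked False (or anything else) are returned unchanged.
    return data

This assumes that data and fields are matching dictionaries, so every key in fields must also exist in data. If that is not the case, an extra check is needed to see whether the key is present in data. In that case, use: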
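
if isinstance(fields, dict):
    for key, value in fields.items():
        if key in data:  # <-- add this line
            data[key] = protect(data[key], value, f, encode=encode)

The second assumption is that every entry in the dictionary is either a string or a dictionary. If your data contains lists, this part has to be extended with the following: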

if isinstance(fields, list):
    for idx in range(min(len(fields), len(data))):
        data[idx] = protect(data[idx], fields[idx], f, encode=encode)

The imports and the main script:

import json
import os
from pprint import pprint
from typing import Union
from cryptography.fernet import Fernet
from crypto_fernet import generate_key, load_key, encrypt, decrypt

# load_json and protect as defined above

if __name__ == '__main__':
    # Load the data, the path to the key, and the description of which fields to encrypt or decrypt.
    data = load_json('data.json')
    settings = load_json('settings.json')
    key, fields = settings['pathKey'], settings['fields']

    # Create the Fernet object, generating a new key file if none exists yet.
    if not os.path.exists(key):
        f = generate_key(key)
    else:
        f = load_key(key)

    # Protect / encrypt the fields marked True in every entity.
    for idx, entity in enumerate(data['entities']):
        data['entities'][idx] = protect(entity, fields, f)
    pprint(data, width=100)

    # Unprotect / decrypt the fields marked True in every entity.
    for idx, entity in enumerate(data['entities']):
        data['entities'][idx] = protect(entity, fields, f, encode=False)
    pprint(data, width=100)

Output

{'entities': [{'name': 'john doe',
               'personalInformation': {'email': b'gAAAAABgXmBIwmGZsLZvHAAKhU3rZoGyIg9isfDVv0dDr_5suyTabE-e1PjQu4Bv5OIgu4SuZa11xuYixAQRTxk66jV3IceLmOLyUVRRv-_22ue7mLYHfTQ=',
                                       'password': b'gAAAAABgXmBIwz02Edo6TYRrTGAPtTFxznpOEi3H5EjyLcwX5vRE9FJ2Af1WJNEq7tVf1hBWsKmA4aThJSUQJS8NuX7_oAVjR9FjqPfh34mKuTfb6pe2TSQ='}},
              {'name': 'jane doe',
               'personalInformation': {'email': b'gAAAAABgXmBIAnyAZr9UiRDArmqSfuI13_HoYKzChbf5WfejGbV7ZHZcAoNBQgyck-GTI6IXXCaBDTZDXg_RkFsDOZCqPiFuf90bC7OStzwXLzTgm0SVVNM=',
                                       'password': b'gAAAAABgXmBILH5mJ2Eectc9u3DUyxQw1RMsH3lB3jPHLADpXNZRWBoU9FdegJiz_fa3YpjsBYsZmbgWPkmsDOJ0JFTKCTS4bncMd8rUN0t6-Zatxy_UOC8='}}]}

{'entities': [{'name': 'john doe',
               'personalInformation': {'email': 'john.doe@email.com',
                                       'password': 'sensitiveinformation123@'}},
              {'name': 'jane doe',
               'personalInformation': {'email': 'jane.doe@email.com',
                                       'password': 'sensitiveinformation123@'}}]}

Advantages
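It is enough for settings.json to contain only the fields that must be encrypted, since every value that is false is ignored. The "name": false entry is therefore not required.

Test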
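I increased the number of nested fields to about 20 and tested on 2000 records (encrypting and decrypting). That took about 5 seconds for me. On your dataset it should therefore take about 1 second (twice as many fields, but a quarter of the data, and only encryption is needed: roughly 5 s × 2 × 1/4 × 1/2 ≈ 1.25 s).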
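
For collecting timings on your own data, a minimal sketch along these lines may help. It is not the code used for the measurement above. It uses a variant of protect that returns str tokens instead of bytes (so the result can be passed to json.dumps), skips keys that are missing from data, and applies one field specification to every element of a list, which differs from the index-matched list handling described earlier. The make_records helper and the synthetic records are made up for this sketch.

```python
import copy
import json
import time

from cryptography.fernet import Fernet


def protect(data, fields, f: Fernet, encode: bool = True):
    """Variant of protect: str tokens, missing keys skipped, lists supported."""
    if fields is True:
        if encode:
            # Fernet tokens are URL-safe base64, so decoding them as ASCII is
            # lossless and keeps the result JSON-serializable.
            # Note: str(data) means non-string values come back as strings.
            return f.encrypt(str(data).encode("utf-8")).decode("ascii")
        return f.decrypt(data.encode("ascii")).decode("utf-8")
    if isinstance(fields, dict) and isinstance(data, dict):
        for key, value in fields.items():
            if key in data:  # ignore fields that this record does not have
                data[key] = protect(data[key], value, f, encode=encode)
    elif isinstance(fields, dict) and isinstance(data, list):
        # Assumption of this sketch: the same spec applies to every list element.
        for idx, item in enumerate(data):
            data[idx] = protect(item, fields, f, encode=encode)
    return data


def make_records(n: int) -> dict:
    """Synthetic data shaped like data.json (hypothetical, for timing only)."""
    return {"entities": [
        {"name": f"user {i}",
         "personalInformation": {"email": f"user{i}@email.com",
                                 "password": "sensitiveinformation123@"}}
        for i in range(n)
    ]}


if __name__ == "__main__":
    # The spec starts at the document root, so no explicit loop over entities.
    fields = {"entities": {"personalInformation": {"email": True, "password": True}}}
    f = Fernet(Fernet.generate_key())  # throwaway key, for the benchmark only
    original = make_records(2000)
    data = copy.deepcopy(original)

    start = time.perf_counter()
    protect(data, fields, f)
    encrypted_text = json.dumps(data)  # works because the tokens are str
    middle = time.perf_counter()
    protect(data, fields, f, encode=False)
    end = time.perf_counter()

    assert "user0@email.com" not in encrypted_text
    assert data == original  # the round trip restores the plaintext
    print(f"encrypt: {middle - start:.2f}s, decrypt: {end - middle:.2f}s")
```

To time real data, replace make_records(2000) with the loaded contents of your file and adapt fields to your settings. The numbers depend on the machine and on the number of encrypted values, so they are only comparable across runs on the same setup.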
https://stackoverflow.com/questions/66818585