I have the following code. I need the status and the policy workflow name (i.e. tuesday) in the output, plus these two conditions:
1. If clientHostname matches xyz, drop that row (row 2).
2. If the status is Aborted, change it to "Failed".
import json
import requests
import pandas as pd
from pandas.io.json import json_normalize
import ast
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 100)
a = [{'attributes': [{'key': '*policy action jobid', 'values': ['289903']}, {'key': '*policy action name', 'values': ['backup']}, {'key': '*policy name', 'values': ['Daily_Backups']}, {'key': '*policy workflow name', 'values': ['tuesday']}, {'key': 'clone retention policy', 'values': [' 504: 5: 34']}, {'key': 'group', 'values': ['tuesday']}, {'key': 'saveset features', 'values': ['CLIENT_SAVETIME']}], 'browseTime': '2020-05-19T23:57:41+08:00', 'clientHostname': 'xyz.com', 'clientId': '7d391c52-00000004-5cda459d-5c1', 'creationTime': '2020-04-28T21:29:25+08:00', 'fileCount': 0, 'id': '1eb1', 'instances': [], 'level': 'Full', 'links': [{'href': 'https://iservera/backups/1ec1', 'rel': 'item'}], 'name': '/abc', 'retentionTime': '2020-05-19T23:57:41+08:00', 'saveTime': '2020-04-28T21:27:07+08:00', 'shortId': '2177', 'size': {'unit': 'Byte', 'value': 0}, 'type': 'File'}, {'attributes': [{'key': '*policy action jobid', 'values': ['2803']}, {'key': '*policy action name', 'values': ['backup: 1589']}, {'key': '*policy name', 'values': ['Daily_Backups: 159']}, {'key': '*policy workflow name', 'values': ['tuesday: 1588079529']}, {'key': '*ss clone retention', 'values': [' 1588079529: 1588079590: 1824409']}, {'key': 'group', 'values': ['tuesday']}, {'key': 'saveset features', 'values': ['CLIENT_SAVETIME']}], 'browseTime': '2020-05-19T23:57:42+08:00', 'clientHostname': 'abc.com', 'clientId': 'ec3dc1', 'completionTime': '2020-04-28T21:29:47+08:00', 'creationTime': '2020-04-28T21:13:10+08:00', 'fileCount': 0, 'id': 'cc1', 'instances': [{'clone': False, 'id': '1588079529', 'status': 'Aborted', 'volumeIds': ['245614341']}], 'level': 'Full', 'links': [{'href': 'https://abc/backups/c771', 'rel': 'item'}], 'name': '/xyz', 'retentionTime': '2020-05-19T23:57:42+08:00', 'saveTime': '2020-04-28T21:10:53+08:00', 'shortId': '2141727718', 'size': {'unit': 'Byte', 'value': 36264099844}, 'type': 'NDMP'}]
df = json_normalize(a)
a = df[['clientHostname','completionTime','size.value','type','fileCount']]
print(a)
The current output is:
clientHostname completionTime size.value type fileCount
0 xyz.com NaN 0 File 0
1 abc.com 2020-04-28T21:29:47+08:00 36264099844 NDMP 0
The expected output is:
clientHostname completionTime size.value type fileCount status Policy
1 abc.com 2020-04-28T21:29:47+08:00 36264099844 NDMP 0 [Failed] tuesday
Posted on 2020-04-30 08:52:02
I would use the jmespath library to traverse the JSON data:
To access a key, use `.`; to access a list, use the `[]` notation.
import jmespath
expression = jmespath.compile("""
[].
{clientHostname:clientHostname,
completionTime:completionTime,
"size.value":size.value,
type:type,
fileCount:fileCount,
status:instances[].status,
Policy:attributes[?key==`*policy workflow name`].values[]}
""")
res = expression.search(a)
res
[{'clientHostname': 'xyz.com',
'completionTime': None,
'size.value': 0,
'type': 'File',
'fileCount': 0,
'status': [],
'Policy': ['tuesday']},
{'clientHostname': 'abc.com',
'completionTime': '2020-04-28T21:29:47+08:00',
'size.value': 36264099844,
'type': 'NDMP',
'fileCount': 0,
'status': ['Aborted'],
'Policy': ['tuesday: 1588079529']}]
Do some cleanup to fit your use case:
new = []
for entry in res:
    if "xyz" in entry['clientHostname']:
        continue
    new.append(entry)
for ent in new:
    ent['Policy'] = ent['Policy'][0].split(':')[0]
    if ent['status'] == ["Aborted"]:
        ent['status'] = ["Failed"]
new
[{'clientHostname': 'abc.com',
'completionTime': '2020-04-28T21:29:47+08:00',
'size.value': 36264099844,
'type': 'NDMP',
'fileCount': 0,
'status': ['Failed'],
'Policy': 'tuesday'}]
pd.DataFrame(new)
clientHostname completionTime size.value type fileCount status Policy
0 abc.com 2020-04-28T21:29:47+08:00 36264099844 NDMP 0 [Failed] tuesday
You may want to look over your data to see whether there are other transformations you need to apply.
Posted on 2020-04-30 08:47:01
You can get the policy workflow name as df['attributes'][0][3]['key'] and df['attributes'][0][3]['values'], and the status as df['instances'][i][0]['status'], where i is the record number.
Try the following:
import json
import requests
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 100)
a = [{'attributes': [{'key': '*policy action jobid', 'values': ['289903']}, {'key': '*policy action name', 'values': ['backup']}, {'key': '*policy name', 'values': ['Daily_Backups']}, {'key': '*policy workflow name', 'values': ['tuesday']}, {'key': 'clone retention policy', 'values': [' 504: 5: 34']}, {'key': 'group', 'values': ['tuesday']}, {'key': 'saveset features', 'values': ['CLIENT_SAVETIME']}], 'browseTime': '2020-05-19T23:57:41+08:00', 'clientHostname': 'xyz.com', 'clientId': '7d391c52-00000004-5cda459d-5c1', 'creationTime': '2020-04-28T21:29:25+08:00', 'fileCount': 0, 'id': '1eb1', 'instances': [], 'level': 'Full', 'links': [{'href': 'https://iservera/backups/1ec1', 'rel': 'item'}], 'name': '/abc', 'retentionTime': '2020-05-19T23:57:41+08:00', 'saveTime': '2020-04-28T21:27:07+08:00', 'shortId': '2177', 'size': {'unit': 'Byte', 'value': 0}, 'type': 'File'}, {'attributes': [{'key': '*policy action jobid', 'values': ['2803']}, {'key': '*policy action name', 'values': ['backup: 1589']}, {'key': '*policy name', 'values': ['Daily_Backups: 159']}, {'key': '*policy workflow name', 'values': ['tuesday: 1588079529']}, {'key': '*ss clone retention', 'values': [' 1588079529: 1588079590: 1824409']}, {'key': 'group', 'values': ['tuesday']}, {'key': 'saveset features', 'values': ['CLIENT_SAVETIME']}], 'browseTime': '2020-05-19T23:57:42+08:00', 'clientHostname': 'abc.com', 'clientId': 'ec3dc1', 'completionTime': '2020-04-28T21:29:47+08:00', 'creationTime': '2020-04-28T21:13:10+08:00', 'fileCount': 0, 'id': 'cc1', 'instances': [{'clone': False, 'id': '1588079529', 'status': 'Aborted', 'volumeIds': ['245614341']}], 'level': 'Full', 'links': [{'href': 'https://abc/backups/c771', 'rel': 'item'}], 'name': '/xyz', 'retentionTime': '2020-05-19T23:57:42+08:00', 'saveTime': '2020-04-28T21:10:53+08:00', 'shortId': '2141727718', 'size': {'unit': 'Byte', 'value': 36264099844}, 'type': 'NDMP'}]
df = pd.json_normalize(a)
status = []
policy = []
for attribute in df['attributes']:
    policy.append(attribute[3]['values'])
for instance in df['instances']:
    if len(instance) == 0:
        status.append('-')
    else:
        for i in instance:
            status.append(i['status'])
a = df[['clientHostname','completionTime','size.value','type','fileCount']]
a.insert(5, 'policy', policy)
a.insert(6, 'status', status)
Also, pandas.io.json.json_normalize is deprecated; use pandas.json_normalize instead.
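For reference, pd.json_normalize can also flatten the nested instances list directly via record_path, carrying top-level fields along with meta; a small sketch on a trimmed-down record mirroring the structure above:

```python
import pandas as pd

# A trimmed-down record mirroring the structure of the entries above
data = [{"clientHostname": "abc.com",
         "instances": [{"id": "1588079529", "status": "Aborted"}]}]

# record_path flattens the nested list; meta carries top-level fields along
flat = pd.json_normalize(data, record_path="instances", meta=["clientHostname"])
print(flat)
```

Note that records with an empty instances list produce no rows with this approach, which is why the loop above handles that case explicitly.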
Posted on 2020-04-30 14:24:46
Assuming the format of the attributes and instances dictionaries stays the same throughout the dataframe:
a = raw_json
df_temp = json_normalize(a)
Create lists of the statuses and the policy workflow names:
import numpy as np

statuses = [i[0]['status'] if len(i) > 0 else np.nan for i in df_temp['instances']]
policy_workflow_names = [i[3]['values'] if len(i) > 0 else np.nan for i in df_temp['attributes']]
Crucially, the format of the values in the attributes and instances columns must stay the same, otherwise this will not work.
Once you have those two lists, put them into your final dataframe.
a = df[['clientHostname','completionTime','size.value','type','fileCount']]
a['policy workflow name'] = policy_workflow_names
a['statuses'] = statuses
https://stackoverflow.com/questions/61518878
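This answer leaves out the question's two conditions (drop the xyz row, map Aborted to Failed); a hedged sketch of those final steps, using a toy frame standing in for the dataframe built above:

```python
import pandas as pd

# Toy frame standing in for the final dataframe built above
df = pd.DataFrame({"clientHostname": ["xyz.com", "abc.com"],
                   "statuses": ["-", "Aborted"]})

# Drop rows whose hostname contains "xyz", then map Aborted -> Failed
df = df[~df["clientHostname"].str.contains("xyz")].copy()
df["statuses"] = df["statuses"].replace("Aborted", "Failed")
print(df)
```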