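Please help with the following request:
I need to clean the df below into df_1: the 'SKU' column holds multiple required values, so it needs to be split into multiple rows.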
df = pd.DataFrame([[1,'NaN','abj','1/1/2021'],
                   [2,'[{"Result":"00018"},{"Result":"00065"}]','abj','1/1/2021'],
                   [3,'','abj','1/1/2021']],
                  columns = ['ID','SKU','NOTES','Date'])
df

df_1 = pd.DataFrame([['1','','abj'],
                     ['2','00018','abj'],
                     ['2','00065','abj'],
                     ['3','','abj']],
                    columns = ['ID','SKU','NOTES'])
df_1
Answered 2021-11-12 17:03:42
To convert the SKU values from a str into a list of dictionaries, I used the json module, as follows:
import pandas as pd
import json

df = pd.DataFrame([
    [1, 'NaN', 'abj', '1/1/2021'],
    [2, '[{"Result":"00018"},{"Result":"00065"}]', 'abj', '1/1/2021'],
    [3, '', 'abj', '1/1/2021']],
    columns=['ID', 'SKU', 'NOTES', 'Date']
)

# Collect the output columns as plain lists, one entry per exploded row
new_df = {
    'ID': [],
    'SKU': [],
    'NOTES': []
}

for i, row in df.iterrows():
    if row['SKU'] in ('NaN', ''):  # skip rows whose SKU is not in the format you want
        continue
    results = json.loads(row['SKU'])  # list of {"Result": ...} dicts
    for res in results:
        new_df['ID'].append(row['ID'])
        new_df['SKU'].append(res['Result'])
        new_df['NOTES'].append(row['NOTES'])

df_1 = pd.DataFrame(new_df)
print(df_1)
#   ID    SKU NOTES
#0   2  00018   abj
#1   2  00065   abj

Answered 2021-11-12 17:05:00
If all the records follow the same pattern, this should clean it up for you.
Note that the code below modifies df.
import pandas as pd
import json
import numpy as np

def clean_SKU(x):
    # Leave missing or empty values untouched
    if pd.isna(x) or x == "" or x == "NaN":
        return x
    else:
        # Parse the JSON string and keep only the 'Result' values
        return [ i['Result'] for i in json.loads(x)]

df.SKU = df.SKU.apply(lambda x : clean_SKU(x))
df.explode('SKU')

https://stackoverflow.com/questions/69946381
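
Neither answer reproduces df_1 exactly as posted in the question: the first drops IDs 1 and 3, and the second leaves 'NaN' and the Date column in place. The following is a minimal sketch of one way to close that gap, not a definitive solution. It assumes missing SKUs should become empty strings and that ID should end up as a string, as in the question's df_1; the helper name parse_sku is made up for illustration.

```python
import json

import pandas as pd

df = pd.DataFrame(
    [
        [1, 'NaN', 'abj', '1/1/2021'],
        [2, '[{"Result":"00018"},{"Result":"00065"}]', 'abj', '1/1/2021'],
        [3, '', 'abj', '1/1/2021'],
    ],
    columns=['ID', 'SKU', 'NOTES', 'Date'],
)


def parse_sku(value):
    """Hypothetical helper: list of 'Result' strings, or [''] when nothing can be parsed."""
    # Assumption: real NaN, the literal string 'NaN', and '' all mean "no SKU"
    if pd.isna(value) or value in ('', 'NaN'):
        return ['']
    return [item['Result'] for item in json.loads(value)]


df_1 = (
    df.drop(columns='Date')                               # df_1 has no Date column
      .assign(SKU=lambda d: d['SKU'].apply(parse_sku))    # each cell becomes a list
      .explode('SKU')                                     # one row per list element
      .reset_index(drop=True)
)
df_1['ID'] = df_1['ID'].astype(str)  # the question's df_1 stores ID as strings
print(df_1)
```

Returning a one-element list for the missing cases means explode keeps those rows instead of dropping them or turning them into NaN. Unlike the answer above, this version builds a new frame and leaves df unchanged.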