我有一只熊猫,它有三栏Lot Number,Price和Image Id。我必须用以下格式创建一个JSON文件

{'1200-1300':{'LOT3551': [9082327, 9082329],
'LOT3293':[982832, 898762, 887654]
},
'1300-1400': {'LOT2219': [776542, 119234]
}
}字典中的第一级键,即“1200-1300”、“1300-1400”等是价格范围。价格范围内的键是属于价格范围内的批号,它们的值是来自Image列的值。
到目前为止,我已经尝试了以下代码
for idx, gid_list in enumerate(df['AV Gid']):
data = df.iloc[idx]
lot_no = data['Lot Number']
price = data['Final Price']
gids = gid_list.replace("[","").replace("]","").split(",")
if price >= 1000 and price < 1100:
pr = '10-11'
elif price >= 1100 and price < 1200:
pr = '11-12'
else:
continue
print(pr)
if lot_no in sample_dict[pr]:
sampe_dict[pr][lot_no].append(gid)
else:
#print(pr)
sample_dict[pr][lot_no] = []其中,sample_dict有键作为价格范围。上述代码的问题在于,它还填充了其他价格范围键。
发布于 2022-05-12 10:19:17
我会做这样的事
price_ranges = {'10-11': [1000, 1099], '11-12': [1100, 1199], '0-10': [0, 999]}
sample_dict = dict.fromkeys(price_ranges.keys(), {})
def look_for_range(price, price_ranges=price_ranges):
for label, (low, high) in price_ranges.items():
if low <= price <= high:
return label
def compose_range_dict(row, sample_dict = sample_dict):
range_label = look_for_range(row['PRICE'])
if range_label is not None:
sample_dict[range_label].update({row['LOTNUMBER']: row['IMAGE_ID']})然后
import pandas as pd
# dictionary of lists
testdict = {'LOTNUMBER':['LOT3551', 'LOT3520', 'LOT3574', 'LOT3572'],
'PRICE': [1250, 1150, 10, 900],
'IMAGE_ID':[[9082327, 9082328, 9082329],
[9081865, 9081866, 9081867],
[9083230, 9083231, 9083232],
[9082985, 9082986, 9082988]]}
testdf = pd.DataFrame(testdict)
testdf.apply(compose_range_dict, axis = 1)
# >>> sample_dict
# {'10-11': {'LOT3520': [9081865, 9081866, 9081867], 'LOT3574': [9083230, 9083231, 9083232], 'LOT3572': [9082985, 9082986, 9082988]},
# '11-12': {'LOT3520': [9081865, 9081866, 9081867], 'LOT3574': [9083230, 9083231, 9083232], 'LOT3572': [9082985, 9082986, 9082988]},
# '0-10': {'LOT3520': [9081865, 9081866, 9081867], 'LOT3574': [9083230, 9083231, 9083232], 'LOT3572': [9082985, 9082986, 9082988]}}发布于 2022-05-12 11:35:15
如果df是您的数据,您可以尝试:
data = {
f"{p}-{p + 100}": ser.to_dict()
for p, ser in df.assign(Price=df["Price"].floordiv(100).mul(100))
.set_index("Lot Number")
.groupby("Price")["Image Id"]
}index
Price替换为.floordiv(100).mul(100)等效的
Lot Number以列Price作为结果数据,获取列Image Id作为序列,并将结果放入字典中:F 114H 115字符串f"{p}-{p + 100}"作为键(d17是组的最高价格),以及H 218H 119转换为作为values
的字典的组序列。
结果为
data = {"Lot Number":["LOT1", "LOT2", "LOT3", "LOT4", "LOT5", "LOT6"],
"Price": [1200, 1250, 10, 20, 30, 1300],
"Image Id": [list(range(n)) for n in range(1, 7)]}
df = pd.DataFrame(data) Lot Number Price Image Id
0 LOT1 1200 [0]
1 LOT2 1250 [0, 1]
2 LOT3 10 [0, 1, 2]
3 LOT4 20 [0, 1, 2, 3]
4 LOT5 30 [0, 1, 2, 3, 4]
5 LOT6 1300 [0, 1, 2, 3, 4, 5]是
{'0-100': {'LOT3': [0, 1, 2], 'LOT4': [0, 1, 2, 3], 'LOT5': [0, 1, 2, 3, 4]},
'1200-1300': {'LOT1': [0], 'LOT2': [0, 1]},
'1300-1400': {'LOT6': [0, 1, 2, 3, 4, 5]}}你可以在一次潘达斯行动中做同样的事情:
data = (
df.assign(
Price=df["Price"].floordiv(100).mul(100).map(lambda p: f"{p}-{p + 100}")
)
.set_index("Lot Number")
.groupby("Price")["Image Id"]
.agg(dict)
.to_dict()
)https://stackoverflow.com/questions/72212316
复制相似问题