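Not an ideal title, but I don't know how to describe it better.
I have a DataFrame (df1) and want to split it on the "chicken" column so that:
The output I need is df2, for example: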

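On farm "A" there are 5 chickens, 2 of which laid an egg, so there are 2 rows with egg = "True" and weight = 1, and 1 row with egg = "False" and weight = 3 (the 3 chickens that did not lay an egg).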
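The rule in the farm "A" example can be sketched in plain Python, before any pandas is involved. The helper name `expected_rows` is hypothetical and only illustrates the arithmetic: one row of weight 1 per chicken that laid an egg, plus a single row carrying the count of the remaining chickens.

```python
def expected_rows(chicken, eggs):
    """Return (egg, weight) pairs for one farm (hypothetical helper)."""
    rows = [(True, 1)] * eggs             # one row per chicken that laid an egg
    rows.append((False, chicken - eggs))  # a single row for all the other chickens
    return rows

farm_a = expected_rows(5, 2)
print(farm_a)  # [(True, 1), (True, 1), (False, 3)]

# The weights add back up to the number of chickens on the farm.
assert sum(weight for _, weight in farm_a) == 5
```

A farm with no eggs, such as farm "C", would get only the single False row under this rule.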
The code I came up with is messy. Can you think of a cleaner way? Thanks!!
import pandas as pd

# code to create df1:
df1 = pd.DataFrame({"farm": ["A", "B", "C"], "chicken": [5, 10, 5], "eggs": [2, 3, 0]})
df1 = df1[["farm", "chicken", "eggs"]]

# code to transform df1 to df2:
frames = []
for _, row in df1.iterrows():
    number_of_trues = int(row["eggs"])
    number_of_falses = int(row["chicken"]) - number_of_trues
    # one row per egg-laying chicken, plus one row for all the others
    col_farm = [row["farm"]] * (number_of_trues + 1)
    col_egg = ["True"] * number_of_trues + ["False"]
    col_weight = [1] * number_of_trues + [number_of_falses]
    frames.append(pd.DataFrame({"farm": col_farm, "egg": col_egg, "weight": col_weight}))
# DataFrame.append was removed in pandas 2.0, so the pieces are concatenated instead
df2 = pd.concat(frames)
df2 = df2[["farm", "egg", "weight"]]
df2

Posted on 2018-04-17 18:56:29
Here is a custom solution: build two separate sub-DataFrames, then concat them back together to get the expected output. Key method: repeat
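To see what `repeat` contributes on its own, here is a minimal sketch using the df1 from the question. It shows only the repetition step, not the full solution.

```python
import pandas as pd

df1 = pd.DataFrame({"farm": ["A", "B", "C"], "chicken": [5, 10, 5], "eggs": [2, 3, 0]})

# Series.repeat takes one count per element; a count of 0 drops that element.
repeated = df1["farm"].repeat(df1["eggs"])
print(repeated.tolist())        # ['A', 'A', 'B', 'B', 'B']
print(repeated.index.tolist())  # [0, 0, 1, 1, 1]
```

The repeated values keep their original index labels, which is why the index of the concatenated result contains duplicates.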
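# s: one row of weight 1 per egg-laying chicken (farm name repeated `eggs` times)
s = pd.DataFrame({'farm': df1.farm.repeat(df1.eggs), 'egg': [True] * df1.eggs.sum(), 'weight': [1] * df1.eggs.sum()})
# t: one row per farm holding the count of chickens that laid no egg
t = pd.DataFrame({'farm': df1.farm, 'egg': [False] * len(df1.farm), 'weight': df1.chicken - df1.eggs})
pd.concat([t, s]).sort_values(['farm', 'egg'], ascending=[True, False])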
Out[847]:
     egg farm  weight
0   True    A       1
0   True    A       1
0  False    A       3
1   True    B       1
1   True    B       1
1   True    B       1
1  False    B       7
2  False    C       5

https://stackoverflow.com/questions/49885411