I have yield data for some villages. Sample data is shown below.
Village Yield(in Kg) Date
Village1 0.22 01/06/18
Village1 0.23 02/06/18
Village1 0.55 01/06/18
Village1 0.2 02/06/18
Village2 0.88 31/05/18
Village2 0.89 30/05/18
Village2 0.63 30/05/18
Village2 0.55 30/05/18
Now I want to present the yield data by village, together with the experiment date. So, Village1 has 4 yield values in the data.
Please see the desired layout below.
Village Yield-1 Yield-2 Yield-3
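Village1 0.22 01/06/18 0.23 02/06/18 0.55 01/06/18
Any help would be appreciated. Thanks.
Answered 2018-06-04 19:37:15
Try using groupby to collect each village's values, convert the grouped result to a dictionary, build a DataFrame from that dictionary, transpose it, and then use mean to create a new column holding the average.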
import pandas as pd
df = pd.DataFrame({'Village': ['Village1', 'Village1',
'Village1', 'Village1', 'Village2',
'Village2', 'Village2', 'Village2'],
'Yield (in kg)': [0.22,0.23,0.55,0.2, 0.88, 0.89, 0.63, 0.55]})
group = df.groupby('Village')['Yield (in kg)'].apply(lambda x: x.values)
df = pd.DataFrame(group.to_dict()).T
df.columns = df.columns.astype(str)
df['Average'] = df.mean(axis=1)
print(df)
Output:
0 1 2 3 Average
Village1 0.22 0.23 0.55 0.20 0.3000
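Village2 0.88 0.89 0.63 0.55 0.7375
To rename the columns, do the following:
# Keep 'Average' unchanged so the new list has one name per column
df.columns = [i if i == 'Average' else 'Yield (in kg)-' + i for i in df.columns]
Output: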
Yield (in kg)-0 Yield (in kg)-1 Yield (in kg)-2 Yield (in kg)-3 /
Village1 0.22 0.23 0.55 0.20
Village2 0.88 0.89 0.63 0.55
Average
Village1 0.3000
Village2 0.7375
Answered 2018-06-04 19:32:41
Try this:
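# Pair each yield with its date, one column per observation; assign the result so the rename step further down can use it
df = df.groupby(['Village']).apply(lambda x: pd.Series(list(zip(x['Yield(in Kg)'], x['Date'])))).reset_index()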
Village 0 1 2 \
0 Village1 (0.22, 01/06/18) (0.23, 02/06/18) (0.55, 01/06/18)
1 Village2 (0.88, 31/05/18) (0.89, 30/05/18) (0.63, 30/05/18)
3
0 (0.2, 02/06/18)
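1 (0.55, 30/05/18)
To rename the columns, do the following:
# Select the numbered columns (0, 1, 2, ...); a raw string avoids an invalid-escape warning
col1=df.filter(regex=r'\d+').columns.values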
col2=['Yield - '+str(col+1) for col in col1]
df.rename(columns= dict(zip(col1,col2)),inplace=True)
Village Yield - 1 Yield - 2 Yield - 3 \
0 Village1 (0.22, 01/06/18) (0.23, 02/06/18) (0.55, 01/06/18)
1 Village2 (0.88, 31/05/18) (0.89, 30/05/18) (0.63, 30/05/18)
Yield - 4
0 (0.2, 02/06/18)
1 (0.55, 30/05/18)
https://stackoverflow.com/questions/50679177
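If separate yield and date columns are preferred over tuples, a possible alternative is to number each observation within its village and then unstack. This is only a minimal sketch, not taken from either answer: the helper column `n` and the output names `Yield-1`, `Date-1`, ... are assumptions modelled on the layout requested in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Village': ['Village1'] * 4 + ['Village2'] * 4,
    'Yield(in Kg)': [0.22, 0.23, 0.55, 0.2, 0.88, 0.89, 0.63, 0.55],
    'Date': ['01/06/18', '02/06/18', '01/06/18', '02/06/18',
             '31/05/18', '30/05/18', '30/05/18', '30/05/18'],
})

# Number each observation within its village: 1, 2, 3, ...
# ('n' is a hypothetical helper column, not part of the original data.)
df['n'] = df.groupby('Village').cumcount() + 1

# Move the observation number into the columns: one (field, n) pair per cell.
wide = df.set_index(['Village', 'n'])[['Yield(in Kg)', 'Date']].unstack('n')

# Flatten the (field, n) MultiIndex into names such as 'Yield-1', 'Date-1'.
# These names are an assumption based on the header shown in the question.
short = {'Yield(in Kg)': 'Yield', 'Date': 'Date'}
wide.columns = [f'{short[field]}-{n}' for field, n in wide.columns]

# Interleave the columns so each yield sits next to its date.
order = [f'{name}-{n}'
         for n in range(1, int(df['n'].max()) + 1)
         for name in ('Yield', 'Date')]
wide = wide[order].reset_index()

print(wide)
```

This yields one row per village, with each yield next to its date. If villages have different numbers of observations, the missing cells would come out as NaN.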