我正在尝试用Pandas计算Datafame的值。
但是我不会计算"FW_result“和"SS_result”。
df1:
category num standard
FW U-1 3
FW U-2 3
FW U-3 2
SS U-4 3
SS U-5 2
df2:
name U-1 U-2 U-3 U-4 U-5
A 3 1 1 2 1
B 2 3 2 1 2
C 2 2 2 3 1
The desired result would be:
name U-1 U-2 U-3 U-4 U-5 FW_result SS_Result
A 3 1 1 2 1 *(63%) #(60%)
B 2 3 2 1 2 **(88%) ##(60%)
C 2 2 2 3 1 ***(75%) ###(80%)我想填充*和#值。FW_result指的是df1 So,(df2 value)/(standard sum)
示例*
(3+1+1)/(3+3+2) =>63%示例#:
(2+1) /(3+2) => 60%并且我想将结果表示为%(%)
发布于 2020-08-18 15:46:31
如果nums在类别上没有重叠,你可以使用这个。
您可以在这里使用df.groupby,然后使用use GroupBy.sum,然后使用df.div除以这些值并乘以100,以使用df.mul获得百分比值,从而获得所需的结果。
g = df1.groupby('category')['standard'].sum()
m = df2.groupby(df2.columns.map(df1.set_index('num')['category']),axis=1).sum()
# Mapping to convert U-1 to FW, U-2 to FW based on df1
# If you know it's always the order as df1 then you can simply
#_.groupby.(df1['category'].tolist(), axis=1)
df2[['FW_res', 'SS_res']] = m.div(g).mul(100)如果它们重叠,那么试试这个。
df1
# category num standard
#0 FW U-1 3
#1 FW U-2 3
#2 FW U-3 2
#3 SS U-4 3
#4 SS U-5 2
#5 SS U-1 4
p = df1.pivot_table(index='category', columns='num', values='standard', aggfunc='sum')
p
#num U-1 U-2 U-3 U-4 U-5
#category
#FW 3.0 3.0 2.0 NaN NaN
#SS 4.0 NaN NaN 3.0 2.0
# Now, little bit of Numpy boardcasting.
#df2.se_index('name') # if index is not name else ignore this step.
vs = np.nansum(df2.values[:,None]+p.values, axis=-1)
v = p.sum(axis=1).values
out = (vs - v) / v
df2[['FW_res', 'SS_res']] = pd.DataFrame(out*100, index=df2.index)发布于 2020-08-18 15:32:16
对于具有两个新列的num中的重叠值的一般解决方案,请使用:
思路是对fw和ss值过滤df1,然后使用DataFrame.reindex by num values,对原始standard列求和并除以和:
df3 = df1.groupby('category').agg({'num':list, 'standard':'sum'})
print (df3)
num standard
category
FW [U-1, U-2, U-3] 8
SS [U-4, U-5] 5
for col in df3.index:
df2[f'{col}_result'] = (df2.reindex(df3.loc[col, 'num'], axis=1).sum(axis=1)
.div(df3.loc[col, 'standard']).mul(100))
print (df2)
name U-1 U-2 U-3 U-4 U-5 FW_result SS_result
0 A 3 1 1 2 1 62.5 60.0
1 B 2 3 2 1 2 87.5 60.0
2 C 2 2 2 3 1 75.0 80.0发布于 2020-08-18 15:36:44
您可以尝试组合使用melt、merge和groupby对数据进行整形并获得总和:
df2[['FW_result', 'SS_result']] = (df1.merge(df2.melt("name", var_name="num"), on="num")
.assign(num=lambda x: x.num.str[0])
.groupby(["name", "category", "num"])
.pipe(lambda x: x.value.sum() / x.standard.sum())
.mul(100)
.unstack("category").to_numpy())
name U-1 U-2 U-3 U-4 U-5 FW_result SS_result
0 A 3 1 1 2 1 62.5 60.0
1 B 2 3 2 1 2 87.5 60.0
2 C 2 2 2 3 1 75.0 80.0https://stackoverflow.com/questions/63463555
复制相似问题