
Pandas: Create a table with "dummy variables" from another table

Stack Overflow user

Asked on 2020-02-19 06:04:59

Answers: 2, Views: 120, Followers: 0, Votes: 4

Suppose I have these DataFrames:

DataFrame A (Products)

Cod | Product   | Cost | Date
-------------------------------
18  | Product01 | 3.4  | 21/04
22  | Product02 | 7.2  | 12/08
33  | Product03 | 8.4  | 17/01
55  | Product04 | 0.6  | 13/07
67  | Product05 | 1.1  | 09/09

DataFrame B (Operations)

id | codoper | CodProd  | valor
-------------------------------
1  | 00001   | 55       | 45000
2  | 00001   | 18       | 45000
3  | 00002   | 33       | 53000
1  | 00001   | 55       | 45000

The idea is to obtain a DataFrame C in which the products become columns, with the values taken from DataFrame B:

DataFrame C (result)

id | codoper | Product_18| Product_22| Product_33| Product_55| Product_67 |valor
----------------------------------------------------------------------------------
1  | 00001   | 1         | 0         | 0         | 1         | 0          |45000
2  | 00002   | 0         | 0         | 1         | 0         | 0          |53000

So far I have only been able to do this using DataFrame B:

pd.get_dummies(df, columns=['CodProd']).groupby(['codoper'], as_index=False).min()
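A quick check of the aggregation step in the attempt above, assuming DataFrame B is rebuilt from the table with codoper stored as strings (to keep the leading zeros) and dummies created as integers with dtype=int. With min(), a dummy is 1 only when every row of an operation has that product, so operation 00001 (rows for products 55 and 18) ends up with 0 in every product column. With max(), a dummy is 1 when any row has that product, which appears to match the flags in DataFrame C.

```python
import pandas as pd

# DataFrame B (Operations) from the question; codoper kept as strings
B = pd.DataFrame({
    "id": [1, 2, 3, 1],
    "codoper": ["00001", "00001", "00002", "00001"],
    "CodProd": [55, 18, 33, 55],
    "valor": [45000, 45000, 53000, 45000],
})

# One 0/1 column per product code that occurs in B
dummies = pd.get_dummies(B, columns=["CodProd"], dtype=int)

# min(): flag is 1 only if *every* row of the operation has the product
by_min = dummies.groupby("codoper", as_index=False).min()

# max(): flag is 1 if *any* row of the operation has the product
by_max = dummies.groupby("codoper", as_index=False).max()

print(by_min)
print(by_max)
```

Note that max() also changes the aggregated id (2 instead of 1 for operation 00001), so id may need its own aggregation rule.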

Note: not all of the products in DataFrame A appear in the Operations DataFrame.

Thanks
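One possible way to get a column for every product in A, including codes that never occur in B, is to make CodProd a pandas Categorical whose categories are all the codes in A: get_dummies encodes every category of a categorical column, observed or not. This is a minimal sketch of that alternative technique, assuming both tables are rebuilt as shown. It takes id and valor from the first row of each operation (an assumption: the id column in the desired C does not match B's id 3 for operation 00002) and uses max() so a product flag is 1 if any row of the operation contains it.

```python
import pandas as pd

# DataFrame A (Products) and DataFrame B (Operations) from the question
A = pd.DataFrame({
    "Cod": [18, 22, 33, 55, 67],
    "Product": ["Product01", "Product02", "Product03", "Product04", "Product05"],
    "Cost": [3.4, 7.2, 8.4, 0.6, 1.1],
    "Date": ["21/04", "12/08", "17/01", "13/07", "09/09"],
})
B = pd.DataFrame({
    "id": [1, 2, 3, 1],
    "codoper": ["00001", "00001", "00002", "00001"],
    "CodProd": [55, 18, 33, 55],
    "valor": [45000, 45000, 53000, 45000],
})

# Declare every code from A as a category, so unused codes (22, 67)
# still get their own all-zero dummy column
ops = B.assign(CodProd=pd.Categorical(B["CodProd"], categories=A["Cod"].unique()))
dummies = pd.get_dummies(ops, columns=["CodProd"], prefix="Product", dtype=int)

product_cols = [c for c in dummies.columns if c.startswith("Product_")]

# One row per operation: first id/valor, product flag = any row has it
C = (dummies
     .groupby("codoper", as_index=False)
     .agg({"id": "first", "valor": "first", **{c: "max" for c in product_cols}}))

# Same column order as DataFrame C in the question
C = C[["id", "codoper"] + product_cols + ["valor"]]
print(C)
```

Because the columns come from A's categories rather than from B's data, the output keeps the same set and order of product columns regardless of which products a given batch of operations happens to contain.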

python

pandas

dataframe

2 Answers

Stack Overflow user

Answered on 2020-02-19 06:54:35

You need to combine the dummies from Products with the dummies from Operations. First, define the output columns using the prefix:

columns = ['id', 'codoper'] + [f"Product_{cod}" for cod in A['Cod'].unique()] + ['valor']

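Then use get_dummies as you did, but with the same prefix used to define the columns. Group by all of the perfectly collinear columns, i.e. id, codoper and valor. If they are not perfectly collinear, you need to decide how to aggregate them to the codoper level. Finally, reindex using the output columns defined earlier, filling missing values with zeros.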

pd.get_dummies(B, columns=['CodProd'], prefix='Product').groupby(['id', 'codoper', 'valor'], as_index=False).sum().reindex(columns=columns, fill_value=0)

  id codoper  Product_18  Product_22  Product_33  Product_55  Product_67  valor
0  1   00001           0           0           0           2           0  45000
1  2   00001           1           0           0           0           0  45000
2  3   00002           0           0           1           0           0  53000
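A sketch that reproduces this output, assuming A and B are rebuilt from the question's tables. It shows why there are three rows rather than two: id is not constant within operation 00001 (it has rows with id 1 and id 2), so grouping on id keeps them apart, and the duplicated row (id 1, product 55) is summed to 2. The second step is one possible codoper-level aggregation, as the answer suggests deciding: first id and valor, max() for the product columns, then clipping the counts to 1. The names per_id, per_oper and flag_cols are illustrative.

```python
import pandas as pd

# Only the Cod column of A is needed for the output columns
A = pd.DataFrame({"Cod": [18, 22, 33, 55, 67]})
B = pd.DataFrame({
    "id": [1, 2, 3, 1],
    "codoper": ["00001", "00001", "00002", "00001"],
    "CodProd": [55, 18, 33, 55],
    "valor": [45000, 45000, 53000, 45000],
})

columns = ["id", "codoper"] + [f"Product_{cod}" for cod in A["Cod"].unique()] + ["valor"]
dummies = pd.get_dummies(B, columns=["CodProd"], prefix="Product", dtype=int)

# As in the answer: group on id/codoper/valor, add missing product columns as 0
per_id = (dummies.groupby(["id", "codoper", "valor"], as_index=False)
          .sum()
          .reindex(columns=columns, fill_value=0))

# Collapse to one row per codoper: first id/valor, max of the product counts
flag_cols = [c for c in columns if c.startswith("Product_")]
per_oper = (per_id.groupby("codoper", as_index=False)
            .agg({"id": "first", "valor": "first", **{c: "max" for c in flag_cols}})
            [columns])

# Turn counts (e.g. 2 for the duplicated row) into 0/1 flags
per_oper = per_oper.assign(**{c: per_oper[c].clip(upper=1) for c in flag_cols})
print(per_id)
print(per_oper)
```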

Votes: 2

Stack Overflow user

Answered on 2020-02-19 06:53:10

This is a combination of merge and pivot_table, with a few adjustments:

(Products.merge(Operations, 
                left_on='Cod', 
                right_on='CodProd',
                how='left')
     .pivot_table(index=['codoper','valor'],
                  values='Product',
                  columns='Cod', 
                  fill_value=0,
                  aggfunc='any')
     .reindex(Products.Cod.unique(), 
              fill_value=False,
              axis=1)
     .astype(int)
     .add_prefix('Product_')
     .reset_index()
)

Output:

Cod codoper    valor  Product_18  Product_22  Product_33  Product_55  Product_67
0     00001  45000.0           1           0           0           1           0
1     00002  53000.0           0           0           1           0           0
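A sketch of why the reindex step in this answer is needed, assuming Products and Operations are rebuilt from tables A and B. The left merge keeps products 22 and 67 with NaN in codoper and valor, which is also why valor becomes float (45000.0). Rows with a missing key drop out of the pivot, so those codes vanish from the columns until reindex brings them back as zeros; in the sketch they are dropped explicitly with dropna. For the pivot step this sketch swaps in pd.crosstab (a count per cell, then converted to 0/1) instead of pivot_table with aggfunc='any'.

```python
import pandas as pd

Products = pd.DataFrame({
    "Cod": [18, 22, 33, 55, 67],
    "Product": ["Product01", "Product02", "Product03", "Product04", "Product05"],
})
Operations = pd.DataFrame({
    "id": [1, 2, 3, 1],
    "codoper": ["00001", "00001", "00002", "00001"],
    "CodProd": [55, 18, 33, 55],
    "valor": [45000, 45000, 53000, 45000],
})

# Left merge: every product is kept; 22 and 67 get NaN codoper/valor,
# and the NaNs turn valor into float64
merged = Products.merge(Operations, left_on="Cod", right_on="CodProd", how="left")

# Unmatched products have no operation key, so they cannot form rows
matched = merged.dropna(subset=["codoper"])

# Count of each product per (codoper, valor), then reduce to 0/1 flags
counts = pd.crosstab([matched["codoper"], matched["valor"]], matched["Cod"])
result = (counts.gt(0).astype(int)
          .reindex(columns=Products["Cod"].unique(), fill_value=0)  # restore 22, 67
          .add_prefix("Product_")
          .reset_index())
print(result)
```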

Votes: 0

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/60290110
