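I have a survey dataset in which every question uses a 7-point scale, and I want to get the value_counts of the common values across all columns (with the data grouped by two columns). Let me show you a sample dataset and where I have got to so far.
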
| col1 | col2 | col3 | Building | Levels_Name |
|---------------|---------------|---------------|---------------|------------------------|
| Not Satisfied | Not Satisfied | Not Satisfied | San Francisco | Individual Contributor |
| Satisfied | Satisfied | NA | Basingstoke | Individual Contributor |
| Not Satisfied | Satisfied | Not Satisfied | San Francisco | Middle Management |
| Not Satisfied | Satisfied | Not Satisfied | Miami | Senior Leadership |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City | Senior Leadership |
| NA | NA | NA | Foster City | Other |
| Not Satisfied | Not Satisfied | NA | Foster City | Senior Leadership |
| Not Satisfied | Satisfied | Not Satisfied | Austin | Middle Management |
| Satisfied | Satisfied | Satisfied | San Francisco | Senior Leadership |
| Not Satisfied | Not Satisfied | Not Satisfied | Foster City | Individual Contributor |
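| Satisfied | Satisfied | NA | Miami | Middle Management |

Now I would like to group this data by "Building" and "Levels_Name", add a new grouping level for "Satisfied", "Not Satisfied" and "NA", and get the value counts for each column.
So the result should look like this:
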
| Building | Levels_Name | Sentiment | col1 | col2 | col3 |
|---------------|------------------------|---------------|------|------|------|
| Foster City | Individual Contributor | Not Satisfied | 1 | 1 | 1 |
| Foster City | Individual Contributor | NA | 0 | 0 | 0 |
| Foster City | Individual Contributor | Satisfied | 0 | 0 | 0 |
| Foster City | Senior Leadership | Not Satisfied | 2 | 2 | 1 |
| Foster City | Senior Leadership | NA | 0 | 0 | 1 |
| Foster City | Senior Leadership | Satisfied | 0 | 0 | 0 |
| San Francisco | Individual Contributor | Not Satisfied | 1 | 1 | 1 |
| San Francisco | Individual Contributor | NA | 0 | 0 | 0 |
| San Francisco | Individual Contributor | Satisfied | 0 | 0 | 0 |

Thanks!
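Posted on 2017-05-07 08:05:18

First, melt the dataframe into long format, then group by all of its columns, take the group sizes, and unstack the `variable` level:
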
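    import numpy as np
    import pandas as pd

    # Long format: one row per (Building, Levels_Name, question column, answer)
    d1 = pd.melt(
        df, ['Building', 'Levels_Name'], value_name='Sentiment'
    ).replace(np.nan, 'NaN')  # label missing answers so groupby keeps them
    # Count every combination, then spread the question columns back out
    d1.groupby(
        d1.columns.tolist()
    ).size().unstack('variable', fill_value=0).reset_index()
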
    variable       Building             Levels_Name      Sentiment  col1  col2  col3
    0                Austin       Middle Management  Not Satisfied     1     0     1
    1                Austin       Middle Management      Satisfied     0     1     0
    2           Basingstoke  Individual Contributor            NaN     0     0     1
    3           Basingstoke  Individual Contributor      Satisfied     1     1     0
    4           Foster City  Individual Contributor  Not Satisfied     1     1     1
    5           Foster City                   Other            NaN     1     1     1
    6           Foster City       Senior Leadership            NaN     0     0     1
    7           Foster City       Senior Leadership  Not Satisfied     2     2     1
    8                 Miami       Middle Management            NaN     0     0     1
    9                 Miami       Middle Management      Satisfied     1     1     0
    10                Miami       Senior Leadership  Not Satisfied     1     0     1
    11                Miami       Senior Leadership      Satisfied     0     1     0
    12        San Francisco  Individual Contributor  Not Satisfied     1     1     1
    13        San Francisco       Middle Management  Not Satisfied     1     0     1
    14        San Francisco       Middle Management      Satisfied     0     1     0
    15        San Francisco       Senior Leadership      Satisfied     1     1     1

https://stackoverflow.com/questions/43829096
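
For anyone who wants to run the melt-and-count approach end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same melt, groupby, size, unstack sequence. It assumes the `NA` cells are loaded as missing values (`NaN`) and labels them `'NA'` rather than `'NaN'`; the names `long_df`, `counts`, `sentiments`, `full_index` and `result` are illustrative only. The final `reindex` step is an addition rather than part of the answer: it pads each Building/Levels_Name group with zero-count rows for sentiments that group never used, which seems to be what the expected table in the question asks for.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; "NA" cells are assumed to be missing values.
NS, S, NA = "Not Satisfied", "Satisfied", np.nan
rows = [
    (NS, NS, NS, "San Francisco", "Individual Contributor"),
    (S, S, NA, "Basingstoke", "Individual Contributor"),
    (NS, S, NS, "San Francisco", "Middle Management"),
    (NS, S, NS, "Miami", "Senior Leadership"),
    (NS, NS, NS, "Foster City", "Senior Leadership"),
    (NA, NA, NA, "Foster City", "Other"),
    (NS, NS, NA, "Foster City", "Senior Leadership"),
    (NS, S, NS, "Austin", "Middle Management"),
    (S, S, S, "San Francisco", "Senior Leadership"),
    (NS, NS, NS, "Foster City", "Individual Contributor"),
    (S, S, NA, "Miami", "Middle Management"),
]
df = pd.DataFrame(rows, columns=["col1", "col2", "col3", "Building", "Levels_Name"])

# Long format: one row per (Building, Levels_Name, question column, answer).
long_df = df.melt(id_vars=["Building", "Levels_Name"], value_name="Sentiment")
# groupby drops missing keys by default, so give them a visible label first.
long_df["Sentiment"] = long_df["Sentiment"].fillna("NA")

# Count each combination, then turn the question columns back into columns.
counts = (
    long_df.groupby(["Building", "Levels_Name", "Sentiment", "variable"])
    .size()
    .unstack("variable", fill_value=0)
)

# Optional extra step (not in the answer): add all-zero rows so that every
# group lists all three sentiments, as in the question's expected table.
sentiments = ["Not Satisfied", "NA", "Satisfied"]
groups = df[["Building", "Levels_Name"]].drop_duplicates()
full_index = pd.MultiIndex.from_tuples(
    [(b, lvl, s) for b, lvl in groups.itertuples(index=False, name=None) for s in sentiments],
    names=["Building", "Levels_Name", "Sentiment"],
)
result = counts.reindex(full_index, fill_value=0).reset_index()
print(result)
```

Without the `reindex` step, `counts.reset_index()` gives the same rows as the answer's output, with `NA` in place of `NaN`.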