I have two columns in my dataframe:
winner opening_shortname
0 White Slav Defense
1 Black Nimzowitsch Defense
2 White King's Pawn Game
3 White Queen's Pawn Game
4 White Philidor Defense
... ... ...
20053 White Dutch Defense
20054 Black Queen's Pawn
20055 White Queen's Pawn Game
20056 White Pirc Defense
20057 Black Queen's Pawn Game
I would like to create the plot below, showing the ten most common openings and, for each, the proportion (%) of wins by color.

Posted on 2021-08-15 15:53:26
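import matplotlib.pyplot as plt
topk = 10
# count games per (opening, winner); the winners become the columns 'Black', 'White'
z = df.groupby(['opening_shortname', 'winner']).size().unstack(fill_value=0)
# keep the topk most-played openings (ascending, so the most popular is drawn at the top)
ax = z.loc[z.sum(axis=1).sort_values().tail(topk).index].plot.barh(color=['black', 'white'], edgecolor='black')
ax.xaxis.set_visible(False)
plt.show()
This ranks the openings by popularity and keeps only the top k (10 in the OP's question). The "proportion (%)" mentioned in the question is ambiguous: the plot provided clearly shows the totals decreasing from the top opening to the next, and its horizontal axis has been removed.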
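Anyway, on the sample data you provided, this produces a horizontal bar chart of the top openings, with a black and a white bar for each opening.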
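If "proportion (%)" is instead read as each color's share of the wins within an opening, one possible sketch (an assumption about what the OP wants, not part of the answer above) normalizes every row to 100 % and draws stacked bars. The function name `top_openings_pct` and the toy data are hypothetical, and the `matplotlib.use("Agg")` line is only there so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to view the plot
import matplotlib.pyplot as plt
import pandas as pd


def top_openings_pct(df: pd.DataFrame, topk: int = 10) -> pd.DataFrame:
    """Share of wins (%) per color for the `topk` most-played openings (hypothetical helper)."""
    # Count games per (opening, winner); missing combinations become 0 instead of NaN.
    counts = df.groupby(["opening_shortname", "winner"]).size().unstack(fill_value=0)
    # Keep the topk openings by total games, ascending so barh draws the most popular at the top.
    top = counts.loc[counts.sum(axis=1).sort_values().tail(topk).index]
    # Divide each row by its own total so every opening adds up to 100 %.
    return top.div(top.sum(axis=1), axis=0) * 100


# Hypothetical toy data, only here to make the sketch runnable.
df = pd.DataFrame({
    "winner": ["White", "Black", "White", "White", "Black", "Black", "White"],
    "opening_shortname": ["Slav Defense", "Slav Defense", "Pirc Defense",
                          "Queen's Pawn Game", "Queen's Pawn Game",
                          "Queen's Pawn Game", "Pirc Defense"],
})

pct = top_openings_pct(df, topk=10)
# One stacked bar per opening: the white and black segments together span 0-100 %.
ax = pct[["White", "Black"]].plot.barh(stacked=True, color=["white", "black"], edgecolor="black")
ax.set_xlabel("share of wins (%)")
plt.tight_layout()
```

Stacking makes the split inside each opening easy to compare, at the cost of hiding how popular each opening is; the unstacked count plot above shows the opposite.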

Posted on 2021-08-15 15:46:18
Assuming your dataframe is named df, you can use groupby + count + unstack, then sort by the totals and plot the top 10:
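# one row per opening, one column per winner ('Black', 'White'), holding game counts
df2 = (df.assign(count=1)
       .groupby(['winner', 'opening_shortname'])
       .count()
       .unstack(level=0)
       .droplevel(0, axis=1)
      )
# plot part: the 10 most-played openings (ascending, so the most popular ends up at the top of the barh plot)
idx = df2.sum(axis=1).sort_values().tail(10).index
(df2.div(df2.sum(axis=1), axis=0)  # calculate the proportion
 .fillna(0)
 .loc[idx, ['White', 'Black']]
 .plot.barh(color=['w', 'k'], edgecolor='k')
)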
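A more compact sketch of the same idea uses `pd.crosstab`, which builds the opening-by-winner count table in one call (filling missing combinations with 0) and, with `normalize="index"`, divides each row by its own total. The toy data is hypothetical and only makes the sketch runnable.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's `df`.
df = pd.DataFrame({
    "winner": ["White", "Black", "White", "White", "Black", "Black", "White"],
    "opening_shortname": ["Slav Defense", "Slav Defense", "Pirc Defense",
                          "Queen's Pawn Game", "Queen's Pawn Game",
                          "Queen's Pawn Game", "Pirc Defense"],
})

# Game counts: one row per opening, one column per winner ('Black', 'White'), zeros filled in.
counts = pd.crosstab(df["opening_shortname"], df["winner"])
# The 10 most-played openings, ascending so barh puts the most popular at the top.
idx = counts.sum(axis=1).sort_values().tail(10).index
# normalize="index" divides each row by its own total, so no separate .div(...).fillna(0) step is needed.
shares = pd.crosstab(df["opening_shortname"], df["winner"], normalize="index").loc[idx, ["White", "Black"]]

# To draw it: shares.plot.barh(color=["w", "k"], edgecolor="k")
```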

Posted on 2021-08-15 15:50:22
First, reshape the data as follows:
df = (df.groupby(by = ['opening_shortname', 'winner'])
        .size()
        .reset_index(name = 'count')
        .sort_values(['count', 'opening_shortname', 'winner'], ascending = False, ignore_index = True))
As a result, you will get data like this (fake data):
opening_shortname winner count
0 Queen's Pawn Game White 141
1 Queen's Pawn Game Black 132
2 Queen's Pawn White 57
3 Queen's Pawn Black 57
4 King's Pawn Game Black 57
5 Dutch Defense Black 53
6 Sicilian Defense White 51
7 Sicilian Defense Black 50
8 Nimzowitsch Defense White 46
9 Nimzowitsch Defense Black 45
10 Philidor Defense Black 44
11 Slav Defense White 43
12 Pirc Defense White 42
13 Slav Defense Black 39
14 Pirc Defense Black 38
15 King's Pawn Game White 38
16 Dutch Defense White 36
17 Philidor Defense White 31
Then you can plot the data, for example with seaborn.barplot:
sns.barplot(ax = ax, data = df, x = 'count', y = 'opening_shortname', hue = 'winner', hue_order = ['White', 'Black'], palette = ['white', 'black'], edgecolor = 'black')
Full code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv(r'data/data.csv')
df = (df.groupby(by = ['opening_shortname', 'winner'])
        .size()
        .reset_index(name = 'count')
        .sort_values(['count', 'opening_shortname', 'winner'], ascending = False, ignore_index = True))
fig, ax = plt.subplots()
# hue_order pins the palette: White wins drawn in white, Black wins in black
sns.barplot(ax = ax, data = df, x = 'count', y = 'opening_shortname', hue = 'winner', hue_order = ['White', 'Black'], palette = ['white', 'black'], edgecolor = 'black')
plt.show()
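The code above plots every opening, whereas the question asks for the top ten. A minimal sketch of filtering the long-format table first, assuming it has the `opening_shortname`, `winner` and `count` columns shown above; `keep_top_openings` is a hypothetical helper name and the rows are copied from the fake table.

```python
import pandas as pd


def keep_top_openings(long_df: pd.DataFrame, topk: int = 10) -> pd.DataFrame:
    """Keep only the rows of the `topk` openings with the most games in total (hypothetical helper)."""
    # Total games per opening, summed over both winners.
    totals = long_df.groupby("opening_shortname")["count"].sum()
    top = totals.nlargest(topk).index
    # isin keeps the original (count-sorted) row order, which seaborn should follow on the y axis.
    return long_df[long_df["opening_shortname"].isin(top)].reset_index(drop=True)


# A few rows of the fake long-format table shown above.
long_df = pd.DataFrame({
    "opening_shortname": ["Queen's Pawn Game", "Queen's Pawn Game", "Queen's Pawn",
                          "Queen's Pawn", "Dutch Defense", "Dutch Defense"],
    "winner": ["White", "Black", "White", "Black", "Black", "White"],
    "count": [141, 132, 57, 57, 53, 36],
})

top_df = keep_top_openings(long_df, topk=2)
# top_df can then be passed as data= to the sns.barplot call above.
```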

If you want to plot the relative proportion instead of the count, add one line to the code above:
df['count'] = df['count']/df.groupby('opening_shortname')['count'].transform('sum')
Full code:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv(r'data/data.csv')
df = (df.groupby(by = ['opening_shortname', 'winner'])
        .size()
        .reset_index(name = 'count')
        .sort_values(['count', 'opening_shortname', 'winner'], ascending = False, ignore_index = True))
# divide each count by the total number of games for that opening
df['count'] = df['count']/df.groupby('opening_shortname')['count'].transform('sum')
fig, ax = plt.subplots()
# hue_order pins the palette: White wins drawn in white, Black wins in black
sns.barplot(ax = ax, data = df, x = 'count', y = 'opening_shortname', hue = 'winner', hue_order = ['White', 'Black'], palette = ['white', 'black'], edgecolor = 'black')
plt.show()
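After this normalization the `count` column holds fractions between 0 and 1. If the axis should read as percentages, matplotlib's `PercentFormatter(xmax=1)` can relabel the ticks without changing the data. This sketch uses pandas plotting instead of seaborn only to keep it short; the `set_major_formatter` line should apply the same way to the `ax` passed to `sns.barplot`. The rows are taken from the fake table above, and the Agg backend line only lets it run without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch; drop it and call plt.show() interactively
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import PercentFormatter

# A few rows of the fake long-format table shown earlier.
long_df = pd.DataFrame({
    "opening_shortname": ["Queen's Pawn Game", "Queen's Pawn Game", "Queen's Pawn",
                          "Queen's Pawn", "Dutch Defense", "Dutch Defense"],
    "winner": ["White", "Black", "White", "Black", "Black", "White"],
    "count": [141, 132, 57, 57, 53, 36],
})

# Same normalization as above: each opening's counts now add up to 1.
long_df["count"] = long_df["count"] / long_df.groupby("opening_shortname")["count"].transform("sum")

# Pandas plotting stands in for sns.barplot here; the formatter line is the point.
wide = long_df.pivot(index="opening_shortname", columns="winner", values="count")
fig, ax = plt.subplots()
wide[["White", "Black"]].plot.barh(ax=ax, color=["white", "black"], edgecolor="black")
# Label the x axis as percentages, treating 1.0 as 100 %.
ax.xaxis.set_major_formatter(PercentFormatter(xmax=1))
plt.tight_layout()
```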

https://stackoverflow.com/questions/68792609