I have a DataFrame with the following structure:
custid province year features... label
123 AB 2005 ... 0
124 ON 2006 ... 1
...
999 QC 2012 ... 1

The last column is the label/target.
I have a function that computes churn statistics for each group:
def churn_per_feature(x):
    d = {}
    d['churn_count'] = (x['label'] == 1).sum()
    d['cust_count'] = x['custid'].nunique()
    d['churn_rate'] = d['churn_count'] / float(d['cust_count'])
    return pd.Series(d, index=['churn_count', 'cust_count', 'churn_rate'])

I group by two variables, province and year:

churn_per_province_year = df.groupby(['province', 'year']).apply(churn_per_feature)

I am trying to use pyplot to draw a single chart with lines, where the x-axis is the years and each line represents a province (so far I have only picked the 4 provinces with the most customers, so it is not in a loop):
plt.plot(years, churn_per_province_year[churn_per_province_year['province'] == 'ON']['cust_count'])
plt.plot(years, churn_per_province_year[churn_per_province_year['province'] == 'AB']['cust_count'])
plt.plot(years, churn_per_province_year[churn_per_province_year['province'] == 'BC']['cust_count'])
plt.plot(years, churn_per_province_year[churn_per_province_year['province'] == 'QC']['cust_count'])
plt.show()

I don't know how to reference the years part.
Posted on 2018-08-27 16:12:17
Do you want something like the following?
df.groupby(['year', 'province']).apply(churn_per_feature)['cust_count'].unstack().plot(legend=True)
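
To see why that one-liner gives one line per province, here is a minimal sketch on made-up data (the toy DataFrame and its values are assumptions, not the asker's data, and it counts customers directly with nunique() instead of going through churn_per_feature). unstack() moves the inner index level, province, into the columns, so the result has years as rows and one column per province, which is the shape DataFrame.plot draws as one line per column.

```python
import pandas as pd

# Hypothetical toy data standing in for the real DataFrame
df = pd.DataFrame({
    'custid':   [1, 2, 3, 4, 5, 6],
    'province': ['ON', 'ON', 'AB', 'ON', 'AB', 'AB'],
    'year':     [2005, 2005, 2005, 2006, 2006, 2006],
    'label':    [0, 1, 1, 0, 0, 1],
})

# Count distinct customers per (year, province) group
cust_count = df.groupby(['year', 'province'])['custid'].nunique()

# unstack() pivots the inner index level (province) into columns:
# rows are years, one column per province
wide = cust_count.unstack()
print(wide)

# wide.plot(legend=True) would then draw one line per province column,
# with the year index on the x-axis
```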

Using matplotlib's plt.plot(), after reset_index() so that year and province become ordinary columns:
import matplotlib.pyplot as plt

churn_per_province_year = df.groupby(['year', 'province']).apply(churn_per_feature).reset_index()
# Take the x values from each province's own rows, so x and y always have the same length
for province in ['ON', 'AB', 'QC']:
    subset = churn_per_province_year[churn_per_province_year['province'] == province]
    plt.plot(subset['year'], subset['cust_count'], label=province)
plt.legend()
plt.show()
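
If you would rather plot against one shared range of years, the lengths only match when every province has a row for every year. A possible sketch of one way to handle that, assuming made-up data and a hypothetical year range: reindex the unstacked table onto the full range so that missing years become NaN, which matplotlib leaves as gaps in the line.

```python
import pandas as pd

# Hypothetical toy data: AB has no customers in 2006
df = pd.DataFrame({
    'custid':   [1, 2, 3, 4, 5, 6],
    'province': ['ON', 'ON', 'ON', 'AB', 'AB', 'AB'],
    'year':     [2005, 2006, 2007, 2005, 2007, 2007],
})

# Assumed range for illustration; use the range that fits your data
years = list(range(2005, 2009))

# Rows are years, one column per province
counts = df.groupby(['year', 'province'])['custid'].nunique().unstack()

# Align every province to the same years; years without data become NaN
counts = counts.reindex(years)
print(counts)

# Each column now has exactly len(years) entries, so
# plt.plot(years, counts['ON'], label='ON') would have matching x and y lengths
```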

https://stackoverflow.com/questions/52034841