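From a single dataframe (tr), I am trying to create multiple dataframes based on a set of columns (cat_col). Each new dataframe must be named tr_'colname'. Can anyone help me with the code below?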
for col in cat_col:
    tr_ = tr[[col,'TARGET']].groupby([col,'TARGET']).size().reset_index(name='Counts')
    tr_ = pd.pivot_table(tr_,values='Counts',index=[col],columns=['TARGET'])
    print(tr_.shape)

Output:
(3,2) (7,2) (8,2) (5,2) (6,2) (6,2) (18,2) (7,2) (58,2) (4,2) (3,2) (7,2)
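tr[['col1','TARGET']].head(10)

              col1  TARGET
0    Unaccompanied       1
1           Family       0
2    Unaccompanied       0
3    Unaccompanied       0
4    Unaccompanied       0
5  Spouse, partner       0
6    Unaccompanied       0
7    Unaccompanied       0
8         Children       0
9    Unaccompanied       0

tr_col1.head(3)

TARGET                0      1
col1
Family            37140   3009
Spouse, partner   10475    895
Unaccompanied    228189  20337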
Posted on 2018-07-03 06:40:46
I think you need:
import pandas as pd

tr = pd.DataFrame({'A':list('abcdefabcd'),
                   'B':list('abcdeabffe'),
                   'TARGET':[1,1,0,0,1,0,1,1,0,1]})
print (tr)
   A  B  TARGET
0  a  a       1
1  b  b       1
2  c  c       0
3  d  d       0
4  e  e       1
5  f  a       0
6  a  b       1
7  b  f       1
8  c  f       0
9  d  e       1
cat_col = ['A','B']
d = {}
for col in cat_col:
    tr_ = (tr[[col,'TARGET']].groupby([col,'TARGET'])
                             .size()
                             .unstack()
                             .reset_index()
                             .rename_axis(None, axis=1))
    #do any other processing here if necessary
    #check that the output is a DataFrame
    print (type(tr_))
    print (tr_)
    #if necessary, store it in a dict
    d[col] = tr_

#select a DataFrame from the dict
print (d['A'])
   A    0    1
0  a  NaN  2.0
1  b  NaN  2.0
2  c  2.0  NaN
3  d  1.0  1.0
4  e  NaN  1.0
5  f  1.0  NaN

https://stackoverflow.com/questions/51147556
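
Since the question asks for names of the form tr_'colname', one option is to use those names as the dictionary keys. The following is only a sketch on the same sample data: the dict name `frames` and the use of `fill_value=0` (treating a missing column/TARGET combination as a count of zero, which also keeps the counts as integers) are assumptions and not part of the original answer.

```python
import pandas as pd

# Same sample data as in the answer.
tr = pd.DataFrame({'A': list('abcdefabcd'),
                   'B': list('abcdeabffe'),
                   'TARGET': [1, 1, 0, 0, 1, 0, 1, 1, 0, 1]})
cat_col = ['A', 'B']

# Build one count table per categorical column, keyed as 'tr_<colname>'.
# fill_value=0 is an assumption: a combination that never occurs counts as 0.
frames = {
    'tr_' + col: (tr.groupby([col, 'TARGET'])
                    .size()
                    .unstack(fill_value=0)
                    .reset_index()
                    .rename_axis(None, axis=1))
    for col in cat_col
}

# Look up a table by its requested name.
print(frames['tr_A'])
```

A dict lookup such as `frames['tr_A']` is usually easier to manage than creating separate variables dynamically, and the keys can be generated for any number of columns.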