I'm just trying to assign the quantiles of my data to another column of the dataframe, like:
dataframe['pc'] = dataframe['row'].quantile([.1,.5,.7])
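The result is:
0 NaN
...
5758 NaN
Name: pc, Length: 5759, dtype: float64
Any idea why? dataframe['row'] has plenty of values.
Posted on 2018-02-01 13:12:18
This is expected. The Series created by quantile is indexed by the quantile levels, not by the row labels of the original DataFrame, so nothing aligns on assignment and you get NaN: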
import pandas as pd

#indices 0,1,2...6
dataframe = pd.DataFrame({'row':[2,0,8,1,7,4,5]})
print (dataframe)
row
0 2
1 0
2 8
3 1
4 7
5 4
6 5
#indices 0.1, 0.5, 0.7
print (dataframe['row'].quantile([.1,.5,.7]))
0.1 0.6
0.5 4.0
0.7 5.4
Name: row, dtype: float64
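The mismatch can be checked directly by comparing the two indexes. This is a minimal sketch using the same sample data; the variable names q and common are only illustrative:

```python
import pandas as pd

dataframe = pd.DataFrame({'row': [2, 0, 8, 1, 7, 4, 5]})
q = dataframe['row'].quantile([.1, .5, .7])

# labels present in both indexes; column assignment only fills rows
# whose label also exists in the assigned Series
common = dataframe.index.intersection(q.index)

print(dataframe.index.tolist())  # row labels of the DataFrame
print(q.index.tolist())          # quantile levels used as labels
print(len(common))               # number of shared labels
```

With no shared labels, every row of the new column is left as NaN.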
#the indices do not align, so every value is NaN
dataframe['pc'] = dataframe['row'].quantile([.1,.5,.7])
print (dataframe)
row pc
0 2 NaN
1 0 NaN
2 8 NaN
3 1 NaN
4 7 NaN
5 4 NaN
6 5 NaN
If you want to create a DataFrame of the quantiles, add rename_axis + reset_index:
df = dataframe['row'].quantile([.1,.5,.7]).rename_axis('a').reset_index(name='b')
print (df)
a b
0 0.1 0.6
1 0.5 4.0
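2 0.7 5.4
But if some index values are the same, those rows do align (I think this is not what you want, it is only for a better explanation).
Add reset_index to get the default index 0,1,2: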
print (dataframe['row'].quantile([.1,.5,.7]).reset_index(drop=True))
0 0.6
1 4.0
2 5.4
Name: row, dtype: float64
The first 3 rows align because the same index values 0,1,2 exist in both the Series and the DataFrame:
dataframe['pc'] = dataframe['row'].quantile([.1,.5,.7]).reset_index(drop=True)
print (dataframe)
row pc
0 2 0.6
1 0 4.0
2 8 5.4
3 1 NaN
4 7 NaN
5 4 NaN
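6 5 NaN
EDIT: For multiple columns you need DataFrame.quantile, which can also exclude non-numeric columns (this was the default when the answer was written; in pandas 2.0 and later it requires numeric_only=True):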
df = pd.DataFrame({'A':list('abcdef'),
'B':[4,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':[5,3,6,9,2,4],
'F':list('aaabbb')})
print (df)
A B C D E F
0 a 4 7 1 5 a
1 b 5 8 3 3 a
2 c 4 9 5 6 a
3 d 5 4 7 9 b
4 e 5 2 1 2 b
5 f 4 3 0 4 b
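#numeric_only=True drops the string columns A and F; it must be passed explicitly in pandas 2.0+
df1 = df.quantile([.1,.2,.3,.4], numeric_only=True)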
print (df1)
B C D E
0.1 4.0 2.5 0.5 2.5
0.2 4.0 3.0 1.0 3.0
0.3 4.0 3.5 1.0 3.5
0.4 4.0 4.0 1.0 4.0
https://stackoverflow.com/questions/48563532
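If the goal was to have each quantile available on every row (an assumption, since the question does not say), one option is to request a single quantile at a time. quantile then returns a scalar, and assigning a scalar to a column broadcasts it to all rows, so no index alignment is involved. The column names q10, q50 and q70 below are hypothetical and chosen only for this sketch:

```python
import pandas as pd

dataframe = pd.DataFrame({'row': [2, 0, 8, 1, 7, 4, 5]})

# hypothetical column names q10/q50/q70, used only for illustration
for q in (.1, .5, .7):
    # quantile(q) with a single float returns a scalar,
    # which is broadcast to every row on assignment
    dataframe[f'q{int(round(q * 100))}'] = dataframe['row'].quantile(q)

print(dataframe)
```

Each new column then holds one constant value, which is usually only useful for later row-wise comparisons such as dataframe['row'] > dataframe['q50'].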