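Both df.groupby('id').first() and df.groupby('id').head(1) return a DataFrame with the first row of each group. The API reference says that first() will "Compute first of group values", but when I look at the two outputs side by side I can't see any major difference.
Am I missing something?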
import pandas as pd

df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value' : ["first","second","second","first",
                              "second","first","third","fourth",
                              "fifth","second","fifth","first",
                              "first","second","third","fourth","fifth"]})
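To make the comparison in the question concrete, here is a minimal sketch (assuming a reasonably recent pandas) that runs both calls on the frame above. The values agree once head(1) is indexed by id, but the two results are not identical objects because their index and columns differ. The variable names are illustrative only.

```python
import pandas as pd

# The question's DataFrame: no missing values anywhere.
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
                   'value': ["first", "second", "second", "first",
                             "second", "first", "third", "fourth",
                             "fifth", "second", "fifth", "first",
                             "first", "second", "third", "fourth", "fifth"]})

via_first = df.groupby('id').first()   # indexed by the group key 'id'
via_head = df.groupby('id').head(1)    # keeps the original row labels and the 'id' column

# Compared as-is they differ: shape (7, 1) vs (7, 2) and different index labels.
print(via_first.equals(via_head))        # False

# Move 'id' into the index of head(1) so the values can be compared directly.
via_head_by_id = via_head.set_index('id')
print(via_first.equals(via_head_by_id))  # True: same values when nothing is NaN
```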
Answered 2015-05-02 16:47:34
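The major difference is that first() skips NaN values and returns the first non-null value in each column of each group, whereas head(1) returns the first row of each group as-is, NaN included.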
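A small sketch of what "first non-null value in each column" means in practice, using a hypothetical two-column frame that is not from the question: first() looks at each column independently, so it can stitch together values from different rows, and a column that is entirely NaN within a group has nothing to return and stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: in group 1 the NaNs fall in different rows per column;
# group 2 has nothing but NaN.
df = pd.DataFrame({'id': [1, 1, 1, 2],
                   'a': [np.nan, 10.0, 20.0, np.nan],
                   'b': ["x", np.nan, "z", np.nan]})

first = df.groupby('id').first()   # first non-null value per group, per column
head = df.groupby('id').head(1)    # first row per group, NaN and all

# Group 1: 'a' comes from row 1 and 'b' from row 0, a combination no single row has.
print(first.loc[1, 'a'], first.loc[1, 'b'])   # 10.0 x
# Group 2: an all-NaN column has no non-null value to return, so it stays missing.
print(first.loc[2].isna().all())              # True
# head(1) just returns rows 0 and 3 unchanged.
print(head.index.tolist())                    # [0, 3]
```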
If I put an np.nan into your example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value' : [np.nan,"second","second","first",
                              "second","first","third","fourth",
                              "fifth","second","fifth","first",
                              "first","second","third","fourth","fifth"]})

Then we have:
>>> df.groupby('id').head(1)
    id   value
0    1     NaN   # NaN is included
3    2   first
5    3   first
9    4  second
11   5   first
12   6   first
15   7  fourth
>>> df.groupby('id').first()
     value
id
1   second   # NaN is skipped
2    first
3    first
4   second
5    first
6    first
7   fourth

(Also, as you can see, head(1) keeps the original row index, whereas first() uses the group keys as the index.)
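A sketch reproducing the NaN example above and putting the two results on the same index for comparison. Using set_index on the head(1) result, or reset_index on the first() result, is just one way to align them, assumed here for illustration.

```python
import numpy as np
import pandas as pd

# The answer's modified frame: the very first 'value' is NaN.
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
                   'value': [np.nan, "second", "second", "first",
                             "second", "first", "third", "fourth",
                             "fifth", "second", "fifth", "first",
                             "first", "second", "third", "fourth", "fifth"]})

head = df.groupby('id').head(1)    # original row labels; 'id' stays a column
first = df.groupby('id').first()   # group keys become the index

print(head.index.tolist())         # [0, 3, 5, 9, 11, 12, 15]
print(first.index.tolist())        # [1, 2, 3, 4, 5, 6, 7]

# One way to put both on the same footing: index head(1) by the group key...
head_by_id = head.set_index('id')
# ...or turn the group key of first() back into an ordinary column.
first_flat = first.reset_index()
print(first_flat.columns.tolist())  # ['id', 'value']

# Only group 1 differs: head(1) keeps the NaN, first() skips ahead to "second".
print(head_by_id.loc[1, 'value'], first.loc[1, 'value'])     # nan second
print(head_by_id.drop(index=1).equals(first.drop(index=1)))  # True
```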
https://stackoverflow.com/questions/30004815