Pandas: calculating gaps for a specific value

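Stack Overflow user

Asked 2019-01-10 22:34:05

3 answers · 306 views · 0 followers · 1 vote

For each value in a DataFrame, I want to count how many rows lie between its consecutive occurrences, that is, how many times the value is not listed.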

1 A
2 A
3 B
4 A
5 C
6 B
7 C
8 A
9 B

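For A, this means:

1-2: 0 times

2-4: 1 time

4-8: 3 times

For B, this means:

3-6: 2 times

6-9: 2 times

For C, this means:

5-7: 1 time

Is there a smart way to do this with pandas? The index is actually a timestamp, but I don't think that matters for the question.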

python

pandas

3 Answers

Stack Overflow user

Accepted answer

Posted 2019-01-10 23:12:31

Assuming you select the sample text and copy it to the clipboard:

import pandas as pd
df = pd.read_clipboard(header=None)

You end up with a DataFrame whose columns are labeled 0 and 1. Column 0 holds the numbers and column 1 holds the letters.
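If copying to the clipboard is inconvenient, the same frame can probably be built from an in-memory string. This is only a sketch: the variable name `raw` is an assumption, and `io.StringIO` stands in for the clipboard.

```python
import io
import pandas as pd

# Sample data from the question, embedded as text so no clipboard is needed
raw = """1 A
2 A
3 B
4 A
5 C
6 B
7 C
8 A
9 B"""

# sep=' ' splits on the single space; header=None gives integer column labels 0 and 1
df = pd.read_csv(io.StringIO(raw), sep=' ', header=None)
print(df)
```

The resulting `df` has the same two columns, 0 and 1, that the rest of this answer relies on.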

Running

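for letter in df[1].unique():
    # Values of column 0 at which this letter occurs, in order
    positions = df.loc[df[1] == letter, 0].tolist()
    # Pair each occurrence with the next one; rows in between = end - start - 1
    result = [f'{start}-{end}: {end - start - 1} times'
              for start, end in zip(positions, positions[1:])]

    print(letter, result)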

prints

A ['1-2: 0 times', '2-4: 1 times', '4-8: 3 times']
B ['3-6: 2 times', '6-9: 2 times']
C ['5-7: 1 times']
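The question mentions that the real index is a timestamp, in which case subtracting index values would measure elapsed time rather than rows. A possible workaround, sketched here under assumptions, is to count against a positional row number instead. The daily timestamps, the column names `value` and `pos`, and the `gaps` dictionary are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps standing in for the real index (not given in the question)
idx = pd.date_range('2019-01-10', periods=9, freq='D')
ts_df = pd.DataFrame({'value': list('AABACBCAB')}, index=idx)

# Positional row number, so differences count rows instead of elapsed time
ts_df['pos'] = np.arange(len(ts_df))

gaps = {}
for letter, group in ts_df.groupby('value'):
    pos = group['pos'].to_numpy()
    # Number of rows strictly between consecutive occurrences of this letter
    gaps[letter] = (pos[1:] - pos[:-1] - 1).tolist()

print(gaps)
```

For the sample data this gives the same counts as the question lists: 0, 1, 3 for A, 2, 2 for B, and 1 for C.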

Votes: 1

Stack Overflow user

Posted 2019-01-10 22:54:23

Assuming the source data is in data.txt.

>>> import pandas as pd
>>> df = pd.read_csv('data.txt', sep=' ', names= ['index', 'blah'])
>>> df_groupby = df.groupby('blah')
>>> for key, item in df_groupby:
...     key
...     pd.cut(df.index.difference(df_groupby.get_group(key).agg('index')), range(0,10,2)).value_counts()
... 
'A'
(0, 2]    1
(2, 4]    1
(4, 6]    2
(6, 8]    1
dtype: int64
'B'
(0, 2]    1
(2, 4]    2
(4, 6]    1
(6, 8]    1
dtype: int64
'C'
(0, 2]    2
(2, 4]    1
(4, 6]    1
(6, 8]    2
dtype: int64

Step by step.

A related problem, counting occurrences per bin, can be solved with groupby.

>>> import pandas as pd
>>> df = pd.read_csv('data.txt', sep=' ', names= ['index', 'blah'])

   index blah
0      1    A
1      2    A
2      3    B
3      4    A
4      5    C
5      6    B
6      7    C
7      8    A
8      9    B

>>> df.groupby('blah').agg('index').value_counts(bins=range(0,10,2))

blah  index        
A     (-0.001, 2.0]    2
      (2.0, 4.0]       1
      (6.0, 8.0]       1
      (4.0, 6.0]       0
B     (2.0, 4.0]       1
      (4.0, 6.0]       1
      (-0.001, 2.0]    0
      (6.0, 8.0]       0
C     (4.0, 6.0]       1
      (6.0, 8.0]       1
      (-0.001, 2.0]    0
      (2.0, 4.0]       0
Name: index, dtype: int64

To list the index labels that belong to each groupby key:

>>> df_groupby = df.groupby('blah')
>>> for key, item in df_groupby:
...     print(key, df_groupby.get_group(key).agg('index'))

A Int64Index([0, 1, 3, 7], dtype='int64')
B Int64Index([2, 5, 8], dtype='int64')
C Int64Index([4, 6], dtype='int64')

This can be combined with pd.cut:

>>> pd.cut(df_groupby.get_group('A').agg('index'), range(0,10,2)).value_counts()
(0, 2]    1
(2, 4]    1
(4, 6]    0
(6, 8]    1
dtype: int64

Now take the difference against the full index:

>>> pd.cut(df.index.difference(df_groupby.get_group('A').agg('index')), range(0,10,2)).value_counts()

(0, 2]    1
(2, 4]    1
(4, 6]    2
(6, 8]    1
dtype: int64
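The difference step can be reproduced without a data file. The sketch below is an assumption-laden variant: it builds the frame inline and selects A's index labels with a boolean mask instead of `.agg('index')`, then bins the remaining labels with `pd.cut` as above.

```python
import pandas as pd

df = pd.DataFrame({'index': range(1, 10), 'blah': list('AABACBCAB')})

# Row labels (0..8) where the value is 'A'
a_labels = df.index[df['blah'] == 'A']

# Row labels that do NOT hold 'A'
others = df.index.difference(a_labels)

# Bin those labels into (0, 2], (2, 4], (4, 6], (6, 8] and count per bin
counts = pd.cut(others.to_numpy(), range(0, 10, 2)).value_counts()
print(list(others))
print(counts)
```

Here `others` holds the labels 2, 4, 5, 6 and 8, which fall into the four bins as 1, 1, 2 and 1.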

Votes: 1

Stack Overflow user

Posted 2019-01-10 23:42:12

Check with groupby

l=[]
for x , y in df.groupby(['1']):

    s1=y['0'].shift(1).iloc[1:].astype(str)+'-'+y['0'].iloc[1:].astype(str)
    s2=y['0'].diff().dropna()-1
    l.append(dict(zip(s1,s2)))

l
Out[351]: 
[{'1.0-2': 0.0, '2.0-4': 1.0, '4.0-8': 3.0},
 {'3.0-6': 2.0, '6.0-9': 2.0},
 {'5.0-7': 1.0}]

Basically, diff is all you need

df.groupby(['1'])['0'].diff().dropna()-1
Out[354]: 
1    0.0
3    1.0
5    2.0
6    1.0
7    3.0
8    2.0
Name: 0, dtype: float64

I only used the for loop to create the format you asked for.
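The float values and keys such as '1.0-2' above appear because shift and diff introduce NaN for the first occurrence in each group. If integer output is preferred, one option is to drop those rows before converting. This is a sketch only; the column names `pos`, `value`, `start`, `end` and `gap` are assumptions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'pos': range(1, 10), 'value': list('AABACBCAB')})

grouped = df.groupby('value')['pos']
out = pd.DataFrame({
    'value': df['value'],
    'start': grouped.shift(1),  # previous occurrence of the same value, NaN for the first
    'end': df['pos'],
})

# Drop first occurrences (no previous one), then restore the integer dtype
out = out.dropna(subset=['start']).astype({'start': int})
out['gap'] = out['end'] - out['start'] - 1
out = out.sort_values(['value', 'start']).reset_index(drop=True)
print(out)
```

Each row of `out` is one start-end pair with its gap, so the result can be filtered or aggregated further without parsing strings.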

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/54137876
