Question: Python Pandas: how to find combination patterns (combinations of combinations) in a time series

Stack Overflow user

Asked 2021-02-07 20:49:10

Answers: 1 · Views: 113 · Followers: 0 · Votes: 0

Following on from: unique combinations of values in selected columns in pandas data frame and count

With the code below I found the combinations of 3 columns, ordered from most to least frequent:

import pandas as pd

def common_cols(df, n):
    '''n is how many of the top results to show'''
    # Count how often each (A, B, C) combination occurs.
    counts = df.groupby(['A', 'B', 'C']).size().reset_index(name='count')
    # Most frequent first, then keep only the top n rows.
    return counts.sort_values(by='count', ascending=False).reset_index(drop=True).head(n)

common_data = common_cols(df, 10)

Output of common_data (top 10 results shown):

      A     B       C      count
0    0.00  0.00    0.00     96
1    0.00  1.00    0.00     25
2    0.14  0.86    0.00     19
3    0.13  0.87    0.00     17
4    0.00  0.72    0.28     17
5    0.00  0.89    0.11     16
6    0.01  0.84    0.15     16
7    0.03  0.97    0.00     15
8    0.35  0.65    0.00     15
9    0.13  0.79    0.08     14
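
For reference, the same kind of top-n table can probably be produced with DataFrame.value_counts (available in pandas 1.1 and later). This is only a sketch: the name common_cols_vc, the cols parameter and the small demo frame are made up for illustration and are not part of the original code.

```python
import pandas as pd

def common_cols_vc(df, n, cols=("A", "B", "C")):
    """Hypothetical variant of common_cols built on DataFrame.value_counts."""
    # value_counts returns a Series indexed by the column combination,
    # already sorted from most to least frequent.
    counts = df.value_counts(subset=list(cols))
    return counts.head(n).reset_index(name="count")

# Made-up demo data: (0, 0, 0) appears 3 times, (0.14, 0.86, 0) twice.
demo = pd.DataFrame({
    "A": [0.0, 0.0, 0.0, 0.14, 0.14, 0.13],
    "B": [0.0, 0.0, 0.0, 0.86, 0.86, 0.87],
    "C": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
})
top = common_cols_vc(demo, 2)
print(top)
```

The result has the same columns as common_data (A, B, C, count), so it could be swapped in without changing the rest of the workflow.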

Now I want to find combinations of consecutive rows of A, B, C and count how many times each one occurs.

For example, suppose the underlying df (the dataframe as it is before the common_cols function is applied) contains these rows:

# each of these rows is its own combination of values
       A    B     C
0    0.67  0.16  0.17
1    0.06  0.73  0.20
2    0.19  0.48  0.33
3    0.07  0.87  0.06
4    0.07  0.60  0.33

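The 5 rows above (in that order) would be counted as one combination pattern. The pattern could be counted over 2, 3, 4 or more consecutive rows (if that is simple enough to do!).

If the pattern is found once in the whole dataframe, its count would be 1. If it is found 10 times, its count would be 10.

Any ideas on how to count combinations across consecutive rows? Like the common_cols function, but as a "combination of combinations"?

The rows must be in order for it to count as a pattern. Any help is much appreciated!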

Tags: dataframe, combinations, permutation, python, pandas

1 Answer

Stack Overflow user

Accepted answer

Posted 2021-02-08 06:27:47

I used integers for this test dataframe, but if your groupby above works, this should work for your data too:

import numpy as np
import pandas as pd

df_size = 1000000
# Three columns of random integers in [0, 20).
df = pd.DataFrame({'A': np.random.randint(20, size=df_size),
                   'B': np.random.randint(20, size=df_size),
                   'C': np.random.randint(20, size=df_size),
                   })

print(df.head())
    A   B   C
0  12  12   5
1  19  12  12
2  14  11  15
3  11  14   8
4  13  16   2

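The code below uses zip to build a list of (A, B, C) triples called source. The tmp variable (a generator) is effectively a list of successively shifted copies of the source list, like [source[0:], source[1:], source[2:], ...].

Finally, zip interleaves the values from the lists in tmp. For n=2, for example, it produces the list [(source[0], source[1]), (source[1], source[2]), ...].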

source = list(zip(df['A'],df['B'],df['C']))
n_consecutive = 3

tmp = ( source[i:] for i in range(n_consecutive) )
output = pd.Series(list(zip(*tmp)))

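For this example, output is a Series in which each element is a run of three consecutive (A, B, C) triples, and value_counts gives the count of each run: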

print(output.value_counts().head())
((6, 19, 14), (19, 12, 6), (13, 7, 10))    2
((2, 18, 12), (17, 2, 19), (7, 19, 19))    1
((10, 2, 3), (1, 18, 8), (3, 6, 19))       1
((16, 15, 14), (11, 2, 9), (14, 14, 8))    1
((3, 3, 7), (13, 9, 3), (18, 15, 6))       1
dtype: int64
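
To see what the shifting and zipping does, it may help to run it on a tiny list. The sketch below wraps the same two steps in a helper; the name windows and the placeholder row labels are invented for illustration.

```python
# Placeholder labels standing in for (A, B, C) rows.
toy = ["r0", "r1", "r2", "r3"]

def windows(source, n):
    """All runs of n consecutive items from source, in order."""
    # source[0:], source[1:], ..., source[n-1:]
    shifted = (source[i:] for i in range(n))
    # zip stops at the shortest slice, so incomplete runs are dropped.
    return list(zip(*shifted))

pairs = windows(toy, 2)
triples = windows(toy, 3)
print(pairs)    # [('r0', 'r1'), ('r1', 'r2'), ('r2', 'r3')]
print(triples)  # [('r0', 'r1', 'r2'), ('r1', 'r2', 'r3')]
```

Because zip stops at the shortest input, a source of length L yields L - n + 1 runs, one starting at every row where a full run of n rows still fits.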

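Note that, depending on what you are looking for, this may double count. For example, if the underlying df has three identical records in a row and you are looking for patterns of 2 consecutive records: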

(1, 3, 4)
(1, 3, 4)
(1, 3, 4)

In this case it will find (1, 3, 4), (1, 3, 4) twice.
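
If overlapping matches should not be counted, one possible approach (not part of the original answer) is to scan the rows in order and only count a run when it starts after the previous counted occurrence of the same pattern has ended. The function name count_non_overlapping is hypothetical, and this greedy left-to-right rule is just one reasonable definition of "non-overlapping".

```python
from collections import Counter

def count_non_overlapping(source, n):
    """Count runs of n consecutive rows, skipping runs that overlap
    an already counted occurrence of the same pattern."""
    counts = Counter()
    next_free = {}  # pattern -> first index where a new occurrence may start
    for start in range(len(source) - n + 1):
        window = tuple(source[start:start + n])
        if start >= next_free.get(window, 0):
            counts[window] += 1
            next_free[window] = start + n
    return counts

rows = [(1, 3, 4)] * 3
pattern = ((1, 3, 4), (1, 3, 4))
result = count_non_overlapping(rows, 2)
print(result[pattern])  # 1
```

This is a plain Python loop, so on a dataframe with a million rows it will be noticeably slower than the zip and value_counts approach.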

Votes: 2

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/66088117
