

Q: Splitting a DataFrame based on a condition

Stack Overflow user

Asked on 2022-07-11 15:12:39

Answers: 1 · Views: 82 · Followers: 0 · Votes: 0

import pandas as pd

df = {'Source': ['-23456','-23456','3456','','56789','-12456','-13245','','45678','12346','','-23456','-23456','-234556','124566','098745','-67890'],
      'number_mp': [369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452],
      'time_Utc': ["2014-9-12","2014-9-12","2014-9-12" ,"2017-5-14","2017-5-14","2017-5-14","2016-10-26" ,"2016-10-26" ,"2016-10-26" ,"2016-11-3" ,"2016-11-3" ,"2016-8-10" ,"2016-8-10","2016-8-10","2014-9-12","2014-9-12","2014-9-12"]}

df = pd.DataFrame(df)

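I have a DataFrame that I want to split into sub-DataFrames, with the condition that each sub-DataFrame contains rows with three different "time_Utc" values, ordered from earliest to latest.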

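I tried the approach below, but rows with the same date end up grouped together, and I don't know how to add the condition that the dates must be different.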

res = []
for _ in range(len(df)):
    tabla = df.sample(n=3)
    res.append(tabla)
print(res)
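The condition described in the question can be written as a small check, which also shows why df.sample(n=3) on its own is not enough: it picks rows without looking at time_Utc, so it can return repeated dates, and the rows come back in random order. This is a minimal sketch; has_three_ascending_dates is a hypothetical helper name, and it assumes every time_Utc value follows the %Y-%m-%d pattern.

```python
from datetime import datetime

import pandas as pd

# Only the time_Utc column of the question's data matters for this check.
df = pd.DataFrame({"time_Utc": [
    "2014-9-12", "2014-9-12", "2014-9-12", "2017-5-14", "2017-5-14", "2017-5-14",
    "2016-10-26", "2016-10-26", "2016-10-26", "2016-11-3", "2016-11-3",
    "2016-8-10", "2016-8-10", "2016-8-10", "2014-9-12", "2014-9-12", "2014-9-12",
]})


def has_three_ascending_dates(sub: pd.DataFrame) -> bool:
    """Hypothetical helper: True if `sub` has exactly three rows whose
    time_Utc dates are pairwise different and sorted oldest to newest."""
    # Parse the strings; compared as text, "2016-10-26" would sort before "2016-8-10".
    dates = [datetime.strptime(s, "%Y-%m-%d") for s in sub["time_Utc"]]
    # A strictly increasing sequence cannot contain duplicates.
    return len(dates) == 3 and dates[0] < dates[1] < dates[2]


print(has_three_ascending_dates(df.loc[[0, 11, 6]]))  # 2014-9-12 < 2016-8-10 < 2016-10-26
print(has_three_ascending_dates(df.loc[[0, 1, 2]]))   # all three rows share 2014-9-12

# df.sample(n=3) ignores time_Utc, so this may print either True or False.
print(has_three_ascending_dates(df.sample(n=3)))
```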

Can anyone help me?

python

dataframe

1 answer

Stack Overflow user

Accepted answer

Posted on 2022-07-13 22:52:08

I still don't fully understand your goal, but here is a suggestion:

First, group the indices into blocks, each block containing the indices that belong to the same date, and sort the blocks by date:

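from datetime import datetime
from itertools import combinations, product

# Sort key for the (date string, sub-DataFrame) pairs produced by groupby:
# parse the date so the blocks are ordered chronologically, not as text.
def key(group):
    return datetime.strptime(group[0], "%Y-%m-%d")

# One Index of row labels per distinct date, oldest date first.
idx_blocks = [sdf.index for _, sdf in sorted(df.groupby("time_Utc"), key=key)]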

Result (idx_blocks):

[Int64Index([0, 1, 2, 14, 15, 16], dtype='int64'),
 Int64Index([11, 12, 13], dtype='int64'),
 Int64Index([6, 7, 8], dtype='int64'),
 Int64Index([9, 10], dtype='int64'),
 Int64Index([3, 4, 5], dtype='int64')]
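An equivalent way to build the same blocks, sketched here as an alternative rather than as the answer's own code, is to parse the dates up front and rely on groupby's default key sorting (sort=True) instead of a custom sort key. It assumes every time_Utc value matches %Y-%m-%d; with the question's data it should produce the same five blocks as above.

```python
from datetime import datetime

import pandas as pd

# Only the time_Utc column of the question's data matters for the blocks.
df = pd.DataFrame({"time_Utc": [
    "2014-9-12", "2014-9-12", "2014-9-12", "2017-5-14", "2017-5-14", "2017-5-14",
    "2016-10-26", "2016-10-26", "2016-10-26", "2016-11-3", "2016-11-3",
    "2016-8-10", "2016-8-10", "2016-8-10", "2014-9-12", "2014-9-12", "2014-9-12",
]})

# Parse every date string once. Sorting the raw strings would be wrong here,
# because "2016-10-26" < "2016-8-10" when compared character by character.
parsed = df["time_Utc"].map(lambda s: datetime.strptime(s, "%Y-%m-%d"))

# groupby sorts its keys by default, so grouping on the parsed dates
# yields the blocks oldest first without a custom key function.
idx_blocks = [sdf.index for _, sdf in df.groupby(parsed)]

for block in idx_blocks:
    print(list(block))
```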

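Then use combinations to select every combination of three index blocks (they stay sorted by date), use product to pick every possible index triple from the chosen blocks, and collect the corresponding sub-DataFrames: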

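samples = [
    df.loc[list(idx), :]  # one row from each of the three chosen date blocks
    for blocks in combinations(idx_blocks, 3)  # 3 distinct dates, kept in date order
    for idx in product(*blocks)  # every way to pick one row per chosen date
]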
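Because the list comprehension builds every sample up front, a generator may be preferable when there are many dates. This is a minimal sketch, assuming the samples only need to be processed one at a time; iter_samples is a hypothetical name, not part of the original answer, and it reuses the answer's key function and its combinations/product logic.

```python
from datetime import datetime
from itertools import combinations, product

import pandas as pd

# The question's data.
df = pd.DataFrame({
    "Source": ["-23456", "-23456", "3456", "", "56789", "-12456", "-13245", "", "45678",
               "12346", "", "-23456", "-23456", "-234556", "124566", "098745", "-67890"],
    "number_mp": [369452] * 17,
    "time_Utc": ["2014-9-12", "2014-9-12", "2014-9-12", "2017-5-14", "2017-5-14",
                 "2017-5-14", "2016-10-26", "2016-10-26", "2016-10-26", "2016-11-3",
                 "2016-11-3", "2016-8-10", "2016-8-10", "2016-8-10", "2014-9-12",
                 "2014-9-12", "2014-9-12"],
})


def key(group):
    # group is a (date string, sub-DataFrame) pair produced by groupby
    return datetime.strptime(group[0], "%Y-%m-%d")


idx_blocks = [sdf.index for _, sdf in sorted(df.groupby("time_Utc"), key=key)]


def iter_samples(frame, blocks, k=3):
    """Hypothetical helper: yield one sub-DataFrame per choice of k date
    blocks (kept in date order) and one row label from each chosen block."""
    for chosen in combinations(blocks, k):
        for idx in product(*chosen):
            # Each sample is built only when the caller asks for the next one.
            yield frame.loc[list(idx)]


samples = iter_samples(df, idx_blocks)
print(next(samples))  # the first sample: rows 0, 11 and 6
```

Iterating with a for loop over iter_samples(df, idx_blocks) processes the samples one by one while keeping only one of them in memory at a time.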

You will get a list of sample DataFrames like these:

    Source  number_mp    time_Utc
0   -23456     369452   2014-9-12
11  -23456     369452   2016-8-10
6   -13245     369452  2016-10-26

    Source  number_mp    time_Utc
0   -23456     369452   2014-9-12
11  -23456     369452   2016-8-10
7              369452  2016-10-26

...

    Source  number_mp    time_Utc
8    45678     369452  2016-10-26
10             369452   2016-11-3
4    56789     369452   2017-5-14

    Source  number_mp    time_Utc
8    45678     369452  2016-10-26
10             369452   2016-11-3
5   -12456     369452   2017-5-14

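Be aware, though, that the number of these samples can quickly get out of hand: the small sample data you provided already yields 351 samples.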
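The 351 figure can be checked without building any DataFrames. Each sample chooses three distinct dates and then one row from each of them, so the count is the sum, over all 3-date combinations, of the product of the block sizes (here 6, 3, 3, 2 and 3 rows per date). A minimal sketch, assuming Python 3.8+ for math.prod:

```python
from itertools import combinations
from math import prod

import pandas as pd

# Only the time_Utc column of the question's data matters for the count.
df = pd.DataFrame({"time_Utc": [
    "2014-9-12", "2014-9-12", "2014-9-12", "2017-5-14", "2017-5-14", "2017-5-14",
    "2016-10-26", "2016-10-26", "2016-10-26", "2016-11-3", "2016-11-3",
    "2016-8-10", "2016-8-10", "2016-8-10", "2014-9-12", "2014-9-12", "2014-9-12",
]})

# Number of rows per distinct date, i.e. the size of each index block.
block_sizes = df.groupby("time_Utc").size().tolist()

# Sum over every choice of 3 dates of (rows on date 1) * (rows on date 2) * (rows on date 3).
n_samples = sum(prod(chosen) for chosen in combinations(block_sizes, 3))
print(n_samples)  # 351
```

Running this count before building the list is a cheap way to tell whether materializing every sample is feasible for a larger DataFrame.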

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain translation engine.

Original link:

https://stackoverflow.com/questions/72940772
