import pandas as pd

df = {'Source': ['-23456','-23456','3456','','56789','-12456','-13245','','45678','12346','','-23456','-23456','-234556','124566','098745','-67890'],
'number_mp': [369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452,369452],
'time_Utc': ["2014-9-12","2014-9-12","2014-9-12" ,"2017-5-14","2017-5-14","2017-5-14","2016-10-26" ,"2016-10-26" ,"2016-10-26" ,"2016-11-3" ,"2016-11-3" ,"2016-8-10" ,"2016-8-10","2016-8-10","2014-9-12","2014-9-12","2014-9-12"]}
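df = pd.DataFrame(df)

I have a DataFrame that I want to split into sub-DataFrames, where each one groups three rows with three different "time_Utc" values, ordered from earliest to latest.
I tried the following approach, but rows with the same date end up grouped together, and I don't know how to add the condition that the dates must differ.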
res = []
for _ in range(len(df)):
tabla = df.sample(n=3)
res.append(tabla)
print(res)

Can anyone help me?
Answered 2022-07-13 22:52:08
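I still don't fully understand your goal, but here is a suggestion:
First, group the indices into blocks, each containing the indices that belong to the same date, and sort the blocks by date: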
from datetime import datetime
from itertools import combinations, product
def key(group): return datetime.strptime(group[0], "%Y-%m-%d")
idx_blocks = [sdf.index for _, sdf in sorted(df.groupby("time_Utc"), key=key)]

Result (idx_blocks):
[Int64Index([0, 1, 2, 14, 15, 16], dtype='int64'),
Int64Index([11, 12, 13], dtype='int64'),
Int64Index([6, 7, 8], dtype='int64'),
Int64Index([9, 10], dtype='int64'),
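Int64Index([3, 4, 5], dtype='int64')]

Then use combinations to select every combination of three index blocks (they stay sorted), use product to pick every possible triple of indices from them, and collect the corresponding sub-DataFrames: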
samples = [
df.loc[idx, :]
for blocks in combinations(idx_blocks, 3)
for idx in product(*blocks)
]

You get a list of sub-DataFrames like these for the sample data:
    Source  number_mp    time_Utc
0   -23456     369452   2014-9-12
11  -23456     369452   2016-8-10
6   -13245     369452  2016-10-26
    Source  number_mp    time_Utc
0   -23456     369452   2014-9-12
11  -23456     369452   2016-8-10
7              369452  2016-10-26
...
    Source  number_mp    time_Utc
8    45678     369452  2016-10-26
10             369452   2016-11-3
4    56789     369452   2017-5-14
    Source  number_mp    time_Utc
8    45678     369452  2016-10-26
10             369452   2016-11-3
5   -12456     369452   2017-5-14

Be aware, though, that the number of these samples can quickly get out of hand: the small sample data you provided already yields 351 samples.
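
If the full list is too large to build, one option is to count the samples up front and then draw individual triples on demand. The following is only a rough sketch under that assumption: the helper names `date_blocks`, `count_samples` and `random_sample` are made up for illustration, and the random draw is not uniform over all possible triples (it picks three dates first, then one row per date).

```python
import random
from datetime import datetime
from itertools import combinations
from math import prod


def date_blocks(df, date_col="time_Utc"):
    """Return lists of row labels, one list per date, earliest date first."""
    groups = df.groupby(date_col).groups  # maps date string -> Index of row labels
    ordered = sorted(groups, key=lambda s: datetime.strptime(s, "%Y-%m-%d"))
    return [list(groups[d]) for d in ordered]


def count_samples(blocks, k=3):
    """Count the ways to choose k blocks and one row label from each of them."""
    return sum(prod(len(b) for b in chosen) for chosen in combinations(blocks, k))


def random_sample(df, blocks, k=3, rng=None):
    """Draw one sub-DataFrame with k rows from k different dates, in date order."""
    rng = rng or random.Random()
    # Sorting the chosen block positions keeps the rows ordered by date.
    positions = sorted(rng.sample(range(len(blocks)), k))
    idx = [rng.choice(blocks[p]) for p in positions]
    return df.loc[idx]
```

With the sample data, `count_samples(date_blocks(df))` gives the same 351 as above, and `random_sample(df, date_blocks(df))` returns a single three-row sub-DataFrame without materializing the rest.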
https://stackoverflow.com/questions/72940772