Q: Generate 3 different columns based on a dataframe

Stack Overflow user

Asked on 2019-11-11 06:43:59

Answers: 1 · Views: 43 · Followers: 0 · Votes: 1

I have the following data:

Date_1      Date_2  individual_count
01/09/2019  02/08/2019  2
01/09/2019  03/08/2019  2
01/09/2019  04/08/2019  2
01/09/2019  05/08/2019  2
.   .   .
01/09/2019  28/08/2019  10
01/09/2019  29/08/2019  11
01/09/2019  30/08/2019  12
01/09/2019  31/08/2019  14

I want to generate the columns num_days_2, num_days_3, num_days_5 and num_days_20.

I want to aggregate the dataset as follows:

num_days_2 : all individual_count aggregated for date_1 where date_2 is in (date_1 - 2, date_1 - 1)
num_days_3 : all individual_count aggregated for date_1 where date_2 is in (date_1 - 5, date_1 - 3)
num_days_5 : all individual_count aggregated for date_1 where date_2 is in (date_1 - 10, date_1 - 6)
num_days_20 : all individual_count aggregated for date_1 where date_2 is any of the remaining dates

For example, for date_1 = 01/09/2019:

num_days_2 = sum of individual counts for date_2 = 30/08/2019 - 31/08/2019 
num_days_3 = sum of individual counts for date_2 = 27/08/2019 - 29/08/2019 
num_days_5 = sum of individual counts for date_2 = 26/08/2019 - 22/08/2019 
num_days_20 = sum of individual counts for date_2 = 25/08/2019 - 02/08/2019
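
As a rough check of the rule above, here is a minimal sketch that uses only the last four rows visible in the sample (the elided rows are not reconstructed). It assumes the dates are day-first (dd/mm/yyyy) and that the windows are counted in whole days before Date_1; the helper column name offset is an assumption for illustration.

```python
import pandas as pd

# Only the last four rows shown in the sample; the elided rows are left out.
df = pd.DataFrame({
    "Date_1": ["01/09/2019"] * 4,
    "Date_2": ["28/08/2019", "29/08/2019", "30/08/2019", "31/08/2019"],
    "individual_count": [10, 11, 12, 14],
})

# The sample dates are day-first, so state the format explicitly.
for col in ("Date_1", "Date_2"):
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y")

# Whole days between Date_1 and Date_2 (hypothetical helper column).
df["offset"] = (df["Date_1"] - df["Date_2"]).dt.days

# num_days_2 covers offsets 1 and 2, i.e. 31/08/2019 and 30/08/2019.
num_days_2 = df.loc[df["offset"].between(1, 2), "individual_count"].sum()

print(df["offset"].tolist())  # [4, 3, 2, 1]
print(num_days_2)             # 26
```

The other three columns would follow the same pattern with offsets 3 to 5, 6 to 10, and everything above 10.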

Edit

Expected output:

Date_1      num_days_2  num_days_3  num_days_5  num_days_20
01/09/2019   
02/09/2019
.
.
.
30/09/2019

Can anyone help me achieve this?

python

pandas

numpy

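1 Answer

Stack Overflow user

Answered on 2019-11-11 07:32:42

I have created an example you can use. You may need to rename the columns, and take a look at the cut function to get the bins ordered correctly.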

import pandas as pd

# Generate example data.
# This is just a way to generate data that can be used to simulate your data.
# A single normalized timestamp is used for both columns so that the
# differences below are whole days.
today = pd.Timestamp('today').normalize()
df = pd.DataFrame(
    data=dict(
      Date_1=today, # This is Date_1
      Date_2=pd.date_range(end=today, periods=25), # This is Date_2
      individual_count=range(25) # This is individual_count
    )
)

# Calculate an offset as integer days:
# For each row, calculate the difference in days between Date_1 and Date_2.
df['offset_timedelta'] = (df.Date_1 - df.Date_2)
# To make binning easier, convert the timedelta to ints.
df['offset'] = df['offset_timedelta'].dt.days.astype('int16')


# Create bins for each offset:
# Each row is assigned to an interval based on the edges [1,2,5,10,1000].
# 1000 is just an upper bound to get "the rest".
# An offset of 0 (Date_2 equal to Date_1) falls outside every bin.
df['bins'] = pd.cut(df['offset'], [1,2,5,10,1000], include_lowest=True)

# Group on Date_1 and the bin, so that we can sum for each.
# observed=False keeps every bin, including bins with no rows.
grouped = df.groupby(['Date_1','bins'], observed=False)[['individual_count']].sum()

# The groupby gives an index of ('Date_1', 'bins'). This turns the bins into columns instead of an index level.
final = grouped.unstack()
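
To see which offsets end up in which bin, the following minimal sketch runs pd.cut on a handful of hand-picked offsets with the same edges. The offsets are made up for illustration; the category codes are used only to make the assignment easy to read.

```python
import pandas as pd

# Hand-picked offsets (in days) around each bin edge.
offsets = pd.Series([0, 1, 2, 3, 5, 6, 10, 11, 30])

# Same edges as above; intervals are closed on the right,
# and include_lowest=True makes the first one include 1.
bins = pd.cut(offsets, [1, 2, 5, 10, 1000], include_lowest=True)

# Code -1 means "no bin" (NaN); 0..3 index the four intervals in order.
codes = bins.cat.codes.tolist()
print(codes)  # [-1, 0, 0, 1, 1, 2, 2, 3, 3]
```

So offsets 1 and 2 share the first bin, 3 to 5 the second, 6 to 10 the third, and anything larger lands in the last one, which matches the windows asked for in the question.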

Edit: renamed the columns so that they look more like those in the original question.
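
One way to get column names like those in the question is to pass labels to pd.cut. The following is a minimal sketch under that assumption, not necessarily what the answer intended; the data is hypothetical (two Date_1 values, each paired with the 12 preceding days and a count of 1 per row).

```python
import pandas as pd

# Hypothetical data: two Date_1 values, each with the 12 preceding days.
rows = []
for date_1 in pd.to_datetime(["2019-09-01", "2019-09-02"]):
    for k in range(1, 13):
        rows.append({
            "Date_1": date_1,
            "Date_2": date_1 - pd.Timedelta(days=k),
            "individual_count": 1,
        })
df = pd.DataFrame(rows)

df["offset"] = (df["Date_1"] - df["Date_2"]).dt.days

# Name the four bins after the columns requested in the question.
labels = ["num_days_2", "num_days_3", "num_days_5", "num_days_20"]
df["bins"] = pd.cut(df["offset"], [1, 2, 5, 10, 1000],
                    labels=labels, include_lowest=True)

# Sum per (Date_1, bin), then turn the bins into columns.
wide = (
    df.groupby(["Date_1", "bins"], observed=False)["individual_count"]
      .sum()
      .unstack("bins")
)

# Use plain string column names before moving Date_1 back into a column.
wide.columns = [str(c) for c in wide.columns]
final = wide.reset_index()
print(final)
```

With one row per day, each num_days_* value is simply the number of days in its window, which makes the result easy to verify by eye.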

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/58796668
