Question: Need a clever loop - sorting by a DateTime column and measuring crowdedness

Stack Overflow user

Asked 2019-03-26 18:43:54

Answers: 1  Views: 42  Followers: 0  Votes: 0

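I want to measure how many people are in the emergency room each hour. The definition is: crowd(hour=x) = people not yet discharged (hour=x-1) + people who registered (hour=x) - people discharged (hour=x)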

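I am using a pandas DataFrame, and the registration date and time and the discharge date and time are written as strings in the form 'YYYY-MM-DD HH:MM:SS', for example '2013-01-01 00:12:00'.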
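As a minimal sketch of getting such strings into a form pandas can bucket by hour, assuming the columns hold text like the rows in the example further down (the two-row frame and the `RegisterHour` column name are stand-ins for illustration, not part of the real data):

```python
import pandas as pd

# Stand-in data: two rows in the 'YYYY-MM-DD HH:MM:SS' text format.
df = pd.DataFrame({
    "RegisterDateTime": ["2013-01-01 00:12:00", "2013-01-01 01:04:00"],
    "DischargeDateTime": ["2013-01-01 00:48:00", None],
})

# Parse the text into datetime64 values; missing entries become NaT.
for col in ["RegisterDateTime", "DischargeDateTime"]:
    df[col] = pd.to_datetime(df[col])

# Truncate each registration to the start of its hour, which is the
# bucket the crowd definition works with.
df["RegisterHour"] = df["RegisterDateTime"].dt.floor("h")
print(df)
```

Once the columns are real datetimes, hourly grouping and resampling become available without any string handling.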

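What is the simplest and most elegant way to build this crowd(hour) data? I was thinking of just writing a very specific for loop and a counting function, but I would be glad to hear your insights before jumping into that exploration (:!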

In many cases the discharge date and time is NaN, because those cases were not discharged but transferred to a department of some hospital.

Example

Suppose I have this dataset

case  RegisterDateTime       DischargeDateTime      TransferDateTime
0     '2013-01-01 00:12:00'  '2013-01-01 00:48:00'  NaN
1     '2013-01-01 00:43:00'  '2013-01-01 02:12:00'  NaN
2     '2013-01-01 00:56:00'  '2013-01-01 01:22:00'  NaN
3     '2013-01-01 01:04:00'  '2013-01-01 04:12:00'  NaN
4     '2013-01-01 01:34:00'  '2013-01-01 04:52:00'  NaN
5     '2013-01-01 02:01:00'  NaN                    '2013-01-01 05:34:00'

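So I want a dataset "crowd" that tells me how many people were present for each day and each hour. In this example, crowd('2013-01-01', 0) = 2. Why? No cases were registered beforehand, cases 0, 1 and 2 registered during hour 0, and case 0 was discharged, so 0 + 3 - 1 = 2. Likewise crowd('2013-01-01', 1) = 3. Why? Cases 1 and 2 were carried over from the previous hour, cases 3 and 4 registered during hour 1, and case 2 was discharged, so 2 + 2 - 1 = 3. I hope the idea is clear now.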
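The arithmetic in that walk-through can be checked with a small loop. This is only a sketch of the recurrence, using the per-hour counts stated in the paragraph above; the variable names are made up for illustration.

```python
# Per-hour counts read off the worked example:
# hour 0: 3 registered, 1 left; hour 1: 2 registered, 1 left.
registered = [3, 2]
left = [1, 1]

crowd = []
previous = 0  # nobody is present before the first hour
for n_in, n_out in zip(registered, left):
    # crowd(x) = crowd(x-1) + registered(x) - left(x)
    previous = previous + n_in - n_out
    crowd.append(previous)

print(crowd)
```

The loop yields 2 for hour 0 and 3 for hour 1, the same values as the hand calculation.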

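Also, regarding discharge and transfer: they complement each other, so I just need to figure out how to join them into one column and get rid of the NaNs.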
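One possible way to join the two columns, sketched here under the assumption that every case has exactly one of the two timestamps, is `Series.fillna` with the other column as the fill source. The `ExitDateTime` name is hypothetical and the two rows are taken from the example above.

```python
import numpy as np
import pandas as pd

# Two rows from the example: one discharged, one transferred.
df = pd.DataFrame({
    "case": [4, 5],
    "DischargeDateTime": ["2013-01-01 04:52:00", np.nan],
    "TransferDateTime": [np.nan, "2013-01-01 05:34:00"],
})

# Take the discharge time where present, otherwise the transfer time.
df["ExitDateTime"] = pd.to_datetime(
    df["DischargeDateTime"].fillna(df["TransferDateTime"])
)
print(df[["case", "ExitDateTime"]])
```

If a case could have neither timestamp, the merged column would still contain NaT for it, so those rows would need separate handling.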

loops

dataframe

pandas

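1 Answer

Stack Overflow user

Accepted answer

Posted 2019-03-27 11:01:46

Here is one way to do it. It follows much the same idea you described in your post, but it is a long series of steps. Perhaps someone else has a shorter implementation.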

import pandas as pd

>>>df
   case RegisterDateTime DischargeDateTime TransferDateTime
0     0      1/1/13 0:12       1/1/13 0:48              NaN
1     1      1/1/13 0:43       1/1/13 2:12              NaN
2     2      1/1/13 0:56       1/1/13 1:22              NaN
3     3      1/1/13 1:04       1/1/13 4:12              NaN
4     4      1/1/13 1:34       1/1/13 4:52              NaN
5     5      1/1/13 2:01               NaN      1/1/13 5:34

# Construct population outflow. This is where you merge Discharges with Transfers
df_out = pd.DataFrame([(j,k) if str(k) != 'nan' else (j,v) for j, k, v in zip(df['case'], df['DischargeDateTime'],df['TransferDateTime'])])
df_out.columns = ['out', 'time']
# You can skip this if your column is already in DateTime
df_out['time'] = pd.to_datetime(df_out['time'])
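# Needed for resampling
df_out.set_index('time', inplace=True)
# Count exits per hour, then accumulate into a running total
# (newer pandas versions prefer the lowercase alias 'h' over 'H')
df_out = df_out.resample('H').count().cumsum()
# Needed for merging later (the output shown below is from before this reset)
df_out.reset_index(inplace=True)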

>>>df_out
                     out
time                    
2013-01-01 00:00:00    1
2013-01-01 01:00:00    2
2013-01-01 02:00:00    3
2013-01-01 03:00:00    3
2013-01-01 04:00:00    5
2013-01-01 05:00:00    6

# Now, repeat for the population inflow
df_in = df.loc[:, ['case', 'RegisterDateTime']]
df_in.columns = ['in', 'time']
df_in['time'] = pd.to_datetime(df_in['time'])
df_in.set_index('time', inplace=True)
df_in = df_in.resample('H').count().cumsum()
df_in.reset_index(inplace=True)

>>>df_in
                     in
time                   
2013-01-01 00:00:00   3
2013-01-01 01:00:00   5
2013-01-01 02:00:00   6


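# You can now combine the two.
# pd.merge defaults to an inner join on the shared 'time' column,
# so only hours present in both frames are kept.
df = pd.merge(df_in, df_out)
# Cumulative arrivals minus cumulative exits = people currently present
df['population'] = df['in'] - df['out']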

>>>df
                 time  in  out  population
0 2013-01-01 00:00:00   3    1           2
1 2013-01-01 01:00:00   5    2           3
2 2013-01-01 02:00:00   6    3           3
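
Note that the merged result stops at 02:00: hours 03:00 to 05:00 have exits but no registrations, so the inner join drops them. A possible alternative, offered as a sketch and not as the answerer's method, treats each registration as a +1 event and each exit as a -1 event, sums the events per hour, and accumulates. The names `raw`, `exit_time`, `events` and `change` are made up for illustration.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({
    "case": range(6),
    "RegisterDateTime": ["2013-01-01 00:12:00", "2013-01-01 00:43:00",
                         "2013-01-01 00:56:00", "2013-01-01 01:04:00",
                         "2013-01-01 01:34:00", "2013-01-01 02:01:00"],
    "DischargeDateTime": ["2013-01-01 00:48:00", "2013-01-01 02:12:00",
                          "2013-01-01 01:22:00", "2013-01-01 04:12:00",
                          "2013-01-01 04:52:00", np.nan],
    "TransferDateTime": [np.nan] * 5 + ["2013-01-01 05:34:00"],
})

# A patient leaves either by discharge or by transfer.
exit_time = raw["DischargeDateTime"].fillna(raw["TransferDateTime"])

# One +1 event per registration, one -1 event per exit.
events = pd.concat([
    pd.DataFrame({"time": pd.to_datetime(raw["RegisterDateTime"]), "change": 1}),
    pd.DataFrame({"time": pd.to_datetime(exit_time), "change": -1}),
]).sort_values("time")

# Net change per hour (hours without events sum to 0), then a running total.
population = events.set_index("time")["change"].resample("h").sum().cumsum()
print(population)
```

For the example data this gives 2, 3, 3 for hours 0 to 2, agreeing with the table above, followed by 3, 1, 0 for hours 3 to 5.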

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/55364231
