我有一个数据框,看起来像这样:
ID DATE
ABC 2018-02-07
ABC 2018-02-10
ABC 2018-02-13
ABC 2018-02-22
ABC 2018-02-26
ABC 2018-02-28
ABC 2018-04-06
ABC 2018-04-06
ABC 2018-04-12我正在尝试添加3个附加列:(1)所有记录的最早日期(2)日期和最早日期之间的时间(3)为记录返回第n次出现次数,为重复日期返回第n条记录的最大值。我期望以下内容作为输出:
PEL_ID TRANSACTIONDATEDIFF EARLIESTEXPOSURE TIMEDIFF NTH_FREQUENCY
ABC 2018-02-07 2018-02-07 0 1
ABC 2018-02-10 2018-02-07 3 2
ABC 2018-02-13 2018-02-07 6 3
ABC 2018-02-22 2018-02-07 15 4
ABC 2018-02-26 2018-02-07 19 5
ABC 2018-02-28 2018-02-07 21 6
ABC 2018-04-06 2018-02-07 58 8
ABC 2018-04-12 2018-02-07 64 9这是我的SQL代码:
SELECT
PEL_ID,TRANSACTIONDATEDIFF,EARLIESTEXPOSURE,TIME_DIFF,MAX(NTH_FREQUENCY)
FROM (
SELECT C.*,ROW_NUMBER() OVER(PARTITION BY PEL_ID ORDER BY PEL_ID) AS
NTH_FREQUENCY FROM
(SELECT A.PEL_ID,A.TRANSACTIONDATEDIFF,B.EARLIESTEXPOSURE,
(A.TRANSACTIONDATEDIFF-B.EARLIESTEXPOSURE) AS TIME_DIFF FROM
CAMP_31323_TODATE A JOIN (SELECT PEL_ID,MIN(TRANSACTIONDATEDIFF) AS
EARLIESTEXPOSURE FROM CAMP_31323_TODATE
GROUP BY PEL_ID) B ON A.PEL_ID=B.PEL_ID
ORDER BY A.PEL_ID) C
)
GROUP BY PEL_ID,TRANSACTIONDATEDIFF,EARLIESTEXPOSURE,TIME_DIFF
ORDER BY PEL_ID,TRANSACTIONDATEDIFF ASC;除了nth_frequency之外,大部分代码都可以正常工作,下面是输出:
PEL_ID TRANSACTIONDATEDIFF EARLIESTEXPOSURE TIMEDIFF NTH_FREQUENCY
ABC 2018-02-07 2018-02-07 0 3
ABC 2018-02-10 2018-02-07 3 6
ABC 2018-02-13 2018-02-07 6 8
ABC 2018-02-22 2018-02-07 15 2
ABC 2018-02-26 2018-02-07 19 7
ABC 2018-02-28 2018-02-07 21 1
ABC 2018-04-06 2018-02-07 58 5
ABC 2018-04-12 2018-02-07 64 9我不确定为什么会发生这种情况。我们将全心全意感谢您的帮助。提前谢谢。
发布于 2018-11-02 04:53:25
不是完整的解决方案,但可能是一个起点:
with t as (
select 'ABC' AS ID, DATE '2018-02-07' as D from dual union all
select 'ABC' AS ID, DATE '2018-02-10' as D from dual union all
select 'ABC' AS ID, DATE '2018-02-13' as D from dual union all
select 'ABC' AS ID, DATE '2018-02-22' as D from dual union all
select 'ABC' AS ID, DATE '2018-02-26' as D from dual union all
select 'ABC' AS ID, DATE '2018-02-28' as D from dual union all
select 'ABC' AS ID, DATE '2018-04-06' as D from dual union all
select 'ABC' AS ID, DATE '2018-04-06' as D from dual union all
select 'ABC' AS ID, DATE '2018-04-12' as D from dual),
g as
(select
ID,
D as TRANSACTIONDATEDIFF,
MIN(D) OVER (PARTITION BY ID ORDER BY D) as EARLIESTEXPOSURE,
D - MIN(D) OVER (PARTITION BY ID ORDER BY D) as TIMEDIFF,
RANK() OVER (PARTITION BY ID ORDER BY D) AS NTH_FREQUENCY
from t)
select distinct *
from g
order by 2;
+----------------------------------------------------------------+
|ID |TRANSACTIONDATEDIFF|EARLIESTEXPOSURE|TIMEDIFF|NTH_FREQUENCY|
+----------------------------------------------------------------+
|ABC |07-FEB-18 |07-FEB-18 |0 |1 |
|ABC |10-FEB-18 |07-FEB-18 |3 |2 |
|ABC |13-FEB-18 |07-FEB-18 |6 |3 |
|ABC |22-FEB-18 |07-FEB-18 |15 |4 |
|ABC |26-FEB-18 |07-FEB-18 |19 |5 |
|ABC |28-FEB-18 |07-FEB-18 |21 |6 |
|ABC |06-APR-18 |07-FEB-18 |58 |7 |
|ABC |12-APR-18 |07-FEB-18 |64 |9 |
+----------------------------------------------------------------+https://stackoverflow.com/questions/53108142
复制相似问题