首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >多到多连接(同一ID与不同日期)

多到多连接(同一ID与不同日期)
EN

Stack Overflow用户
提问于 2015-11-06 20:18:53
回答 3查看 97关注 0票数 0

我正在使用SQL和R进行分析,我想连接两个表,如下所示:

表1:

代码语言:javascript
复制
ID  date
a11 20150302
a11 20150302
a22 20150303
a22 20150304
a33 20150306
a44 20150306
a55 20150307
a66 20150308
a66 20150309
a66 20150310

表2

代码语言:javascript
复制
ID  date
a11 20150303
a22 20150304
a22 20150305
a44 20150306
a66 20150308
a66 20150310

情况是这样的:客户被呼叫(table1),客户回电话索取更多信息(表二)。

所以我在分析中想做的是:

  1. 只显示这两个表中的in。
  2. 表2日期与表1日期匹配日期:
    • 匹配最近的日期
    • 表2日期必须是>=表1日期(与结果"a66“20150310中的示例一样,table1日期为20150310,而20150308为20150308,而不是20150309)

结果:

代码语言:javascript
复制
ID  table1 date table2 date
a11 20150302    
a11 20150302    20150303
a22 20150303    20150304
a22 20150304    20150305
a44 20150306    20150306
a66 20150308    20150308
a66 20150309    
a66 20150310    20150310

对于这个多到多(但我不希望结果是n*m,我需要1到1)匹配/连接,有什么解决方案吗?解决方案,无论是在R或SQL将是需要的。

谢谢

EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2015-11-06 20:38:18

代码语言:javascript
复制
SELECT ID, Date1, Date2 FROM (
SELECT joined.ID,  joined.Date1, joined.Date2, ROW_NUMBER() OVER (PARTITION BY ID, Date1 ORDER BY Date2 ASC) AS RowNumber 
FROM(
SELECT t1.ID, t1.[Date] as Date1, CASE WHEN t2.[Date] >= t1.[Date] THEN t2.[Date] ELSE NULL END as [Date2] 
FROM Table1 t1
LEFT JOIN Table2 t2 ON t1.ID = t2.ID) as joined 
WHERE joined.Date2 IS NOT NULL
) partitioned
WHERE RowNumber = 1

连接ID上的两个表,并删除不在Table 1中的Table 2中的行。然后使用ROW_NUMBER() OVER (PARTITION BY ID, Date1 ORDER BY Date2 ASC)来匹配由WHERE RowNumber = 1子句找到的最近的日期。

生成与所列条件一致的输出:

代码语言:javascript
复制
+-----+----------+----------+
| ID  |  Date1   |  Date2   |
+-----+----------+----------+
| a11 | 20150302 | 20150303 |
| a22 | 20150303 | 20150304 |
| a22 | 20150304 | 20150304 |
| a44 | 20150306 | 20150306 |
| a66 | 20150308 | 20150308 |
| a66 | 20150309 | 20150310 |
| a66 | 20150310 | 20150310 |
+-----+----------+----------+
票数 1
EN

Stack Overflow用户

发布于 2015-11-06 22:30:31

我在R和dplyr中得到的结果是一样的。对于a22,20150304初始呼叫的最近回调是20150304,而不是20150305。您需要一个时间组件来区分这一点。

代码语言:javascript
复制
library(dplyr)
inner_join(table1,table2,"ID")%>%
group_by(ID,date1)%>%
filter(date1<=date2)%>%
filter(row_number() == 1)

>
Source: local data frame [7 x 3]
Groups: ID, date1 [7]

     ID    date1    date2
  (chr)    (int)    (int)
1   a11 20150302 20150303
2   a22 20150303 20150304
3   a22 20150304 20150304
4   a44 20150306 20150306
5   a66 20150308 20150308
6   a66 20150309 20150310
7   a66 20150310 20150310

数据

代码语言:javascript
复制
table1 <-read.table(text="ID  date1
a11 20150302
a11 20150302
a22 20150303
a22 20150304
a33 20150306
a44 20150306
a55 20150307
a66 20150308
a66 20150309
a66 20150310", header=T,stringsAsFactors =F)
table2 <-read.table(text="ID  date2
a11 20150303
a22 20150304
a22 20150305
a44 20150306
a66 20150308
a66 20150310", header=T,stringsAsFactors =F)
票数 1
EN

Stack Overflow用户

发布于 2015-11-06 22:45:17

这并不能解决问题,但很接近,也许会给你一个想法

SqlFiddleDemo

代码语言:javascript
复制
With t_left as (
    SELECT *, row_number() over (partition by "ID" order by date desc ) as rn
    FROM Table1 T
    WHERE EXISTS (SELECT 1 FROM Table2 P WHERE T."ID" = P."ID")
),
t_right as (
    SELECT *, row_number() over (partition by "ID" order by date desc) as rn
    FROM Table2
) 
SELECT t_left."ID", t_left."date", t_right."date"
FROM t_left
LEFT JOIN t_right
       on t_left.rn = t_right.rn
      and t_left."ID" = t_right."ID"
ORDER BY t_left."ID", t_left."date"

输出

代码语言:javascript
复制
|  ID |     date |     date |
|-----|----------|----------|
| a11 | 20150302 | 20150303 |
| a11 | 20150302 |   (null) |
| a22 | 20150303 | 20150304 |
| a22 | 20150304 | 20150305 |
| a44 | 20150306 | 20150306 |
| a66 | 20150308 |   (null) |
| a66 | 20150309 | 20150308 |
| a66 | 20150310 | 20150310 |
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/33575076

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档