首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Hive SQL完全外接与Where子句

Hive SQL完全外接与Where子句
EN

Stack Overflow用户
提问于 2020-02-25 17:15:31
回答 3查看 750关注 0票数 2

我正在创建一个带where子句的完整外部联接。但是,它只能生成内部连接结果。我怀疑这是由于where条款,但我确实需要添加条件。那么,如何创建满足这两个需求( where条件和完全外部连接)的查询?这是我的问题。

代码语言:javascript
复制
select
  t1.key1 as key1_1
, t1.key2 as key2_1
, t1.key3 as key3_1
, t1.date as date_1
, t1.v1
, t2.key1 as key1_2
, t2.key2 as key2_2
, t2.key3 as key3_2
, t2.date as date_2
, t2.v2
from t1 
full outer join t2
on t1.key1 = t2.key1 and t1.key2 = t2.key2 and t1.key3 = t2.key3 
where datediff(t1.date, t2.date) between -5 and 5
; 

样本数据

代码语言:javascript
复制
t1
key1 key2 key3 date        v1
A1   B1   C1   2015-01-01  10
A1   B2   C2   2015-01-01  11

t2
key1 key2 key3 date        v2
A1   B1   C1   2015-01-01  20
A1   B1   C1   2015-01-03  30
A1   B1   C1   2015-02-01  40
A1   B1   C1               50
A1   B1   C2   2015-01-02  60

期望结果

代码语言:javascript
复制
key1_1 key2_1 key3_1 date_1     v1 key1_2 key2_2 key3_2 date_2     v2
A1     B1     C1     2015-01-01 10 A1     B1     C1     2015-01-01 20
A1     B1     C1     2015-01-01 10 A1     B1     C1     2015-01-03 30
                                   A1     B1     C1     2015-02-01 40
                                   A1     B1     C1                50
                                   A1     B1     C2     2015-01-02 60
A1     B2     C2     2015-01-01 11

这些是我现在能想到的所有场景。如果我发现任何缺失的场景,我可以加进去。我在此要指出的是,应包括以下结果:

如果这两个表满足所有使用键和日期设置的条件,那么它将如所需结果中的第1行和第2行所示包括在内。如果没有满足这些条件中的任何一个条件,那么我们将在结果中保留一个表的信息,如第3、4、5和6行所示。

编辑:根据@Gordon的建议,我使用了一个联合公司来解决这个问题。请在下面的回复中看到我的解决方案。

EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2020-02-26 00:20:38

这是我自己的问题的解决方案,基于@Gordon在讨论中提出的建议。

代码语言:javascript
复制
create table t3 as
select *, row_number () over () as id from t1;

create table t4 as
select *, row_number () over () as id from t2;

create table t5 as
select 
  t1.id as id_1
, t1.key1 as key1_1
, t1.key2 as key2_1
, t1.key3 as key3_1
, t1.date as date_1
, t1.v1
, t2.id as id_2
, t2.key1 as key1_2
, t2.key2 as key2_2
, t2.key3 as key3_2
, t2.date as date_2
, t2.v2
from t3 as t1 
full outer join t4 as t2
on t1.key1 = t2.key1 and t1.key2 = t2.key2 and t1.key3 = t2.key3 
where datediff(t1.date, t2.date) between -5 and 5
;

set hive.mapred.mode=nonstrict;
create table t6 as
select
  t1.id as id_1
, t1.key1 as key1_1
, t1.key2 as key2_1
, t1.key3 as key3_1
, t1.date as date_1
, t1.v1
, null as id_2
, null as key1_2
, null as key2_2
, null as key3_2
, null as date_2
, null as v2
from t3 as t1 
where t1.id not in (select t2.id_1 from t5 as t2 where t2.id_1 is not null)
;

create table t7 as
select
  null as id_1
, null as key1_1
, null as key2_1
, null as key3_1
, null as date_1
, null as v1
, t1.id as id_2
, t1.key1 key1_2
, t1.key2 key2_2
, t1.key3 key3_2
, t1.date date_2
, t1.v2
from t4 as t1 
where t1.id not in (select t2.id_2 from t5 as t2 where t2.id_2 is not null)
;

create table t8 as
select * from t5 union all
select * from t6 union all
select * from t7
;
票数 0
EN

Stack Overflow用户

发布于 2020-02-25 17:16:41

您可能只想将逻辑移到on子句中:

代码语言:javascript
复制
from t1 full outer join
     t2
     on t1.key1 = t2.key1 and
        t1.key2 = t2.key2 and
        t1.key3 = t2.key3 and
        datediff(t1.date, t2.date) between -5 and 5

编辑:

如果上面的内容不起作用,那么也许您可以将查询重写为union all

代码语言:javascript
复制
select . . . 
from t1 join
     t2
     on t1.key1 = t2.key1 and
        t1.key2 = t2.key2 and
        t1.key3 = t2.key3
where datediff(t1.date, t2.date) between -5 and 5
union all
select . . .
from t1
where not exists (select 1
                  from t2
                  where t1.key1 = t2.key1 and
                        t1.key2 = t2.key2 and
                        t1.key3 = t2.key3 and
                        datediff(t1.date, t2.date) between -5 and 5
                 )
union all
select . . .
from t2
where not exists (select 1
                  from t1
                  where t1.key1 = t2.key1 and
                        t1.key2 = t2.key2 and
                        t1.key3 = t2.key3 and
                        datediff(t1.date, t2.date) between -5 and 5
                 );

我也不确定Hive是否会接受这些相关条款。

票数 1
EN

Stack Overflow用户

发布于 2020-02-25 17:29:25

正如您已经意识到的,问题是where强制t1.datet2.date存在。你只需要避免这种假设,比如:

代码语言:javascript
复制
(t1.date is null) or (t2.date is null) or (datediff(t1.date, t2.date) between -5 and 5)
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/60399994

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档