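I am trying to get the count of duplicate records from a table, but for one particular partition no data is available, so Hive prints only "OK" as the result. Is it possible to change this result to a value such as 0 or NULL? I have already tried nvl, coalesce, and case, and it still shows only OK. The goal is just to check the duplicate count, so I need at least one value to be returned.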
select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from xyz
where data_dt='20170423'
group by col1,col2
having count(*) > 1
Posted on 2017-04-23 08:43:26
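The query returns no rows on an empty dataset because it uses group by together with a having filter. With no input rows there is nothing to group, so no groups are produced, and that is why no rows come back. The same query without group by and having returns 0: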
select nvl(count(*),0) cnt, 'xyz' AS TABLE_NAME
from xyz
where data_dt='20170423'
As a workaround, you can UNION ALL an extra row of NULLs that is returned only when the dataset is empty:
select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1
UNION ALL --returns 1 row on empty dataset
select col1, col2, DUPLICATE_ROW_COUNT, TABLE_NAME
from (select null col1, null col2, null AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
)a --the inner join returns no rows when the dataset is non-empty
inner join (
select count(*) cnt from --returns 0 on an empty dataset
( --your original query
select col1, col2, nvl(count(*),0) AS DUPLICATE_ROW_COUNT, 'xyz' AS TABLE_NAME
from xyz
where data_dt='20170423'
group by col1,col2
having count(*) >1
)s --your original query
)c on c.cnt=0
Alternatively, you could probably write the subquery as a CTE (WITH) and use WHERE NOT EXISTS instead of the inner join, but I have not tested that.
You can also capture the result in the shell and test whether it is empty:
dataset=$(hive -e "set hive.cli.print.header=false; [YOUR QUERY HERE]")
# test on empty dataset
if [[ -z "$dataset" ]] ; then
dataset=0
fi
https://stackoverflow.com/questions/43568451
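The shell check from the answer can be wrapped in a small reusable function. This is only a minimal sketch and not part of the original answer: the function name output_or_default is hypothetical, and it assumes bash and that the query writes its result rows to standard output.

```shell
#!/usr/bin/env bash

# Print the output of a command, or a default value when that output is empty.
# Usage: output_or_default DEFAULT COMMAND [ARGS...]
# (output_or_default is a hypothetical helper name used for illustration.)
output_or_default() {
  local default_value=$1
  shift
  local result
  # Command substitution strips trailing newlines, so a command that prints
  # nothing leaves result as the empty string.
  result=$("$@")
  if [[ -z "$result" ]]; then
    printf '%s\n' "$default_value"
  else
    printf '%s\n' "$result"
  fi
}

# Example call, using the same hive invocation as in the answer:
# dataset=$(output_or_default 0 hive -e "set hive.cli.print.header=false; [YOUR QUERY HERE]")
```

Passing the default as the first argument lets the same helper return 0, NULL, or any other marker, depending on what the downstream check expects.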