I have a query in Spark SQL like

select count(ts), truncToHour(ts)
from myTable
group by truncToHour(ts)

where ts is of timestamp type and truncToHour is a UDF that truncates a timestamp to the hour. This query does not work. If I try

select count(ts), ts from myTable group by truncToHour(ts)

I get: expression 'ts' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() if you don't care which value you get.;

But if I follow that advice, first() is not defined:

select count(ts), first(ts) from myTable group by truncToHour(ts)

How can I get what I want without using a subquery? Also, why does it say "wrap in first()" when first() is not defined?
Posted on 2015-07-10 06:33:20
I have a solution:

SELECT max(truncHour(ts)), COUNT(ts) FROM myTable GROUP BY truncHour(ts)

or

SELECT truncHour(max(ts)), count(ts) FROM myTable GROUP BY truncHour(ts)

Is there a better solution?
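
To see why both queries above return the group's hour, here is a minimal plain-Python sketch of the grouping (not Spark code). The helper names trunc_to_hour and group_by_hour and the sample timestamps are hypothetical, and the UDF is assumed to simply zero out minutes, seconds, and microseconds.

```python
from collections import defaultdict
from datetime import datetime


def trunc_to_hour(ts: datetime) -> datetime:
    """Hypothetical stand-in for the UDF: drop everything below the hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


def group_by_hour(timestamps):
    """Mimic GROUP BY truncHour(ts): map each hour to the timestamps in it."""
    groups = defaultdict(list)
    for ts in timestamps:
        groups[trunc_to_hour(ts)].append(ts)
    return groups


# Made-up sample rows, for illustration only.
rows = [
    datetime(2015, 7, 10, 6, 5),
    datetime(2015, 7, 10, 6, 33),
    datetime(2015, 7, 10, 7, 1),
]

result = []
for hour, members in sorted(group_by_hour(rows).items()):
    # A bare `ts` is ambiguous: one group can hold several distinct timestamps.
    # The truncated value is the same for every member, so max() just returns the key.
    assert max(trunc_to_hour(ts) for ts in members) == hour
    # Truncation preserves ordering, so truncating max(ts) gives the same hour.
    assert trunc_to_hour(max(members)) == hour
    result.append((len(members), hour))

print(result)
```

Under these assumptions, either aggregate form is only a way to satisfy the analyzer: the selected value is already constant within each group.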
Posted on 2015-09-08 19:43:19
https://issues.apache.org/jira/browse/SPARK-9210
It looks like the actual function is first_value.
https://stackoverflow.com/questions/31329297