首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Apache查询HiveQL

Apache查询HiveQL
EN

Stack Overflow用户
提问于 2014-08-23 19:31:35
回答 1查看 848关注 0票数 0

我正在学习Hive,并想编写一个优化的HiveQL/SQL查询

我的桌子是这样的:

代码语言:javascript
复制
CREATE TABLE sales (dealer VARCHAR(25), make VARCHAR(25), type VARCHAR(25), day INT);
INSERT INTO sales (dealer, make, type, day) VALUES
("Xyz", "Highlander", "SUV", "0"),
("Xyz", "Prius", "HATCH", "1"),
("Xyz", "Prius", "HATCH", "2"),
("Xyz", "Prius", "HATCH", "3"),
("Xyz", "Versa", "HATCH", "1"),
("Xyz", "Versa", "HATCH", "2"),
("Xyz", "Versa", "HATCH", "3"),
("Xyz", "S3", "SEDAN", "1"),
("Xyz", "S3", "SEDAN", "2"),
("Abc", "Forrester", "SUV", "1");

给定一个“交易商”D,我希望在过去的X天中,在一个查询中计算每个“类型”的顶N "make“。

代码语言:javascript
复制
SELECT dealer, make, type, COUNT(*) AS frequency FROM sales
WHERE day > 0 AND dealer LIKE 'Xyz' GROUP BY make, type
ORDER BY frequency DESC LIMIT 5

问题是,当使用GROUP BY on "make“和"type”表示前1时,我只会得到:

代码语言:javascript
复制
DEALER, MAKE, TYPE, COUNT
Xyz, Prius, Hatch, 3
Xyz, Versa, Hatch, 3
Xyz, S3, Sedan, 2
...

但我想

代码语言:javascript
复制
Xyz, Prius, Hatch, 3
Xyz, S3, Sedan, 2
...

对于每一个“类型”,顶部N。

有人能帮我理解如何写这样的查询吗?

SQL Fiddle http://sqlfiddle.com/#!2/df9304/5

*更新*

级别()似乎是有用的:

Hive getting top n records in group by query

15

HiveQL and rank()

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2014-08-26 02:29:53

在阅读了更多的文档和链接问题的提示之后:

代码语言:javascript
复制
SELECT dealer, make, rank, type FROM (
    SELECT dealer, make, rank() OVER (PARTITION BY type ORDER BY count DESC) AS rank, type FROM (
        SELECT dealer, make, count(*) AS count, type FROM Sales WHERE dealer = "Xyz" GROUP BY dealer, type, make
    ) CountedSales
) RankedSales
WHERE RankedSales.rank < 3;

内部查询执行计数,中间查询执行秩(),外部查询限制排序。

销售表目录

代码语言:javascript
复制
hive> select * from Sales;
OK
Xyz      Highlander      SUV    NULL
Xyz      Highlander      SUV    NULL
Xyz      Rouge   SUV    NULL
Xyz      Rouge   SUV    NULL
Xyz      Prius   HATCH  NULL
Xyz      Prius   HATCH  NULL
Xyz      Prius   HATCH  NULL
Xyz      Versa   HATCH  NULL
Xyz      S3      SEDAN  NULL
Xyz      S3      SEDAN  NULL
Xyz      S3      SEDAN  NULL
Xyz      A8      SEDAN  NULL
Xyz      A8      SEDAN  NULL
Xyz      A8      SEDAN  NULL
Xyz      A8      SEDAN  NULL
Time taken: 0.054 seconds, Fetched: 15 row(s)

现在是实际的查询。

代码语言:javascript
复制
hive> SELECT dealer, make, rank, type FROM (                                                                          
    >     SELECT dealer, make, rank() OVER (PARTITION BY type ORDER BY count DESC) AS rank, type FROM (
    >         SELECT dealer, make, count(*) AS count, type FROM Sales WHERE dealer = "Xyz" GROUP BY dealer, type, make
    >     ) CountedSales
    > ) RankedSales
    > WHERE RankedSales.rank < 3;
...
Execution completed successfully
MapredLocal task succeeded
OK
Xyz      Prius  1        HATCH
Xyz      Versa  2        HATCH
Xyz      A8     1        SEDAN
Xyz      S3     2        SEDAN
Xyz      Rouge  1        SUV
Xyz      Highlander     1        SUV
Time taken: 28.491 seconds, Fetched: 6 row(s)
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/25465662

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档