首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >按时间戳聚合

按时间戳聚合
EN

Stack Overflow用户
提问于 2014-05-19 15:44:04
回答 1查看 1.2K关注 0票数 3

SEO > SEO >付费1付费>代销商>付费1 SEO >代销商1I有一个查询,其中包含客户标识号、营销渠道、时间戳和购买日期。所以,结果可能是这样的。

代码语言:javascript
复制
id marketingChannel TimeStamp      Transaction_date
1  SEO              5/18 23:11:43  5/18
1  SEO              5/18 24:12:43  5/18
1  Paid             5/18 24:13:43  5/18
2  Paid             5/18 24:12:43  5/18
2  Paid             5/18 24:14:43  5/18
2  Affiliate        5/18 24:20:43  5/18
2  Paid             5/18 24:22:43  5/18
3  SEO              5/18 24:10:43  5/18
3  Affiliate        5/18 24:11:43  5/18

我想知道是否有一个查询,以一种显示营销路径计数的方式聚合这些信息。

例如。

代码语言:javascript
复制
Marketing Path                  Count
SEO > SEO > Paid                  1
Paid > Paid > Affiliate > Paid    1
SEO > Affiliate                   1

我正在考虑编写一个Python脚本来获取这些信息,但我想知道SQL中是否有一个简单的解决方案--因为我没有使用SQL的framiliar。

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2014-05-20 21:28:46

几年前,我需要一个类似的结果,我测试了不同的方法,以获得一个连接字符串中的Teradata。顺便说一句,如果行数太高且连在一起的字符串超过64000个字符,则所有操作都可能失败。

最有效的是用户定义的函数(用C编写):

代码语言:javascript
复制
SELECT
   PATH
  ,COUNT(*)
FROM
 (
   SELECT 
      DelimitedBuildSorted(MARKETINGCHANNEL
                          ,CAST(CAST(ts AS FORMAT 'yyyymmddhhmiss') AS VARCHAR(14))
                          ,'>') AS PATH
   FROM t
   GROUP BY id
 ) AS dt
GROUP BY 1;

如果您需要频繁地运行该查询和/或在一个大型表上,如果可以使用UDF,您可以与您的DBA对话(大多数DBA不喜欢它们,因为它们是用他们不知道的语言编写的,C)。

如果每个id的平均行数较低,则递归可能是可以的。Joseph的版本可以稍微简化一些,但最重要的是创建一个临时表,而不是使用视图或派生表进行ROW_NUMBER计算。这将导致更好的计划(在Server中也是如此):

代码语言:javascript
复制
CREATE VOLATILE TABLE vt AS 
 (
   SELECT
      id
     ,MarketingChannel
     ,ROW_NUMBER() OVER (PARTITION BY id ORDER BY TS DESC) AS rn
     ,COUNT(*) OVER (PARTITION BY id) AS max_rn
   FROM t
 ) WITH DATA 
PRIMARY INDEX (id) 
ON COMMIT PRESERVE ROWS;

WITH RECURSIVE cte(id, path, rn) AS
 (
   SELECT 
      id, 

      -- modify VARCHAR size to fit your maximum number of rows, that's better than VARCHAR(64000)
      CAST(MarketingChannel AS VARCHAR(10000)) AS PATH, 
      rn
   FROM vt
   WHERE rn = max_rn
   UNION ALL
   SELECT 
      cte.ID, 
      cte.PATH || '>' || vt.MarketingChannel, 
      cte.rn-1
   FROM vt JOIN cte
     ON vt.id = cte.id
    AND vt.rn = cte.rn - 1
 )
SELECT 
   PATH, 
   COUNT(*) 
FROM cte
WHERE rn = 1
GROUP BY path
ORDER BY PATH
;

你也可以试试旧学校MAX(CASE):

代码语言:javascript
复制
SELECT
   PATH
  ,COUNT(*)
FROM
 (
   SELECT
      id
     ,MAX(CASE WHEN rnk =  0 THEN MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk =  1 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk =  2 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk =  3 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk =  4 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk =  5 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk =  6 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk =  7 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk =  8 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk =  9 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 10 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 11 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 12 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 13 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 14 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 15 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 16 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 17 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 18 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 19 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 20 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 21 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 22 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 23 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 24 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 25 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 26 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 27 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 28 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 29 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 30 THEN '>' || MarketingChannel ELSE '' END) ||
      MAX(CASE WHEN rnk = 31 THEN '>' || MarketingChannel ELSE '' END) AS PATH
   FROM
    (
     SELECT
        id
       ,TRIM(MarketingChannel) AS MarketingChannel
       ,RANK() OVER (PARTITION BY id
                     ORDER BY TS) -1 AS rnk
     FROM t
    ) dt
   GROUP BY 1
 ) AS dt
GROUP BY 1;

我有多达2048行,每行有30个字符:-)

代码语言:javascript
复制
SELECT
   PATH
  ,COUNT(*)
FROM
 (
   SELECT
      id
     ,MAX(CASE WHEN rnk MOD 16 = 0 THEN path ELSE '' END) ||
      MAX(CASE WHEN rnk MOD 16 = 1 THEN '>' || path ELSE '' END) ||
      MAX(CASE WHEN rnk MOD 16 = 2 THEN '>' || path ELSE '' END) ||
      MAX(CASE WHEN rnk MOD 16 = 3 THEN '>' || path ELSE '' END) ||
      MAX(CASE WHEN rnk MOD 16 = 4 THEN '>' || path ELSE '' END) ||
      MAX(CASE WHEN rnk MOD 16 = 5 THEN '>' || path ELSE '' END) ||
      MAX(CASE WHEN rnk MOD 16 = 6 THEN '>' || path ELSE '' END) ||
      MAX(CASE WHEN rnk MOD 16 = 7 THEN '>' || path ELSE '' END) AS PATH
   FROM
    (
     SELECT
        id
       ,rnk / 16 AS rnk
       ,MAX(CASE WHEN rnk MOD 16 =  0 THEN path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 =  1 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 =  2 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 =  3 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 =  4 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 =  5 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 =  6 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 =  7 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 =  8 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 =  9 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 = 10 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 = 11 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 = 12 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 = 13 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 = 14 THEN '>' || path ELSE '' END) ||
        MAX(CASE WHEN rnk MOD 16 = 15 THEN '>' || path ELSE '' END) AS path
     FROM
      (
       SELECT
          id
         ,rnk / 16 AS rnk
         ,MAX(CASE WHEN rnk MOD 16 =  0 THEN path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 =  1 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 =  2 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 =  3 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 =  4 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 =  5 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 =  6 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 =  7 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 =  8 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 =  9 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 = 10 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 = 11 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 = 12 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 = 13 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 = 14 THEN '>' || path ELSE '' END) ||
          MAX(CASE WHEN rnk MOD 16 = 15 THEN '>' || path ELSE '' END) AS path
       FROM
        (
         SELECT
            id
           ,TRIM(MarketingChannel) AS PATH
           ,RANK() OVER (PARTITION BY id
                         ORDER BY TS) -1 AS rnk
         FROM t
        ) dt
       GROUP BY 1,2
      ) dt
     GROUP BY 1,2
    ) dt
   GROUP BY 1
 ) dt
GROUP BY 1
票数 3
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/23741925

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档