首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >NTILE(8)函数不创建8个组

NTILE(8)函数不创建8个组
EN

Stack Overflow用户
提问于 2021-01-31 05:44:54
回答 1查看 315关注 0票数 0

我有一个独特的数据集,根据这个值,我需要将前20%的记录从最下面的80%中分离出来。

因此,我创建了一个使用percentile_cont执行此操作的标志。

然后,我需要为前20%和下面80%的每一行创建随机row_number

因此,每个行号列都从1开始,不计算不符合正确百分比队列的行数。

然后,我取了这些行号,并将它们按适当数量的大小相等的类别划分为前20%,前80%使用NTILE。

但是,由于某种原因,NTILE(8)没有创建8个组。对于前20%的人来说,这只是把每个人都扔到了7,8中。

同样的事情也发生在最底层的80%的人群身上。

这是密码。知道为什么会发生这种事吗?

代码语言:javascript
复制
SELECT
    *
    , CASE 
        WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 1) THEN 'Top 20% Group1' 
        WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 2) THEN 'Top 20% Group2'
        WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 3) THEN 'Top 20% Group3'
        WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 4) THEN 'Top 20% Group4'
        WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 5) THEN 'Top 20% Group5' 
        WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 6) THEN 'Top 20% Group6'
        WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 7) THEN 'Top 20% Group7'
        WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 8) THEN 'Top 20% Group8'
        --Bottom 80%
        WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 1) THEN 'Bottom 80% Group1'
        WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 2) THEN 'Bottom 80% Group2'
        WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 3) THEN 'Bottom 80% Group3'
        WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 4) THEN 'Bottom 80% Group4'
        WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 5) THEN 'Bottom 80% Group5'
        WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 6) THEN 'Bottom 80% Group6'
        WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 7) THEN 'Bottom 80% Group7'
        WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 8) THEN 'Bottom 80% Group8'
        WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number > 113221) THEN 'Holdout'       
      END as Treatments
FROM (
    SELECT 
        *
        , CASE WHEN top_20_percentile_prospects_fl = 1 THEN ROW_NUMBER() OVER (PARTITION BY CASE WHEN top_20_percentile_prospects_fl = 1 THEN 1 ELSE 0 END ORDER BY NEWID()) 
                END AS top_20_random_row_number
        , CASE WHEN top_20_percentile_prospects_fl = 0 THEN ROW_NUMBER() OVER (PARTITION BY CASE WHEN top_20_percentile_prospects_fl = 0 THEN 1 ELSE 0 END ORDER BY NEWID())
                END AS bottom_80_random_row_number 
    FROM(
        SELECT
            c.id
            , c.name
            , score
            , CASE WHEN score >= PERCENTILE_CONT(0.80) WITHIN GROUP (ORDER BY score) OVER () THEN 1 ELSE 0 END AS top_20_percentile_prospects_fl
        FROM Database.dbo.table c
        INNER JOIN #temp as l ON l.id = c.id        
    )base_code
) base_code2

谢谢!

我注意到,当我将其添加到查询的底部时,前20%的NTILE正常工作。

代码语言:javascript
复制
WHERE top_20_percentile_prospects_fl = 1

有人能解释一下为什么吗?我猜想它在某种程度上与行号有关。

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2021-01-31 06:40:15

基本上,您的问题是正在对所有行计算NTILE函数。但是您已经通过top_20_percentile_prospects_fl预先计算了分区。因此,如果您希望通过不同的组对NTILE进行分区,则需要将NTILE按此划分。

所以对于前20名,你需要一个条款:

PARTITION BY top_20_percentile_prospects_fl

但是对于倒数20的人来说,你需要:

PARTITON BY top_20_percentile_prospects_fl, CASE WHEN bottom_80_random_row_number <= 113221 THEN 0 ELSE 1 END

才能得到正确的结果。

老实说,通常不是这样进行窗口聚合的.

通常情况下,您首先使用计算过的列对行进行分区,然后对其进行NTILE分区。所以:

代码语言:javascript
复制
SELECT
    *
    ,  CASE WHEN top_20_percentile_prospects_fl = 1 THEN 'Top 20% '
       ELSE 'Bottom 80% ' END AS Part
    ,  CASE WHEN top_20_percentile_prospects_fl = 0 AND rn > 113221
           THEN 'Holdout'       
       ELSE
           CASE NTile
           WHEN 1 THEN 'Group1'
           WHEN 2 THEN 'Group2'
           WHEN 3 THEN 'Group3'
           WHEN 4 THEN 'Group4'
           WHEN 5 THEN 'Group5'
           WHEN 6 THEN 'Group6'
           WHEN 7 THEN 'Group7'
           WHEN 8 THEN 'Group8'
           END
       END AS GroupNum
FROM (
  SELECT *,
    NTILE(8) OVER (PARTITION BY
        CASE 
          WHEN top_20_percentile_prospects_fl = 0 THEN 0
          WHEN top_20_percentile_prospects_fl = 1 AND rn <= 113221 THEN 1
          ELSE 2 END
        ORDER BY top_20_random_row_number
      ) AS NTile
   FROM
   (
    SELECT 
        *,
        ROW_NUMBER() OVER (PARTITION BY top_20_percentile_prospects_fl ORDER BY NEWID()) AS random_row_number
    FROM(
        SELECT
            c.id
            , c.name
            , score
            , CASE WHEN score >= PERCENTILE_CONT(0.80) WITHIN GROUP (ORDER BY score) OVER () THEN 1 ELSE 0 END AS top_20_percentile_prospects_fl
        FROM Database.dbo.table c
        INNER JOIN #temp as l ON l.id = c.id        
    )base_codeCentile
  ) base_codeRn
) base_codeNTile

这里最大的区别是,NTILE结果现在分散在多个行上,而不是放在同一行上(然后需要用CASE分隔)。

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/65976135

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档