我有一个独特的数据集,根据这个值,我需要将前20%的记录从最下面的80%中分离出来。
因此,我创建了一个使用percentile_cont执行此操作的标志。
然后,我需要为前20%和下面80%的每一行创建随机row_number。
因此,每个行号列都从1开始,不计算不符合正确百分比队列的行数。
然后,我取了这些行号,并将它们按适当数量的大小相等的类别划分为前20%,前80%使用NTILE。
但是,由于某种原因,NTILE(8)没有创建8个组。对于前20%的人来说,这只是把每个人都扔到了7,8中。
同样的事情也发生在最底层的80%的人群身上。
这是密码。知道为什么会发生这种事吗?
SELECT
*
, CASE
WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 1) THEN 'Top 20% Group1'
WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 2) THEN 'Top 20% Group2'
WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 3) THEN 'Top 20% Group3'
WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 4) THEN 'Top 20% Group4'
WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 5) THEN 'Top 20% Group5'
WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 6) THEN 'Top 20% Group6'
WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 7) THEN 'Top 20% Group7'
WHEN (top_20_percentile_prospects_fl = 1 AND NTILE(8) OVER (ORDER BY top_20_random_row_number) = 8) THEN 'Top 20% Group8'
--Bottom 80%
WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 1) THEN 'Bottom 80% Group1'
WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 2) THEN 'Bottom 80% Group2'
WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 3) THEN 'Bottom 80% Group3'
WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 4) THEN 'Bottom 80% Group4'
WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 5) THEN 'Bottom 80% Group5'
WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 6) THEN 'Bottom 80% Group6'
WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 7) THEN 'Bottom 80% Group7'
WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number <= 113221 AND NTILE(8) OVER (ORDER BY bottom_80_random_row_number) = 8) THEN 'Bottom 80% Group8'
WHEN (top_20_percentile_prospects_fl = 0 AND bottom_80_random_row_number > 113221) THEN 'Holdout'
END as Treatments
FROM (
SELECT
*
, CASE WHEN top_20_percentile_prospects_fl = 1 THEN ROW_NUMBER() OVER (PARTITION BY CASE WHEN top_20_percentile_prospects_fl = 1 THEN 1 ELSE 0 END ORDER BY NEWID())
END AS top_20_random_row_number
, CASE WHEN top_20_percentile_prospects_fl = 0 THEN ROW_NUMBER() OVER (PARTITION BY CASE WHEN top_20_percentile_prospects_fl = 0 THEN 1 ELSE 0 END ORDER BY NEWID())
END AS bottom_80_random_row_number
FROM(
SELECT
c.id
, c.name
, score
, CASE WHEN score >= PERCENTILE_CONT(0.80) WITHIN GROUP (ORDER BY score) OVER () THEN 1 ELSE 0 END AS top_20_percentile_prospects_fl
FROM Database.dbo.table c
INNER JOIN #temp as l ON l.id = c.id
)base_code
) base_code2谢谢!
我注意到,当我将其添加到查询的底部时,前20%的NTILE正常工作。
WHERE top_20_percentile_prospects_fl = 1有人能解释一下为什么吗?我猜想它在某种程度上与行号有关。
发布于 2021-01-31 06:40:15
基本上,您的问题是正在对所有行计算NTILE函数。但是您已经通过top_20_percentile_prospects_fl预先计算了分区。因此,如果您希望通过不同的组对NTILE进行分区,则需要将NTILE按此划分。
所以对于前20名,你需要一个条款:
PARTITION BY top_20_percentile_prospects_fl。
但是对于倒数20的人来说,你需要:
PARTITON BY top_20_percentile_prospects_fl, CASE WHEN bottom_80_random_row_number <= 113221 THEN 0 ELSE 1 END
才能得到正确的结果。
老实说,通常不是这样进行窗口聚合的.
通常情况下,您首先使用计算过的列对行进行分区,然后对其进行NTILE分区。所以:
SELECT
*
, CASE WHEN top_20_percentile_prospects_fl = 1 THEN 'Top 20% '
ELSE 'Bottom 80% ' END AS Part
, CASE WHEN top_20_percentile_prospects_fl = 0 AND rn > 113221
THEN 'Holdout'
ELSE
CASE NTile
WHEN 1 THEN 'Group1'
WHEN 2 THEN 'Group2'
WHEN 3 THEN 'Group3'
WHEN 4 THEN 'Group4'
WHEN 5 THEN 'Group5'
WHEN 6 THEN 'Group6'
WHEN 7 THEN 'Group7'
WHEN 8 THEN 'Group8'
END
END AS GroupNum
FROM (
SELECT *,
NTILE(8) OVER (PARTITION BY
CASE
WHEN top_20_percentile_prospects_fl = 0 THEN 0
WHEN top_20_percentile_prospects_fl = 1 AND rn <= 113221 THEN 1
ELSE 2 END
ORDER BY top_20_random_row_number
) AS NTile
FROM
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY top_20_percentile_prospects_fl ORDER BY NEWID()) AS random_row_number
FROM(
SELECT
c.id
, c.name
, score
, CASE WHEN score >= PERCENTILE_CONT(0.80) WITHIN GROUP (ORDER BY score) OVER () THEN 1 ELSE 0 END AS top_20_percentile_prospects_fl
FROM Database.dbo.table c
INNER JOIN #temp as l ON l.id = c.id
)base_codeCentile
) base_codeRn
) base_codeNTile这里最大的区别是,NTILE结果现在分散在多个行上,而不是放在同一行上(然后需要用CASE分隔)。
https://stackoverflow.com/questions/65976135
复制相似问题