Context: I have a table containing RetailerCode, CustomerID, and Segment, as shown below.
RetailerCode CID Segment
A6005 13SVC15 High
A6005 19VDE1F Low
A6005 1B3BD1F Medium
A6005 1B3HB48 Medium
A6005 1B3HB49 Low
A9006 1B3HB40 High
A9006 1B3HB41 High
A9006 1B3HB43 Low
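A9006 1B3HB46 Medium
Here, I want to divide the dataset into control and test groups. For each RetailerCode I have a set of customers, and each customer is tagged with a segment. I need the split to be done
for every retailer.
I tried the code below, which I know is wrong.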
select RetailerCode, CID,Segment
(case when row_number() over (order by newid()) <= (select 0.1* count(*) from Table)
then 'control'
else 'test'
end) as group
from Table
group by RetailerCode, CID,Segment
Order by RetailerCode
Can someone help me with this? Thanks in advance.
Posted on 2018-12-30 16:19:42
You seem pretty close:
-- [group] is bracketed because GROUP is a reserved word in SQL Server
select RetailerCode, CID, Segment,
       (case when row_number() over (partition by segment order by newid()) <=
                  0.1 * count(*) over (partition by segment)
             then 'control'
             else 'test'
        end) as [group]
from Table
Order by RetailerCode;
I don't see why a group by would be needed.
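To make the per-partition logic concrete, here is a minimal sketch that runs the same row_number / count(*) pattern in SQLite through Python's sqlite3 module. Several things here are assumptions and not part of the answer: the table name customers, the alias grp, the synthetic data (20 customers per retailer and segment), random() standing in for SQL Server's newid(), and partitioning by RetailerCode as well as Segment, which is one reading of the question's "for each RetailerCode". It needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (RetailerCode TEXT, CID TEXT, Segment TEXT)")

# Synthetic data (not from the question): 2 retailers x 3 segments x 20 customers.
rows = [
    (retailer, f"{retailer}-{segment}-{i:02d}", segment)
    for retailer in ("A6005", "A9006")
    for segment in ("High", "Medium", "Low")
    for i in range(20)
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)

# Same pattern as the answer, but partitioned by retailer AND segment (assumption),
# with random() in place of newid() because this runs on SQLite.
query = """
SELECT RetailerCode, CID, Segment,
       CASE WHEN row_number() OVER (PARTITION BY RetailerCode, Segment ORDER BY random())
                 <= 0.1 * count(*) OVER (PARTITION BY RetailerCode, Segment)
            THEN 'control' ELSE 'test' END AS grp
FROM customers
ORDER BY RetailerCode
"""
result = conn.execute(query).fetchall()

# Count control rows per (retailer, segment) partition.
control_counts = {}
for retailer, cid, segment, grp in result:
    key = (retailer, segment)
    control_counts[key] = control_counts.get(key, 0) + (grp == "control")

for key in sorted(control_counts):
    print(key, control_counts[key])
```

With 20 customers in each partition, every retailer/segment pair gets exactly 2 control rows, although which customers are picked changes on every run. A partition with fewer than 10 rows gets no control rows at all with this comparison, because 0.1 * count(*) is then below 1.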
Posted on 2018-12-30 16:47:17
percent_rank is based on rank and count:
select RetailerCode, CID, Segment,
       (case when percent_rank() over (partition by segment order by newid()) <= 0.1
             then 'control'
             else 'test'
        end) as [group]
from Table
Order by RetailerCode;
ntile is based on row_number and count:
select RetailerCode, CID, Segment,
       (case when ntile(10) over (partition by segment order by newid()) = 1
             then 'control'
             else 'test'
        end) as [group]
from Table
Order by RetailerCode;
https://stackoverflow.com/questions/53979261
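The three approaches in this thread (row_number against 0.1 * count, percent_rank, and ntile) do not always select the same number of control rows. The sketch below counts how many rows each one would pick in partitions of 5, 20 and 25 rows. It is an illustration under assumptions: it runs on SQLite (3.25 or later) through Python's sqlite3 module instead of SQL Server, the table name customers and the partition sizes are made up, and the rows are ordered by CID instead of randomly so that the counts are reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (Segment TEXT, CID TEXT)")

# Hypothetical partitions of 5, 20 and 25 customers (sizes chosen for illustration).
sizes = {"small": 5, "medium": 20, "large": 25}
for segment, n in sizes.items():
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(segment, f"{segment}-{i:02d}") for i in range(n)],
    )

# Each inner column is 1 when that approach would label the row 'control'.
# ORDER BY CID replaces the random ordering to keep the result deterministic.
query = """
SELECT Segment, sum(by_row_number), sum(by_percent_rank), sum(by_ntile)
FROM (
    SELECT Segment,
           (row_number() OVER (PARTITION BY Segment ORDER BY CID)
                <= 0.1 * count(*) OVER (PARTITION BY Segment)) AS by_row_number,
           (percent_rank() OVER (PARTITION BY Segment ORDER BY CID) <= 0.1) AS by_percent_rank,
           (ntile(10) OVER (PARTITION BY Segment ORDER BY CID) = 1) AS by_ntile
    FROM customers
)
GROUP BY Segment
"""
counts = {seg: (a, b, c) for seg, a, b, c in conn.execute(query)}

for seg in sizes:
    print(seg, counts[seg])
```

In this run the control counts (row_number, percent_rank, ntile) come out as (0, 1, 1) for 5 rows, (2, 2, 2) for 20 rows and (2, 3, 3) for 25 rows. So the row_number comparison rounds down and can leave a small partition with no control rows, while percent_rank and ntile always pick at least one row per partition and round up when the size is not a multiple of 10.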