我正在尝试为站点流量建立一个灵活的属性模型。首先,我做了一个最后点击模型,其中一个转换被归因于一个通道的流量基础上的最后一次访问的访问者在转换之前。所以:
Visitor Visit Channel ConversionId
visitor a visit 1 email 123
visitor a visit 2 email 123
visitor a visit 3 direct 123
visitor b visit 1 seo 123
visitor b visit 2 direct 123
visitor b visit 3 email 123
visitor c visit 1 seo 123
visitor c visit 2 direct 123
visitor c visit 3 direct 123在上面的例子中,我将统计每个访问者的最后一次访问,并将转换归因于该通道。因此,访问者a和访问者c的转换将归因于直接,而访问者b将归因于电子邮件。
现在我希望能够排除直接,从而能够将转换归因于最后一个非直接通道。在这种情况下,访问者a和访问者b的转换将归因于电子邮件(不包括访问者a的第三次访问),而访问者c的转换将归因于se (因为c的第二次和第三次访问将被排除在外)。
到目前为止,我的设置方式是(其中一些看起来有点傻,因为为了简单起见,我去掉了一些联接和标识信息):
WITH test (visitor,
visit, --a number unique for each visitor row but not necessarily unique across all visits
channel,
conversionid,
rn) AS
(
SELECT visitorid AS visitor,
visitid AS visit,
channel AS channel,
conversionid AS conversionid
rn = ROW_NUMBER() OVER (PARTITION BY conversionid ORDER BY visit DESC)
FROM db
GROUP BY visitorid,
visitid,
channel,
conversionid)
SELECT visitor,
MAX(visit) AS maxvisit,
channel,
conversionid
FROM test
WHERE rn = 1
GROUP BY visitor,
channel
ORDER BY visitor;这给了我最后一次点击属性。(对于我来说,有一种更简单的方法可以做到这一点,但这种方式看上去不那么灵活--我希望能够轻松地更改查询以排除通道。)我的问题是,我该怎么做才能排除直接,并将转换归因于最后一个非直接通道?我也希望能够排除额外的渠道,如果需要,我可以建立不同的属性模型。
多谢了,伙计们。
发布于 2016-11-23 22:50:10
现在还不清楚你想要什么,所以这里有一些统计例子,希望能让你更容易理解。本节针对的是条件聚合,它允许您混合和匹配,例如接受MAX VisitId,而不考虑关联的conversion,并获得不直接的最后一个通道。
DECLARE @Table AS TABLE (visitor CHAR(1), VisitId INT, ConversionId INT, Channel VARCHAR(15))
INSERT INTO @Table VALUES ('a',1, 11,'email'),('a',2, 12,'email'),('a',3, 13,'direct')
,('b',4, 14,'seo'),('b',5, 15,'direct'),('b',6, 16,'email'),('c',7, 17,'seo')
,('c',8, 18,'direct'),('c',9, 19,'direct')
;WITH cte AS (
SELECT
visitor
,VisitId
,Channel
,ConversionId
,DirectConversion = CASE WHEN Channel = 'direct' THEN ConversionId END
,ChannelRowNumber = ROW_NUMBER() OVER (PARTITION BY visitor
ORDER BY
CASE WHEN Channel = 'direct' THEN 1 ELSE 0 END
,ConversionId DESC)
FROM
@Table
)
SELECT
visitor
,MaxVisitId = MAX(VisitId)
,MaxVisitIdOfNonDirect = MAX(CASE WHEN DirectConversion IS NULL THEN VisitId END)
,NumOfVisits = COUNT(DISTINCT VisitId)
,Channel = MAX(CASE WHEN ChannelRowNumber = 1 THEN Channel END)
,NumOfConversions = COUNT(DISTINCT ConversionId)
,NumOfDirectConversions = COUNT(DISTINCT DirectConversion)
,NumOfNonDirectConversions = COUNT(DISTINCT ConversionId) - COUNT(DISTINCT DirectConversion)
FROM
cte
GROUP BY
visitor如果您只想排除所有的频道在一起,那么使用ROW_NUMBER和哪里的<>‘直接’应该为您发挥作用。
发布于 2016-11-23 22:37:51
我认为有一个更简单的方法:
SELECT db.*
FROM (SELECT db.*,
ROW_NUMBER() OVER (PARTITION BY conversionid
ORDER BY (CASE WHEN channel <> 'Direct' THEN 1 ELSE 2 END),
visit DESC
) as seqnum
FROM db
) db
WHERE seqnum = 1;这将返回每个conversionid的最新记录,如果有Direct的话,这些记录不是Direct。
发布于 2016-11-24 06:43:05
在Server中,以下代码值得一试
select visitor,channel,conversionid,
dense_rank() over(partition by visitor order by visit desc) as rn
into #visitor
from visitor
where channel <> 'Direct'
order by visitor,visit
select v1.visitor,count(v2.visit) as visits ,v1.channel,v1.conversionid
from #visitor v1 join visitor v2
on v1.visitor=v2.visitor and v1.rn=1
group by v1.visitor,v1.channel,v1.conversionid https://stackoverflow.com/questions/40775065
复制相似问题