我试图从bigquery返回一个查询结果,该结果返回每个会话发生的事件数。我一直在参考下面这篇文章:
http://developer.streak.com/2013/11/using-google-bigquery-for-event-tracking.html
数据库模式是非常简单的sessionId,eventType,createdAt返回的结果集类似于谷歌分析中的事件工作流。像sessionId,num_event1,num_event2,..。
该方法是根据事件类型和时间戳生成子查询,然后创建连接每个事件子查询的结果的附加子查询。我能够单独执行Step1、step2、step3子查询:
SELECT COUNT(first_event_timestamp) AS number_first_events,
COUNT(second_event_timestamp) AS number_second_events,
COUNT(third_event_timestamp) AS number_third_events
FROM
(SELECT eventUid AS eventUid1,
createdAt AS timestamp1
FROM [events_table]
WHERE eventType = 'first-event') step1,
(SELECT eventUid AS eventUid2,
createdAt AS timestamp2
FROM [events_table]
WHERE eventType = 'second-event') step2,
(SELECT
eventUid as sessionId3,
createdAt as timestamp3
FROM
[events_table]
WHERE
eventType = "third_event") step3加上steps1_2,steps1_2_3是我碰壁的地方。我得到一个错误,即表中缺少数据集名称。下面是完整的查询:
SELECT COUNT(first_event_timestamp) AS num_first,
COUNT(second_event_timestamp) AS num_second,
COUNT(third_event_timestamp) AS num_third
FROM (SELECT
sessionId
first_event_timestamp,
second_event_timestamp,
third_event_timestamp
FROM steps1_2_3
GROUP BY sessionId),
(SELECT
sessionId AS sessionId1,
createdAt AS timestamp1
FROM
[events_table]
WHERE
eventType = "first_event") step1, (SELECT
eventUid AS sessionId2,
createdAt AS timestamp2
FROM
[events_table]
WHERE
eventType = "second_event") step2, (SELECT
eventUid AS sessionId3,
createdAt AS timestamp3
FROM
[events_table]
WHERE
eventType = "third_Event") step3, (SELECT sessionId1,
timestamp1,
IF(timestamp1 < timestamp2, timestamp2, NULL) AS timestamp2
FROM
(SELECT sessionId1,
timestamp1,
timestamp2
FROM step1
LEFT JOIN step2
ON sessionId1 = sessionId2) ) steps1_2, (SELECT sessionId1 as sessionId,
timestamp1 as first_event_timestamp,
timestamp2 as second_event_timestamp,
IF(timestamp2 < timestamp3, timestamp3, NULL) as third_event_timestamp
FROM
(SELECT sessionId2,
timestamp2,
timestamp3
FROM steps1_2
LEFT JOIN step3
ON sessionId1 = sessionId3)
) steps1_2_3理想的结果集应如下所示: sessionId num_first_event num_second_event num_third_event S1 1 null S2 2 3 null S3 4 5 6
我的第一个问题是,是否可以连接子查询steps1_2,steps1_2_3?
在bigquery中实现像工作流这样的事件的替代方法,而不是计算时间戳的数量?
非常感谢您提供的任何技巧或建议的文档。另外,感谢您的时间和考虑。
https://stackoverflow.com/questions/38315064
复制相似问题