我试图在导入到大查询中的属性中创建一个组的主视图,但是通过使用unnest(hits),SQL复制了数据,导致收入值不准确等等。
我试着去理解为什么不巢导致了这一切,但我还是搞不明白。
SELECT Date, hostname, channelGrouping, sum(transactionRevenue) as Revenue, sum(Shipping) as Shipping, sum(visits) as Sessions, sum(bounces) as Bounces, sum(transactions) as Transactions
FROM
(SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
UNION ALL
SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
UNION ALL
SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `102674002.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
UNION ALL
SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
UNION ALL
SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509')
Group By Date, hostname, channelGrouping
Order by Date发布于 2017-06-23 16:10:37
这可能会起作用:
SELECT
date,
channelGrouping,
SUM(Revenue) Revenue,
SUM(Shipping) Shipping,
SUM(bounces) bounces,
SUM(transactions) transactions,
hostname,
COUNT(date) sessions
FROM(
SELECT
date,
channelGrouping,
totals.totaltransactionrevenue / 1e6 Revenue,
ARRAY((SELECT DISTINCT page.hostname FROM UNNEST(hits) hits WHERE page.hostname IS NOT NULL)) hostnames,
(SELECT SUM(hits.transaction.transactionshipping) / 1e6 FROM UNNEST(hits) hits) Shipping,
totals.bounces bounces,
totals.transactions transactions
FROM `project_id.dataset_id.ga_sessions_*`
WHERE 1 = 1
AND ARRAY_LENGTH(ARRAY((SELECT DISTINCT page.hostname FROM UNNEST(hits) hits WHERE page.hostname IS NOT NULL))) > 0
AND _TABLE_SUFFIX BETWEEN '20170601' AND '20170609'
UNION ALL
(...)
),
UNNEST(hostnames) hostname
GROUP BY
date, channelGrouping, hostname注意,在这个查询中,我避免在UNNEST字段中应用hits操作,并且只在子选择中这样做。
为了理解为什么会出现这种情况,您必须了解ga数据是如何聚合到BigQuery中的。注意,我们基本上有两种类型的数据:session级数据和hits级数据。访问您的网站的每个客户端最终都会在BigQuery中生成一行,如下所示:
{fullvisitorid: 1, visitid:1, date: '20170601', channelGrouping: "search", hits: [{hitNumber: 1, page: {hostname: "yourserverhostname"}}, {hitNumber: 2, page: {hostname: "yourserverhostname"}}, totals: {totalTransactionRevenue:0, bounces: 0}]如果同一位顾客在一天后回来,它就会生成另一排到BQ中,如下所示:
{fullvisitorid: 1, visitid:2, date: '20170602', channelGrouping: "search", hits: [{hitNumber: 1, page: {hostname: "yourserverhostname"}}, {hitNumber: 2, page: {hostname: "yourserverhostname"}}, totals: {totalTransactionRevenue:50000000, bounces: 2}]正如您所看到的,键hits之外的字段与会话级别相关(因此,每次点击,即客户在网站中的每一次交互,都会在这里添加另一个条目)。应用UNNEST时,基本上是将数组中所有值的交叉连接应用到外部字段。
,这就是复制发生的地方!
考虑到过去的例子,如果我们将UNNEST应用于hits字段,您将得到如下结果:
fullvisitorid visitid totals.totalTransactionRevenue hits.hitNumber
1 1 0 1
1 1 0 2
1 2 50000000 1
1 2 50000000 2注意,对于hits字段中的每一次命中,都会导致外部字段(如totals.totalTransactionRevenue )对hits数组中发生的每个hitNumber重复。
因此,如果稍后应用一些类似于SUM(totals.totalTransactionRevenue)的操作,您最终会将该字段乘以客户在该visitid中的每一次命中。
我倾向于避免(取决于数据量) hits字段上的hits操作(代价高昂),而且我只在子查询中这样做(在子查询中,取消嵌套只在不重复数据的行级别进行)。
https://stackoverflow.com/questions/44723878
复制相似问题