首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >Bigquery取消点击-复制值)

Bigquery取消点击-复制值)
EN

Stack Overflow用户
提问于 2017-06-23 14:28:33
回答 1查看 5.1K关注 0票数 1

我试图在导入到大查询中的属性中创建一个组的主视图,但是通过使用unnest(hits),SQL复制了数据,导致收入值不准确等等。

我试着去理解为什么不巢导致了这一切,但我还是搞不明白。

代码语言:javascript
复制
SELECT Date, hostname, channelGrouping, sum(transactionRevenue) as Revenue, sum(Shipping) as Shipping, sum(visits) as Sessions, sum(bounces) as Bounces, sum(transactions) as Transactions
    FROM
        (SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
        FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
        WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
        UNION ALL
        SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
        FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
        WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
        UNION ALL
        SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
        FROM `102674002.ga_sessions_*`, UNNEST(hits) AS h
        WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
        UNION ALL
        SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
        FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
        WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509'
        UNION ALL
        SELECT Date, h.page.hostname as hostname, channelGrouping, totals.transactionRevenue, totals.visits, h.transaction.transactionShipping as shipping, totals.bounces, totals.transactions
        FROM `xxxxxxxxx.ga_sessions_*`, UNNEST(hits) AS h
        WHERE _TABLE_SUFFIX BETWEEN '20170401' AND '20170509')
    Group By Date, hostname, channelGrouping
    Order by Date
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2017-06-23 16:10:37

这可能会起作用:

代码语言:javascript
复制
SELECT
  date,
  channelGrouping,
  SUM(Revenue) Revenue,
  SUM(Shipping) Shipping,
  SUM(bounces) bounces,
  SUM(transactions) transactions,
  hostname,
  COUNT(date) sessions
FROM(
  SELECT 
    date,
    channelGrouping,
    totals.totaltransactionrevenue / 1e6 Revenue,
    ARRAY((SELECT DISTINCT page.hostname FROM UNNEST(hits) hits WHERE page.hostname IS NOT NULL)) hostnames,
    (SELECT SUM(hits.transaction.transactionshipping) / 1e6 FROM UNNEST(hits) hits) Shipping,
    totals.bounces bounces,
    totals.transactions transactions
  FROM `project_id.dataset_id.ga_sessions_*`
  WHERE 1 = 1
  AND ARRAY_LENGTH(ARRAY((SELECT DISTINCT page.hostname FROM UNNEST(hits) hits WHERE page.hostname IS NOT NULL))) > 0
  AND _TABLE_SUFFIX BETWEEN '20170601' AND '20170609'

  UNION ALL
  (...)

),
UNNEST(hostnames) hostname
GROUP BY
  date, channelGrouping, hostname

注意,在这个查询中,我避免在UNNEST字段中应用hits操作,并且只在子选择中这样做。

为了理解为什么会出现这种情况,您必须了解ga数据是如何聚合到BigQuery中的。注意,我们基本上有两种类型的数据:session级数据和hits级数据。访问您的网站的每个客户端最终都会在BigQuery中生成一行,如下所示:

代码语言:javascript
复制
{fullvisitorid: 1, visitid:1, date: '20170601', channelGrouping: "search", hits: [{hitNumber: 1, page: {hostname: "yourserverhostname"}}, {hitNumber: 2, page: {hostname: "yourserverhostname"}}, totals: {totalTransactionRevenue:0, bounces: 0}]

如果同一位顾客在一天后回来,它就会生成另一排到BQ中,如下所示:

代码语言:javascript
复制
{fullvisitorid: 1, visitid:2, date: '20170602', channelGrouping: "search", hits: [{hitNumber: 1, page: {hostname: "yourserverhostname"}}, {hitNumber: 2, page: {hostname: "yourserverhostname"}}, totals: {totalTransactionRevenue:50000000, bounces: 2}]

正如您所看到的,键hits之外的字段与会话级别相关(因此,每次点击,即客户在网站中的每一次交互,都会在这里添加另一个条目)。应用UNNEST时,基本上是将数组中所有值的交叉连接应用到外部字段。

,这就是复制发生的地方!

考虑到过去的例子,如果我们将UNNEST应用于hits字段,您将得到如下结果:

代码语言:javascript
复制
fullvisitorid    visitid    totals.totalTransactionRevenue    hits.hitNumber
1                1          0                                 1
1                1          0                                 2
1                2          50000000                          1
1                2          50000000                          2

注意,对于hits字段中的每一次命中,都会导致外部字段(如totals.totalTransactionRevenue )对hits数组中发生的每个hitNumber重复。

因此,如果稍后应用一些类似于SUM(totals.totalTransactionRevenue)的操作,您最终会将该字段乘以客户在该visitid中的每一次命中。

我倾向于避免(取决于数据量) hits字段上的hits操作(代价高昂),而且我只在子查询中这样做(在子查询中,取消嵌套只在不重复数据的行级别进行)。

票数 4
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/44723878

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档