In BigQuery, I have the query below, based on the Google Analytics export tables (ga_sessions_*), and it works correctly.
#standardSQL
SELECT
Date,
SUM(totals.visits) AS Sessions,
SUM(totals.transactions) AS Transactions
FROM
`[projectID].[DatasetID].ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20181217'
AND '20181217'
AND totals.visits > 0
GROUP BY
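Date

In this query, I need to exclude all hits for which, within the hit, any of the following three conditions holds (as encoded in the queries below): (1) the hit-level custom dimension with index 23 contains 'editor', (2) the product-level custom dimension with index 6 starts with '63', (3) the page path contains 'gebak' or 'cake'.
Note: the intention is not to apply the three conditions above at the session level (as in this screenshot) but at the hit level, because I want to reproduce the numbers of another GA view than the one whose data is loaded into BigQuery. In that other GA view, the three conditions above are set as view filters.
The "best" query so far is the one below (based on the post below). However, this query does not filter the data set (in other words, the conditions have no effect on the result).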
SELECT Date,
-- hits,
SUM(totals.transactions),
SUM(totals.visits)
FROM (
(
SELECT date, totals,
-- create own hits array
ARRAY(
SELECT AS STRUCT
hitnumber,
page,
-- create own product array
ARRAY(
SELECT AS STRUCT productSku, productQuantity
FROM h.product AS p
WHERE (SELECT COUNT(1)=0 FROM p.customDimensions WHERE index=6 AND value like '63%')
) AS product
FROM t.hits as h
WHERE
NOT REGEXP_CONTAINS(page.pagePath,r'gebak|cake')
AND
(SELECT COUNT(1)=0 FROM h.customDimensions WHERE index=23 AND value like '%editor%')
) AS hits
FROM
`[projectID].[DatasetID].ga_sessions_*` t
WHERE
_TABLE_SUFFIX BETWEEN '20181217'
AND '20181217'
AND totals.visits > 0
))
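GROUP BY Date

Does anyone know how to achieve the desired output?
Thanks in advance!
Note: for privacy reasons, projectID and datasetID are masked in both queries.
Posted on 2019-01-02 14:53:54
Own arrays approach
You can create your own hits and product arrays by running subqueries on the original arrays and feeding their output back into the ARRAY function. Inside those subqueries you can filter out hits and products: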
#standardsql
SELECT
date,
hits
--SUM(totals.visits) AS Sessions,
--SUM(totals.transactions) AS Transactions
FROM
(
SELECT
date, totals,
-- create own hits array
ARRAY(
SELECT AS STRUCT
hitnumber,
page,
-- create own product array
ARRAY(
SELECT AS STRUCT productSku, productQuantity
FROM h.product AS p
WHERE (SELECT COUNT(1)=0 FROM p.customDimensions WHERE index=6 AND value like '63%')
) AS product
FROM t.hits as h
WHERE
NOT REGEXP_CONTAINS(page.pagePath,r'gebak|cake')
AND
(SELECT COUNT(1)=0 FROM h.customDimensions WHERE index=23 AND value like '%editor%')
) AS hits
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20161104` t
)
--GROUP BY 1
LIMIT 100

I left this example ungrouped, but you can easily adapt it by commenting out hits and enabling the aggregates and the GROUP BY.
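One caveat when grouping: totals is a session-level record that is passed through unchanged, so SUM(totals.visits) and SUM(totals.transactions) return the same numbers whether or not the hits array was filtered. That would explain why the conditions appear to have no effect. A minimal sketch of recounting from the surviving hits follows. It rests on assumptions that are not in the original posts: a session is counted if at least one hit survives, transactions are counted as distinct hits.transaction.transactionId values among the surviving hits, and a hit is dropped entirely when any of its products matches the index 6 condition. The alias kept_hits is made up for illustration.

```sql
#standardSQL
SELECT
  date,
  -- Assumption: a session still counts if at least one hit survives the filters.
  COUNTIF(ARRAY_LENGTH(kept_hits) > 0) AS Sessions,
  -- Assumption: transactions are recounted from surviving hits, not taken from totals.
  SUM((SELECT COUNT(DISTINCT k.transactionId) FROM UNNEST(kept_hits) AS k)) AS Transactions
FROM (
  SELECT
    date,
    -- kept_hits (hypothetical name): only the hits that pass all three conditions
    ARRAY(
      SELECT AS STRUCT
        h.hitNumber,
        h.transaction.transactionId AS transactionId
      FROM t.hits AS h
      WHERE
        -- IFNULL keeps hits without a page path instead of dropping them via NULL
        NOT REGEXP_CONTAINS(IFNULL(h.page.pagePath, ''), r'gebak|cake')
        AND NOT EXISTS (
          SELECT 1 FROM h.customDimensions
          WHERE index = 23 AND value LIKE '%editor%')
        AND NOT EXISTS (
          SELECT 1 FROM h.product AS p, p.customDimensions AS cd
          WHERE cd.index = 6 AND cd.value LIKE '63%')
    ) AS kept_hits
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20161104` AS t
)
GROUP BY date
```

Whether these counting rules match what the filtered GA view reports is something to check against the view itself; the sketch only shows where the aggregates have to come from for the filter to matter.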
Segmentation approach
I think you only need the right subqueries in the WHERE clause:
#standardsql
SELECT
date,
SUM(totals.visits) AS Sessions,
SUM(totals.transactions) AS Transactions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*` t
WHERE
(SELECT COUNT(1)=0 FROM t.hits h
WHERE
(SELECT count(1)>0 FROM h.customDimensions WHERE index=23 AND value like '%editor%')
OR
(SELECT count(1)>0 from h.product p, p.customdimensions cd WHERE index=6 AND value like '63%')
OR
REGEXP_CONTAINS(page.pagePath,r'gebak|cake')
)
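GROUP BY date

Because all your grouping is at the session level, you don't need any flattening (i.e. cross joining the arrays with the main table), which is costly. In the outermost WHERE you enter the hits array with a subquery (this is a bit like a for-each over the rows). At that level you can already count the cases where REGEXP_CONTAINS(page.pagePath,r'gebak|cake') matches.
For the other cases you write another subquery to enter the respective array. In the first case that is customDimensions within hits. This is like a nested for-each inside the other one (a subquery in a subquery).
In the second case I am simply flattening, but only inside the subquery: product with its customDimensions. So this is also a single nested for-each, because I was lazy and used a cross join. Instead of the cross join I could have written another subquery, which would basically be a triple nested for-each (a subquery in a subquery in a subquery).
Since I am counting the cases I want to exclude, my outer condition is COUNT(1)=0.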
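The triple nested variant mentioned above could look roughly like the sketch below. It covers only the product-level condition, runs on a single day of the public sample table, and uses EXISTS / NOT EXISTS, which is equivalent here to the COUNT(1)>0 / COUNT(1)=0 checks. Treat it as an illustration of the nesting, not as a tested replacement.

```sql
#standardSQL
SELECT
  date,
  SUM(totals.visits) AS Sessions,
  SUM(totals.transactions) AS Transactions
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20161104` AS t
WHERE
  -- keep the session only if no hit contains a product with a matching dimension
  NOT EXISTS (
    SELECT 1
    FROM t.hits AS h                      -- level 1: for each hit in the session
    WHERE EXISTS (
      SELECT 1
      FROM h.product AS p                 -- level 2: for each product in the hit
      WHERE EXISTS (
        SELECT 1
        FROM p.customDimensions AS cd     -- level 3: for each custom dimension of the product
        WHERE cd.index = 6 AND cd.value LIKE '63%'
      )
    )
  )
GROUP BY date
```

Each level only reads the array of its parent row, so no array is joined against the main table.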
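I could only test this with the GA sample data, so it is effectively untested against your data. But I guess you get the idea.
Posted on 2018-12-23 16:58:20
A simple example/idea of how to use WITH and REGEXP_CONTAINS on the public data set: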
WITH CD6 AS (
SELECT cd.value, SUM(totals.visits) AS Sessions6Sum
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) AS hits,
UNNEST(hits.product) AS prod,
UNNEST(prod.customDimensions) AS cd
WHERE cd.index=6
AND NOT REGEXP_CONTAINS(cd.value,r'^63.....$')
GROUP BY cd.value
),
CD23 AS (
SELECT cd.value, SUM(totals.visits) AS Sessions23Sum
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) AS hits,
UNNEST(hits.product) AS prod,
UNNEST(prod.customDimensions) AS cd
WHERE cd.index=23
AND NOT REGEXP_CONTAINS(cd.value,r'editor')
GROUP BY cd.value
)
select CD6.Sessions6Sum + CD23.Sessions23Sum from CD6, CD23

https://stackoverflow.com/questions/53904729