我想知道BigQuery中的递归UDF函数是否是我正在做的事情的正确解决方案。但是首先,可以在UDF内部运行查询吗?
我在这里看到了一个类似的问题:BigQuery :在UDF中可以执行另一个查询吗?,但是解决方案似乎是一个可以直接执行SQL的解决方案。在我的例子中,我可能不得不反复/递归地调用UDF,而事先不知道步骤的数量(比如3-7个步骤)。
这是一个简单的用例,在表中的用户名条目上构建关系图,其中X度分离,其中X将由最终用户作为参数提供。我的猜测是递归样式的UDF会工作得很好,但它有可能吗?
*编辑:关于用例的更详细信息:**
考虑一个包含事务数据的表,该表包含每一行中的对应方,以及一些其他信息:
Buyer, Seller
Bob->Alice
Bob->Carol
Bob->John
John-Peter
John-Sam
Bob->Mary假设我想形象化鲍勃和他的对手之间的关系,有1度的分离(也就是显示每个对应者的关系-从Bob移出1步)。我想使用像这里这样的力图:D3力-可折叠图
此图需要具有以下结构的.JSON文件:
{
"name": "Bob", "size":5000,
"children":
[
{"name":"Alice","size":3000},
{"name":"Carol","size":3000},
{"name":"John","size":3000,
"children":[
{"name":"Peter","size":3000},
{"name":"Sam","size":3000}
]},
{"name":"Mary","size":3000}
]
}所以,有了1度的分离,Bob有4个孩子,其中John有2个孩子。这可以更深的X度的分离,理想的X提供的用户,但实际上也可以是硬编码,即3或5级。
发布于 2017-01-13 07:12:41
试着在下面
它是足够通用的,如果需要将它扩展到更多的分离程度,可以遵循非常简单的模式。
为了举例说明,我引入了大小属性的逻辑(在下面的示例中),即从项目的数量(包括它本身)的角度来表示项的大小,所以它本质上是子级+1的计数。
因此,享受:
#standardSQL
CREATE TEMP FUNCTION size(item STRING) AS (
(SELECT CAST(IFNULL(1 + (LENGTH(item) - LENGTH(REPLACE(item, 'name', '')))/4, 1) AS STRING))
);
CREATE TEMP FUNCTION dress(parent STRING, children STRING) AS (
(SELECT CONCAT('{"name":"', parent, '","size":', size(children), IFNULL(CONCAT(',"children":[', children, ']'), ''), '}'))
);
WITH items AS (
SELECT 'Bob' AS parent, 'Alice' AS child UNION ALL
SELECT 'Bob' AS parent, 'Carol' AS child UNION ALL
SELECT 'Bob' AS parent, 'John' AS child UNION ALL
SELECT 'John' AS parent, 'Peter' AS child UNION ALL
SELECT 'John' AS parent, 'Sam' AS child UNION ALL
SELECT 'Peter' AS parent, 'Sam' AS child UNION ALL
SELECT 'Sam' AS parent, 'Mike' AS child UNION ALL
SELECT 'Sam' AS parent, 'Nick' AS child UNION ALL
SELECT 'Bob' AS parent, 'Mary' AS child
), degree2 AS (
SELECT d1.parent AS parent, d1.child AS child_1, d2.child AS child_2
FROM items AS d1 LEFT JOIN items AS d2 ON d1.child = d2.parent
), degree3 AS (
SELECT d1.*, d2.child AS child_3
FROM degree2 AS d1 LEFT JOIN items AS d2 ON d1.child_2 = d2.parent
), degree4 AS (
SELECT d1.*, d2.child AS child_4
FROM degree3 AS d1 LEFT JOIN items AS d2 ON d1.child_3 = d2.parent
)
SELECT STRING_AGG(dress(parent, child_1), ',') AS parent FROM (
SELECT parent, STRING_AGG(dress(child_1, child_2), ',') AS child_1 FROM (
SELECT parent, child_1, STRING_AGG(dress(child_2, child_3), ',') AS child_2 FROM (
SELECT parent, child_1, child_2, STRING_AGG(dress(child_3, child_4), ',') AS child_3 FROM (
SELECT parent, child_1, child_2, child_3, STRING_AGG(dress(child_4, NULL), ',') AS child_4 FROM degree4
GROUP BY 1,2,3,4 ORDER BY 1,2,3,4 )
GROUP BY 1,2,3 ORDER BY 1,2,3 )
GROUP BY 1,2 ORDER BY 1,2 ) GROUP BY 1 ORDER BY 1 ) 它准确地返回了您需要的内容-请参阅下面的“美化”版本。
{"name": "Bob","size": 12,"children": [
{"name": "Alice","size": 1},
{"name": "Carol","size": 1},
{"name": "John","size": 8,"children": [
{"name": "Peter","size": 4,"children": [
{"name": "Sam","size": 3,"children": [
{"name": "Mike","size": 1},
{"name": "Nick","size": 1} ]}
]},
{"name": "Sam","size": 3,"children": [
{"name": "Mike","size": 1},
{"name": "Nick","size": 1} ]}
]},
{"name": "Mary","size": 1}
]},
{"name": "John","size": 8,"children": [
{"name": "Peter","size": 4,"children": [
{"name": "Sam","size": 3,"children": [
{"name": "Mike","size": 1},
{"name": "Nick","size": 1} ]}
]},
{"name": "Sam","size": 3,"children": [
{"name": "Mike","size": 1},
{"name": "Nick","size": 1} ]}
]},
{"name": "Peter","size": 4,"children": [
{"name": "Sam","size": 3,"children": [
{"name": "Mike","size": 1},
{"name": "Nick","size": 1} ]}
]},
{"name": "Sam","size": 3,"children": [
{"name": "Mike","size": 1},
{"name": "Nick","size": 1} ]}很有可能,上面的内容可以进一步推广--但我认为它已经足够让您尝试了:o)
发布于 2017-01-11 14:22:21
您可以让JavaScript UDF进行递归调用,但它不能执行另一个SQL语句。如果预先知道递归/迭代的次数,则可能会定义SQL函数,例如:
#standardSQL
CREATE TEMP FUNCTION SumToN(x INT64) AS (
(SELECT SUM(v) FROM UNNEST(GENERATE_ARRAY(1, x)) AS v)
);使用GENERATE_ARRAY,您可以创建一个所需长度的for循环。下面是另一个示例,它不涉及UDF,但使用GENERATE_ARRAY连接可变数目的字符串:
#standardSQL
WITH T AS (
SELECT 2 AS x, 'foo' AS y UNION ALL
SELECT 4 AS x, 'bar' AS y)
SELECT
y,
(SELECT STRING_AGG(CONCAT(y, CAST(v AS STRING)))
FROM UNNEST(GENERATE_ARRAY(1, x)) AS v) AS rep_y
FROM T;
+-----+---------------------+
| y | rep_y |
+-----+---------------------+
| foo | foo1,foo2 |
| bar | bar1,bar2,bar3,bar4 |
+-----+---------------------+https://stackoverflow.com/questions/41588810
复制相似问题