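I'm wondering whether it is possible to extract all numbers from a string in BigQuery using a regular expression.
I thought the query below would work, but it only returns the first match. Is there a way to extract all of the matches?
My use case is that I basically want to get the largest number from a URL, since that is most likely the post_id I need to join on.
Here is an example of what I mean: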
SELECT
mystr,
REGEXP_EXTRACT(mystr, r'(\d+)') AS nums
FROM
(SELECT 'this is a string with some 666 numbers 999 in it 333' AS mystr),
(SELECT 'just one number 123 in this one ' AS mystr),
(SELECT '99' AS mystr),
(SELECT 'another -2 example 99' AS mystr),
(SELECT 'another-8766 example 99' AS mystr),
(SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999' AS mystr),
(SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001' AS mystr),
(SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/print-preview' AS mystr)

The result I get from this is:
[
{
"mystr": "this is a string with some 666 numbers 999 in it 333",
"nums": "666"
},
{
"mystr": "just one number 123 in this one ",
"nums": "123"
},
{
"mystr": "99",
"nums": "99"
},
{
"mystr": "another -2 example 99",
"nums": "2"
},
{
"mystr": "another-8766 example 99",
"nums": "8766"
},
{
"mystr": "http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999",
"nums": "2015"
},
{
"mystr": "http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001",
"nums": "2015"
},
{
"mystr": "http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/print-preview",
"nums": "2015"
}
]

Posted 2015-12-15 22:02:36
After some digging, I ended up with this solution:
SELECT
mystr,
GROUP_CONCAT(SPLIT(REGEXP_REPLACE(mystr, r'[^\d]+', ','))) AS nums
FROM
(SELECT 'this is a string with some 666 numbers 999 in it 333' AS mystr),
(SELECT 'just one number 123 in this one ' AS mystr),
(SELECT '99' AS mystr),
(SELECT 'another -2 example 99' AS mystr),
(SELECT 'another-8766 example 99' AS mystr),
(SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999' AS mystr),
(SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001' AS mystr),
(SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/print-preview' AS mystr)

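How it works:
First, I use a regular expression to match every run of non-digit characters and replace it with a comma.
Then SPLIT breaks the result apart at the commas; GROUP_CONCAT is only there to display the result.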
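The replace-then-split idea above can be sketched in plain JavaScript, extended to pick the largest number as the question asks. This is only an illustration under stated assumptions: `extractNumbers` and `largestNumber` are hypothetical helper names, and how BigQuery's own SPLIT treats the empty pieces produced by leading or trailing commas may differ from JavaScript's `split`.

```javascript
// Hypothetical helpers (not part of BigQuery) mirroring the replace-then-split approach.
function extractNumbers(str) {
  return str
    .replace(/[^\d]+/g, ',')                          // turn every run of non-digits into one comma
    .split(',')                                       // break the string apart at the commas
    .filter(function (s) { return s.length > 0; })    // drop empty pieces from leading/trailing commas
    .map(function (s) { return parseInt(s, 10); });   // convert to integers for numeric comparison
}

function largestNumber(str) {
  var nums = extractNumbers(str);
  // Return null when the string contains no digits at all.
  return nums.length ? Math.max.apply(null, nums) : null;
}

console.log(extractNumbers('this is a string with some 666 numbers 999 in it 333'));
console.log(largestNumber('http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001'));
```

Converting to integers before comparing matters: compared as strings, '99' would sort above '8766'.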
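Posted 2015-12-16 08:13:57
As you use regular expressions in BigQuery more and more, you will realize that its implementation is, so far, fairly limited:
BigQuery Regular expression functions
So most likely you will soon find yourself doing something like the following.
Please note: for your current specific example, the code below has absolutely no advantage over the simple solution provided by @Cybril.
This solution is aimed more at what you are likely to need in the near future.
It uses a JavaScript UDF, which gives you the full power of JavaScript's regexp implementation.
BigQuery User-Defined Functions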
SELECT mystr, MAX(number) as max_number FROM JS(
// input table
(SELECT mystr FROM
(SELECT 'this is a string with some 666 numbers 999 in it 333' AS mystr),
(SELECT 'just one number 123 in this one ' AS mystr),
(SELECT '99' AS mystr),
(SELECT 'another -2 example 99' AS mystr),
(SELECT 'another-8766 example 99' AS mystr),
(SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999' AS mystr),
(SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/gallery/001' AS mystr),
(SELECT 'http://somedomain.com/2015/12/this-is-a-post-with-id-in-url-99999/print-preview' AS mystr)
) ,
// input columns
mystr,
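// output schema (number is an integer so that MAX compares numerically, not as text)
"[
{name: 'mystr', type: 'string'},
{name: 'number', type: 'integer'}
]",
// function (match() returns null when nothing matches, hence the || [] fallback)
"function(r, emit){
var numbers = r.mystr.match(/\d+/g) || [];
for (var i=0; i < numbers.length; i++) {
emit({
mystr: r.mystr,
number: parseInt(numbers[i], 10)
});
}
}"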
)
GROUP BY 1

Of course, you can also move the logic that determines the maximum into the UDF itself, which eliminates the extra grouping.
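Computing the maximum inside the UDF might look roughly like the sketch below. It is a sketch under assumptions, not something tested against BigQuery: the function name `maxNumberUdf` and the output field `max_number` are hypothetical, the output schema would need a matching integer field, and the `emit` callback here is a local stand-in so the function can be run outside BigQuery.

```javascript
// Hypothetical UDF body: emits exactly one row per input row,
// carrying the largest number found in mystr (or null if there is none).
function maxNumberUdf(r, emit) {
  // match() returns null when there are no digits, so fall back to an empty array.
  var matches = r.mystr.match(/\d+/g) || [];
  var max = null;
  for (var i = 0; i < matches.length; i++) {
    var n = parseInt(matches[i], 10);   // compare as integers, not as strings
    if (max === null || n > max) {
      max = n;
    }
  }
  emit({ mystr: r.mystr, max_number: max });
}

// Local stand-in for BigQuery's emit callback, for trying the function out.
var collected = [];
maxNumberUdf({ mystr: 'another-8766 example 99' }, function (row) {
  collected.push(row);
});
console.log(collected);
```

Because each input row produces a single output row, the outer query would no longer need MAX or GROUP BY.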
https://stackoverflow.com/questions/34290723