首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >字母表组合

字母表组合
EN

Stack Overflow用户
提问于 2017-09-13 14:39:16
回答 1查看 394关注 0票数 1

我有一个包含1000行和3个变量( ID、国家和字符串变量"VAR1")的表。VAR1是由一个空格分隔的单词组成的句子。

我想按国家来统计所有的夫妻(或三胞胎)的单词。非常重要的是,夫妻(或三胞胎)是所有单词的交叉(不一定是一步一步地)。也许,我们可以用ngram hql-函数来实现它,但是当我使用它时,它是一步一步地计算单词,而不是所有的交叉。

让我们来看看这个例子,让你知道我想要的是什么:

代码语言:javascript
复制
> **"ID" "COUNTRY" "VAR1"**
> "1" "CANADA" "dad mum child"
> "2" "CANADA" "dad mum dog"
> "3" "USA"    "bird lion car"

VAR1不一定是三个字的长度。只是为了简化。

我要分四个步骤来做2- in的检查:

步骤1:最重要的一步:用2横过单词

代码语言:javascript
复制
> "1" "CANADA" "dad mum" 1
> "1" "CANADA" "dad child" 1
> "1" "CANADA" "mum dad" 1
> "1" "CANADA" "mum child" 1
> "1" "CANADA" "child dad" 1
> "1" "CANADA" "child mum" 1
> "2" "CANADA" "dad mum" 1
> "2" "CANADA" "dad dog" 1
> "2" "CANADA" "mum dad" 1
> "2" "CANADA" "mum dog" 1
> "2" "CANADA" "dog dad" 1
> "2" "CANADA" "dog mum" 1
> "3" "USA"    "bird lion" 1
> "3" "USA"    "bird car" 1
> "3" "USA"    "lion bird" 1
> "3" "USA"    "lion car" 1
> "3" "USA"    "car bird" 1
> "3" "USA"    "car lion" 1 

步骤2:订购2克

代码语言:javascript
复制
> "1" "CANADA" "dad mum" 1
> "1" "CANADA" "child dad" 1
> "1" "CANADA" "dad mum" 1
> "1" "CANADA" "child mum" 1
> "1" "CANADA" "child dad" 1
> "1" "CANADA" "child mum" 1
> "2" "CANADA" "dad mum" 1
> "2" "CANADA" "dad dog" 1
> "2" "CANADA" "dad mum" 1
> "2" "CANADA" "dog mum" 1
> "2" "CANADA" "dad dog" 1
> "2" "CANADA" "dog mum" 1
> "3" "USA"    "bird lion" 1
> "3" "USA"    "bird car" 1
> "3" "USA"    "bird lion" 1
> "3" "USA"    "car lion" 1
> "3" "USA"    "bird car" 1
> "3" "USA"    "car lion" 1 

步骤3:按ID、国家、2纳克区分

代码语言:javascript
复制
> "1" "CANADA" "dad mum"
> "1" "CANADA" "child dad"
> "1" "CANADA" "child mum"
> "2" "CANADA" "dad mum"
> "2" "CANADA" "dad dog"
> "2" "CANADA" "dog mum"
> "3" "USA"    "bird lion"
> "3" "USA"    "bird car"
> "3" "USA"    "car lion" 

步骤4:按国家计算,2纳克

代码语言:javascript
复制
> "CANADA" "dad mum" 2 
> "CANADA" "child dad" 1 
> "CANADA" "child mum" 1
> "CANADA" "dad dog" 1 
> "CANADA" "dog mum" 1 
> "USA"    "bird lion" 1 
> "USA"    "bird car" 1 
> "USA"    "car lion" 1

非常感谢

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2017-09-14 09:51:22

代码语言:javascript
复制
with        cte as
            (
                select  t.ID
                       ,t.COUNTRY
                       ,pe.pos
                       ,pe.val
                from    mytable t
                        lateral view posexplode (split(VAR1,'\\s+')) pe
            )

select      t1.COUNTRY  
           ,concat_ws(' ',t1.val,t2.val)    as combination
           ,count (*)                       as cnt

from                cte t1

            join    cte t2

            on      t2.id   =
                    t1.id

where       t1.pos < t2.pos

group by    t1.COUNTRY  
           ,t1.val 
           ,t2.val
;
代码语言:javascript
复制
+----------+--------------+------+
| country  | combination  | cnt  |
+----------+--------------+------+
| CANADA   | dad child    | 1    |
| CANADA   | dad dog      | 1    |
| CANADA   | dad mum      | 2    |
| CANADA   | mum child    | 1    |
| CANADA   | mum dog      | 1    |
| USA      | bird car     | 1    |
| USA      | bird lion    | 1    |
| USA      | lion car     | 1    |
+----------+--------------+------+
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/46200410

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档