我有一份产业清单,其中有一个相邻行业的列表来对它们进行分类。我想知道哪些行业是最常见的,但我无法将两个词的类别解释为一个。
首先,我想知道哪5类是最普遍的。我也想知道前5个单词(黑色),两个单词(红色)和三个词(蓝色)类别。
另外,我想去掉逗号。
下面是我想要实现的目标,并链接到google文档,其中我列出了所有的数据:

https://docs.google.com/spreadsheets/d/13N8gc4POPhFhTvyqq-UugWS5GCgcONwliacSL8-MAr8/edit#gid=0
如何对这些类别进行分组和列出?
发布于 2020-04-06 23:09:14
粗俗的字眼:
=ARRAYFORMULA(QUERY(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ", ")),
"select Col1,count(Col1)
group by Col1
order by count(Col1) desc
limit 5
label count(Col1)''"))整句话:
=ARRAYFORMULA(QUERY(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ",")),
"select Col1,count(Col1)
group by Col1
order by count(Col1) desc
limit 5
label count(Col1)''"))一个词:
=ARRAYFORMULA(QUERY(TRIM(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ","))),
"select Col1,count(Col1)
where not Col1 contains ' '
group by Col1
order by count(Col1) desc
limit 5
label count(Col1)''"))两个词:
=ARRAYFORMULA(QUERY(TRIM(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ","))),
"select Col1,count(Col1)
where Col1 matches '\w+ \w+'
group by Col1
order by count(Col1) desc
limit 5
label count(Col1)''"))三个字:
=ARRAYFORMULA(QUERY(TRIM(TRANSPOSE(SPLIT(QUERY(B3:B11&",",,99^99), ","))),
"select Col1,count(Col1)
where Col1 matches '\w+ \w+ \w+'
group by Col1
order by count(Col1) desc
limit 5
label count(Col1)''"))发布于 2020-04-06 21:30:19
将问题分解成3个公式,将允许您支持任意多个“单词”。
第1步)在D29中加入公式,把所有的单词当作一个单词(看你的问题,这似乎是你真正需要的唯一一步)。
=query(arrayformula(trim(substitute(transpose(split(query({substitute(B3:B," ","_")},"select * where Col1 is not null",counta(B3:B)),", ")),"_"," "))),"select Col1, count(Col1) group by Col1 order by count(Col1) desc label Col1 'Descriptions', count(Col1) 'Frequency'")第2步)将公式放在F29中,将下一个公式放在上面公式生成的表旁边。如果使用不同的范围,则应替换D30:D。
=arrayformula({"Words";if(D30:D="","",1+LEN(D30:D)-len(SUBSTITUTE(D30:D," ","")))})第3步)在G29中放置公式--如果使用不同的位置,这将输出按字数排序的最大频率,D29:F应该被替换
=query({D29:F},"select * where Col1 is not null order by Col3,Col2 desc")这样做的好处是你支持1,2,3,4.单词频率。
https://stackoverflow.com/questions/61066823
复制相似问题