请帮帮我,我有一个列Delay_Reason,其中的值如下所示。值可能以";#“结尾,也可能不是。
DT1-Increased_CT_Reason_Start_to_Accept
ERIC_Drive Test Taking too Long;#ERIC_Lack Of GSC Resources/Queuing DT Drives;#ERIC_Cluster Having Too Many RF Issues Needing Tuning;#
ERIC_Drive Test Taking too Long;#ERIC_Lack Of GSC Resources/Queuing DT Drives;#
ERIC_Drive Test Taking too Long;#我得数一下delay_reason。
我想要的输出是
DT1-Increased_CT_Reason_Start_to_Accept count
ERIC_Drive Test Taking too Long 3
ERIC_Lack Of GSC Resources/Queuing DT Drives 2
ERIC_Cluster Having Too Many RF Issues Needing Tuning 1发布于 2017-01-25 12:15:33
您可以使用的一个技巧是将Delay_Reason列的长度与删除了单个字母的同一列的长度进行比较。然后,对整个表中的差异进行汇总,以获得出现的次数。
SELECT 'a' AS Delay_Reason,
SUM(CHAR_LENGTH(Delay_Reason) - CHAR_LENGTH(REPLACE(Delay_Reason, 'a', ''))) count
FROM yourTable
UNION ALL
SELECT 'b',
SUM(CHAR_LENGTH(Delay_Reason) - CHAR_LENGTH(REPLACE(Delay_Reason, 'b', '')))
FROM yourTable
UNION ALL
SELECT 'c',
SUM(CHAR_LENGTH(Delay_Reason) - CHAR_LENGTH(REPLACE(Delay_Reason, 'c', '')))
FROM yourTable演示:
SQL Fiddle
更新:
如果您想让上面的查询适用于长度超过单个字符的术语,您只需使用要搜索的术语的长度进行标准化:
SELECT 'Los NE abc' AS Delay_Reason,
SUM(CHAR_LENGTH(Delay_Reason) - CHAR_LENGTH(REPLACE(Delay_Reason,'Los NE abc','')))
/ CHAR_LENGTH('Los NE abc') AS count
FROM yourTable
UNION ALL
SELECT 'Angeles',
SUM(CHAR_LENGTH(Delay_Reason) - CHAR_LENGTH(REPLACE(Delay_Reason,'Angeles','')))
/ CHAR_LENGTH('Angeles')
FROM yourTable
UNION ALL
SELECT 'California',
SUM(CHAR_LENGTH(Delay_Reason) - CHAR_LENGTH(REPLACE(Delay_Reason,'California','')))
/ CHAR_LENGTH('California')
FROM yourTable以下是此查询的演示:
SQLFiddle
发布于 2017-01-25 13:26:44
SELECT Delay_Reason,COUNT(*)
FROM test
GROUP BY Delay_Reason;发布于 2017-01-26 07:06:06
问题是数据库在一个字段中存储多个值。当前架构违反了第三范式。
一种解决方案是运行一个查询来规范化您的数据,然后对该查询运行常规聚合查询。substring_index可以用于此目的。
SQL Fiddle
CREATE TABLE yourTable (`Delay_Reason` varchar(512));
INSERT INTO yourTable (`Delay_Reason`)
VALUES
('ERIC_Drive Test Taking too Long;#ERIC_Lack Of GSC Resources/Queuing DT Drives;#ERIC_Cluster Having Too Many RF Issues Needing Tuning;#'),
('ERIC_Drive Test Taking too Long;#ERIC_Lack Of GSC Resources/Queuing DT Drives;# '),
('ERIC_Drive Test Taking too Long;#') ;
select delay_reason, count(*) count from (
/*
normalise the data
add as many substring_index union all elements as required
*/
select SUBSTRING_INDEX(delay_reason, ';#', 1) AS delay_reason from yourTable
union all
select SUBSTRING_INDEX(SUBSTRING_INDEX(delay_reason, ';#', 2), ';#', -1) AS delay_reason from yourTable
union all
select SUBSTRING_INDEX(SUBSTRING_INDEX(delay_reason, ';#', 3), ';#', -1) AS delay_reason from yourTable
union all
select SUBSTRING_INDEX(SUBSTRING_INDEX(delay_reason, ';#', 4), ';#', -1) AS delay_reason from yourTable
union all
select SUBSTRING_INDEX(SUBSTRING_INDEX(delay_reason, ';#', 5), ';#', -1) AS delay_reason from yourTable
union all
select SUBSTRING_INDEX(SUBSTRING_INDEX(delay_reason, ';#', 6), ';#', -1) AS delay_reason from yourTable
union all
select SUBSTRING_INDEX(SUBSTRING_INDEX(delay_reason, ';#', 7), ';#', -1) AS delay_reason from yourTable
union all
select SUBSTRING_INDEX(SUBSTRING_INDEX(delay_reason, ';#', 8), ';#', -1) AS delay_reason from yourTable
union all
select SUBSTRING_INDEX(SUBSTRING_INDEX(delay_reason, ';#', 9), ';#', -1) AS delay_reason from yourTable
union all
select SUBSTRING_INDEX(SUBSTRING_INDEX(delay_reason, ';#', 10), ';#', -1) AS delay_reason from yourTable
) delay_reasons
/* remove empty values */
where delay_reason <> ''
group by delay_reason
order by count desc;--示例结果
delay_reason计数
ERIC_Drive测试耗时太长%3
GSC资源/排队DT驱动器的ERIC_Lack 2
ERIC_Cluster有太多需要调整的RF问题%1
如果您有权更改模式,还可以对数据进行规范化。Search Google for third normal form explained
https://stackoverflow.com/questions/41843224
复制相似问题