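I have a large database of measurement and status data and want to reduce the amount of data without losing much information. I have studied several examples, but my SQL skills seem too limited to get this working.
The table has several million rows. Its definition is:
TIMESTAMP TIMESTAMP, DEVICE varchar(32), TYPE varchar(32), EVENT varchar(512), READING varchar(32), VALUE varchar(32), UNIT varchar(32)
Here is some sample data. The full table contains many different devices and readings, and each device/reading combination should be handled separately: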
+---------------------+----------+------+---------+---------+-------+------+
| TIMESTAMP | DEVICE | TYPE | EVENT | READING | VALUE | UNIT |
+---------------------+----------+------+---------+---------+-------+------+
| 2016-03-27 10:17:45 | KNX_428c | KNX | 49 mA | state | 49 | mA |
| 2016-03-27 10:19:45 | KNX_428c | KNX | 47 mA | state | 47 | mA |
| 2016-03-27 10:21:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:23:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:23:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:25:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:25:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:27:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:27:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:29:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:31:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:31:44 | KNX_428c | KNX | 47 mA | state | 47 | mA |
| 2016-03-27 10:33:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:33:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:34:04 | KNX_428c | KNX | 136 mA | state | 136 | mA |
| 2016-03-27 10:34:04 | KNX_428c | KNX | 165 mA | state | 165 | mA |
| 2016-03-27 10:34:05 | KNX_428c | KNX | 136 mA | state | 136 | mA |
| 2016-03-27 10:34:05 | KNX_428c | KNX | 107 mA | state | 107 | mA |
| 2016-03-27 10:34:05 | KNX_428c | KNX | 79 mA | state | 79 | mA |
| 2016-03-27 10:34:06 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:34:29 | KNX_428c | KNX | 107 mA | state | 107 | mA |
| 2016-03-27 10:34:29 | KNX_428c | KNX | 136 mA | state | 136 | mA |
| 2016-03-27 10:34:30 | KNX_428c | KNX | 165 mA | state | 165 | mA |
| 2016-03-27 10:34:30 | KNX_428c | KNX | 139 mA | state | 139 | mA |
| 2016-03-27 10:34:30 | KNX_428c | KNX | 107 mA | state | 107 | mA |
| 2016-03-27 10:34:31 | KNX_428c | KNX | 51 mA | state | 51 | mA |
| 2016-03-27 10:34:44 | KNX_428c | KNX | 0 mA | state | 0 | mA |
| 2016-03-27 10:35:44 | KNX_428c | KNX | 0 mA | state | 0 | mA |
| 2016-03-27 10:37:44 | KNX_428c | KNX | 0 mA | state | 0 | mA |
| 2016-03-27 10:37:44 | KNX_428c | KNX | 0 mA | state | 0 | mA |
| 2016-03-27 10:39:43 | KNX_428c | KNX | 0 mA | state | 0 | mA |
| 2016-03-27 10:41:43 | KNX_428c | KNX | 0 mA | state | 0 | mA |
| 2016-03-27 10:43:43 | KNX_428c | KNX | 0 mA | state | 0 | mA |
| 2016-03-27 10:45:43 | KNX_428c | KNX | 0 mA | state | 0 | mA |
| 2016-03-27 10:47:43 | KNX_428c | KNX | 0 mA | state | 0 | mA |
| 2016-03-27 10:47:43 | KNX_428c | KNX | 0 mA | state | 0 | mA |
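| 2016-03-27 10:49:43 | KNX_428c | KNX | 0 mA | state | 0 | mA |
I intend to do two things.
The first I have managed with a SELECT statement and GROUP BY, but I do not know how to actually change the data in the database:
SELECT *, MAX(VALUE) FROM filelog
GROUP BY TIMESTAMP, DEVICE, READING
For the second step I found several examples, but they always merge the duplicates into a single record rather than into two (the first and the last), which is what I want. These examples usually rely on joins, which I doubt is feasible with millions of rows.
The result should look like this: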
| 2016-03-27 10:17:45 | KNX_428c | KNX | 49 mA | state | 49 | mA |
| 2016-03-27 10:19:45 | KNX_428c | KNX | 47 mA | state | 47 | mA |
| 2016-03-27 10:21:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:33:44 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:34:04 | KNX_428c | KNX | 136 mA | state | 165 | mA |
| 2016-03-27 10:34:05 | KNX_428c | KNX | 136 mA | state | 136 | mA |
| 2016-03-27 10:34:06 | KNX_428c | KNX | 50 mA | state | 50 | mA |
| 2016-03-27 10:34:29 | KNX_428c | KNX | 107 mA | state | 136 | mA |
| 2016-03-27 10:34:30 | KNX_428c | KNX | 165 mA | state | 165 | mA |
| 2016-03-27 10:34:31 | KNX_428c | KNX | 51 mA | state | 51 | mA |
| 2016-03-27 10:34:44 | KNX_428c | KNX | 0 mA | state | 0 | mA |
| 2016-03-27 10:49:43 | KNX_428c | KNX | 0 mA | state | 0 | mA |
Many thanks for your support.
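One join-free way to sketch the second step (keeping only the first and last row of each run of identical values) is with the window functions LAG and LEAD. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 module with an in-memory SQLite database (version 3.25 or newer, for window functions) as a stand-in for the real server, it assumes the first step has already left at most one row per TIMESTAMP, DEVICE and READING, and the names RUN_ENDS_SQL, run_ends and load_sample are made up for this example.

```python
import sqlite3

# Table definition copied from the question.
SCHEMA = """CREATE TABLE filelog (
    TIMESTAMP TIMESTAMP, DEVICE varchar(32), TYPE varchar(32), EVENT varchar(512),
    READING varchar(32), VALUE varchar(32), UNIT varchar(32))"""

# A row is kept if it starts a run (no previous row, or a different previous
# value) or ends a run (no next row, or a different next value).
RUN_ENDS_SQL = """
SELECT TIMESTAMP, DEVICE, TYPE, EVENT, READING, VALUE, UNIT
FROM (
    SELECT filelog.*,
           LAG(VALUE)  OVER (PARTITION BY DEVICE, READING ORDER BY TIMESTAMP) AS PREV_VALUE,
           LEAD(VALUE) OVER (PARTITION BY DEVICE, READING ORDER BY TIMESTAMP) AS NEXT_VALUE
    FROM filelog
) AS t
WHERE PREV_VALUE IS NULL OR VALUE <> PREV_VALUE
   OR NEXT_VALUE IS NULL OR VALUE <> NEXT_VALUE
ORDER BY DEVICE, READING, TIMESTAMP
"""


def load_sample(rows):
    """Hypothetical helper: build an in-memory filelog from (timestamp, device, value) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO filelog VALUES (?, ?, 'KNX', ?, 'state', ?, 'mA')",
        [(ts, device, f"{value} mA", value) for ts, device, value in rows],
    )
    conn.commit()
    return conn


def run_ends(conn):
    """Return the rows that open or close a run of equal VALUEs per DEVICE and READING."""
    return conn.execute(RUN_ENDS_SQL).fetchall()
```

The rows returned are the ones to keep. On the real table the same SELECT could feed a CREATE TABLE ... AS SELECT, provided the database supports window functions (MySQL does from version 8.0 on). Note that VALUE is compared as a string here, which is enough for detecting equal neighbours.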
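Answered 2017-01-03 23:09:19
For the first query, if you want the complete record after aggregating, you need to do a bit more work than what you suggested. One approach is an additional join: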
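-- Find the maximum VALUE per (TIMESTAMP, DEVICE, READING) group,
-- then join back to the table to recover the full matching rows.
-- Note: VALUE is varchar(32), so MAX(VALUE) compares strings, not numbers.
SELECT t1.*
FROM filelog t1
INNER JOIN
(
    SELECT TIMESTAMP, DEVICE, READING, MAX(VALUE) AS VALUE
    FROM filelog
    GROUP BY TIMESTAMP, DEVICE, READING
) t2
    ON t1.TIMESTAMP = t2.TIMESTAMP AND
       t1.DEVICE = t2.DEVICE AND
       t1.READING = t2.READING AND
       t1.VALUE = t2.VALUE
https://stackoverflow.com/questions/41453626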
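To actually change the table, as the question asks, one option is to write the result of such a join into a new table and swap it in. The following is a minimal sketch, not a tested migration: it uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for the real server, and the names dedupe_keep_max, load_sample and filelog_dedup are hypothetical. It differs from the query above in two ways that are assumptions about the intent: VALUE is cast to an integer, because MAX on a varchar column compares strings (so '79' would beat '136'), and SELECT DISTINCT is added so that fully identical rows collapse into one.

```python
import sqlite3

# Table definition copied from the question.
SCHEMA = """CREATE TABLE filelog (
    TIMESTAMP TIMESTAMP, DEVICE varchar(32), TYPE varchar(32), EVENT varchar(512),
    READING varchar(32), VALUE varchar(32), UNIT varchar(32))"""


def load_sample(readings):
    """Hypothetical helper: build an in-memory filelog from (timestamp, value) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO filelog VALUES (?, 'KNX_428c', 'KNX', ?, 'state', ?, 'mA')",
        [(ts, f"{value} mA", value) for ts, value in readings],
    )
    conn.commit()
    return conn


def dedupe_keep_max(conn):
    """Rebuild filelog with one row per (TIMESTAMP, DEVICE, READING): the numeric maximum."""
    conn.executescript("""
        CREATE TABLE filelog_dedup AS
        SELECT DISTINCT t1.*
        FROM filelog t1
        INNER JOIN (
            SELECT TIMESTAMP, DEVICE, READING,
                   MAX(CAST(VALUE AS INTEGER)) AS MAXVAL  -- numeric, not string, maximum
            FROM filelog
            GROUP BY TIMESTAMP, DEVICE, READING
        ) t2
            ON  t1.TIMESTAMP = t2.TIMESTAMP
            AND t1.DEVICE    = t2.DEVICE
            AND t1.READING   = t2.READING
            AND CAST(t1.VALUE AS INTEGER) = t2.MAXVAL;
        DROP TABLE filelog;
        ALTER TABLE filelog_dedup RENAME TO filelog;
    """)
```

On a table with millions of rows, an index on (TIMESTAMP, DEVICE, READING) would likely help the join. A table created this way does not inherit indexes from the original, so they would need to be recreated after the swap.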