我运行了两个不同的SQL查询,得到了截然不同的结果:
表:
mips:这个表是在time上索引的,并包含我测量的每个“度量”(往返时间、重传字节等)的*_good和*_bad字段。领域有:time、rtt_good、rtt_bad、rexb_good、rexb_bad、nae_good、nae_bad等。
度量:这个表是在time、asn (我们交付流量的网络)、cty (传输该流量的国家)和source (传输流量的数据中心)上索引的。因此,在一次“时间”中,我们有数十万行。每一行都告诉我们服务请求的总数(reqs),以及流量传递的各种度量标准(rtt、rexb、nae等)。
这两个表连接在time列上,该列包含一个UNIX时间戳。所有其他的值都是浮动的。
目标
给定rtt_good (我们认为往返时间的值是“好的”,比如10 is )、rtt_bad (我们认为往返时间的值是“坏的”,比如5秒)和rtt,我们可以执行线性插值,以提供一个度量“有多好”或“有多糟糕”的rtt:
rtt_mips = (rtt - rtt_good) / (rtt_bad - rtt_good)因为我们拥有每个可能的asn、cty和source的数据--我们通常需要聚合这些数据来回答更一般的问题,比如“我们的RTT在墨西哥看上去怎么样?”在聚合时,我们执行加权平均值的度量加权的数量,我们服务的请求。例如,墨西哥的平均RTT是:
select sum(rtt * reqs) / sum(reqs) as avg_rtt from metrics where cty = "mx"现在,问题是,我们并不总是在每隔5分钟从每个数据源为每个国家的每个ASN提供服务。我们可能有一段时间,我们的日本数据中心没有向墨西哥提供任何数据。这意味着,当我们按时间对这些指标进行分组时,我们有很多NULL行:
+------+---------+
| time | avg_rtt |
+------+---------+
| 1 | 300 |
| 2 | NULL |
| 3 | 400 |
| ... | ... |为了解决这一问题,我希望将“最后已知的”RTT复制到下一行,然后计算RTT的“相对良好性”:
+------+---------+------------+----------+---------+----------+
| time | avg_rtt | last_known | rtt_good | rtt_bad | rtt_mips |
+------+---------+------------+----------+---------+----------+
| 1 | 300 | 300 | 10 | 5000 | math |
| 2 | NULL | 300 | 10 | 5000 | math |
| 3 | 400 | 400 | 10 | 5000 | math |
| ... | ... | ... | ... | ... | ... |这可以通过组合MySQL变量和COALESCE来实现,如下所示:
select @rtt := coalesce(rtt, @rtt) from metrics如果rtt不是NULL,我们使用rtt。如果rtt 是 NULL,则使用来自上一行的@rtt变量。
将所有这些放在一起,您将得到查询1,如下所示。
但是,我打算使用它的输出在JavaScript中绘制图形,因此我希望将time列乘以1000 (将秒转换为毫秒)。这导致了查询2,它具有不同的(和意外的)行为。
查询1:
select
mips.time,
@rtt := coalesce(sum(rtt*reqs)/sum(reqs), @rtt) as rtt,
(coalesce(sum(rtt*reqs)/sum(reqs), @rtt) - rtt_good) / (rtt_bad - rtt_good) as rtt_mips
from
mips
left join
(
select * from metrics where asn = '33095' and cty = 'us'
) t1 on mips.time = t1.time
group by time
order by time asc;结果:
+------------+-----------------+----------------------+
| time | rtt | rtt_mips |
+------------+-----------------+----------------------+
| 1521731100 | NULL | NULL |
| 1521731400 | NULL | NULL |
| 1521731700 | 12593 | 0.04197666666666667 |
| 1521732000 | 12593 | 0.04197666666666667 |
| 1521732300 | 12593 | 0.04197666666666667 |
| 1521732600 | 12593 | 0.04197666666666667 |
| 1521732900 | 41266.90234375 | 0.13755633333333334 |
| 1521733200 | 41266.90234375 | 0.13755634114583334 |
| 1521733500 | 41266.90234375 | 0.13755634114583334 |
| 1521733800 | 41266.90234375 | 0.13755634114583334 |
| 1521734100 | 41266.90234375 | 0.13755634114583334 |
| 1521734400 | 41266.90234375 | 0.13755634114583334 |
| 1521734700 | 41266.90234375 | 0.13755634114583334 |
| 1521735000 | 14979.439453125 | 0.049931333333333335 |
| 1521735300 | 11812.119140625 | 0.03937366666666667 |
| 1521735600 | 11812.119140625 | 0.03937373046875 |
| 1521735900 | 8738.2314453125 | 0.02912743333333333 |
| 1521736200 | 8738.2314453125 | 0.029127438151041667 |
| 1521736500 | 8738.2314453125 | 0.029127438151041667 |
| 1521736800 | 8738.2314453125 | 0.029127438151041667 |
+------------+-----------------+----------------------+
20 rows in set (0.22 sec)查询2:
select
mips.time * 1000 as time, -- The only line that changed
@rtt := coalesce(sum(rtt*reqs)/sum(reqs), @rtt) as rtt,
(coalesce(sum(rtt*reqs)/sum(reqs), @rtt) - rtt_good) / (rtt_bad - rtt_good) as rtt_mips
from
mips
left join
(
select * from metrics where asn = '33095' and cty = 'us'
) t1 on mips.time = t1.time
group by time
order by time asc;结果:
+---------------+-----------------+----------------------+
| time | rtt | rtt_mips |
+---------------+-----------------+----------------------+
| 1521731100000 | NULL | NULL |
| 1521731400000 | NULL | NULL |
| 1521731700000 | 12593 | 0.04197666666666667 |
| 1521732000000 | NULL | NULL |
| 1521732300000 | NULL | NULL |
| 1521732600000 | NULL | NULL |
| 1521732900000 | 41266.90234375 | 0.13755633333333334 |
| 1521733200000 | NULL | NULL |
| 1521733500000 | NULL | NULL |
| 1521733800000 | NULL | NULL |
| 1521734100000 | NULL | NULL |
| 1521734400000 | NULL | NULL |
| 1521734700000 | NULL | NULL |
| 1521735000000 | 14979.439453125 | 0.049931333333333335 |
| 1521735300000 | 11812.119140625 | 0.03937366666666667 |
| 1521735600000 | NULL | NULL |
| 1521735900000 | 8738.2314453125 | 0.02912743333333333 |
| 1521736200000 | NULL | NULL |
| 1521736500000 | NULL | NULL |
| 1521736800000 | NULL | NULL |
+---------------+-----------------+----------------------+
20 rows in set (0.41 sec)问题:
为什么当我将time列更改为time * 1000时,我的变量停止正确设置,而我的查询开始返回NULLs?
版本信息:
mysql> select version();
+-----------------+
| version() |
+-----------------+
| 10.1.26-MariaDB |
+-----------------+
1 row in set (0.10 sec)回复@whoami
首先,以下查询的结果:
mysql> select * from mips where time = 1521731700000;
Empty set (0.15 sec)还有一个类似的查询:
mysql> select * from mips where time = 1521731700;
+------------+----------+---------+-----------+----------+----------+---------+-----------+----------+---------+--------+---------+--------+
| time | rtt_good | rtt_bad | rexb_good | rexb_bad | nae_good | nae_bad | util_good | util_bad | fb_good | fb_bad | or_good | or_bad |
+------------+----------+---------+-----------+----------+----------+---------+-----------+----------+---------+--------+---------+--------+
| 1521731700 | 0 | 300000 | 0 | 40 | 25 | 100 | 0 | 80 | 0 | 100 | 0 | 100 |
+------------+----------+---------+-----------+----------+----------+---------+-----------+----------+---------+--------+---------+--------+
1 row in set (0.10 sec)然后,我尝试按rtt_good和rtt_bad分组,并将metrics的time列乘以1000。
查询:
select
mips.time * 1000 as time,
@rtt := coalesce(sum(rtt*reqs)/sum(reqs), @rtt) as rtt,
(coalesce(sum(rtt*reqs)/sum(reqs), @rtt) - rtt_good) / (rtt_bad - rtt_good) as rtt_mips
from
mips
left join
(
select time * 1000 as time, rtt, reqs from metrics where asn = '33095' and cty = 'us'
) t1 on mips.time = t1.time
group by time, rtt_good, rtt_bad
order by time asc;结果:
+---------------+------+----------+
| time | rtt | rtt_mips |
+---------------+------+----------+
| 1521731100000 | NULL | NULL |
| 1521731400000 | NULL | NULL |
| 1521731700000 | NULL | NULL |
| 1521732000000 | NULL | NULL |
| 1521732300000 | NULL | NULL |
| 1521732600000 | NULL | NULL |
| 1521732900000 | NULL | NULL |
| 1521733200000 | NULL | NULL |
| 1521733500000 | NULL | NULL |
| 1521733800000 | NULL | NULL |
| 1521734100000 | NULL | NULL |
| 1521734400000 | NULL | NULL |
| 1521734700000 | NULL | NULL |
| 1521735000000 | NULL | NULL |
| 1521735300000 | NULL | NULL |
| 1521735600000 | NULL | NULL |
| 1521735900000 | NULL | NULL |
| 1521736200000 | NULL | NULL |
| 1521736500000 | NULL | NULL |
| 1521736800000 | NULL | NULL |
+---------------+------+----------+
20 rows in set (0.17 sec)因为时间1521736800000不存在于mips表中,所以它无法正确地加入。
有趣的发现
即使我没有将time列乘以1000,如果我添加了额外的group by列,那么查询仍然无法按我预期的方式操作:
select
mips.time,
@rtt := coalesce(sum(rtt*reqs)/sum(reqs), @rtt) as rtt,
(coalesce(sum(rtt*reqs)/sum(reqs), @rtt) - rtt_good) / (rtt_bad - rtt_good) as rtt_mips
from
mips
left join
(
select time, rtt, reqs from metrics where asn = '33095' and cty = 'us'
) t1 on mips.time = t1.time
group by time, rtt_good, rtt_bad
order by time asc;结果:
+------------+-----------------+----------------------+
| time | rtt | rtt_mips |
+------------+-----------------+----------------------+
| 1521731100 | NULL | NULL |
| 1521731400 | NULL | NULL |
| 1521731700 | 12593 | 0.04197666666666667 |
| 1521732000 | NULL | NULL |
| 1521732300 | NULL | NULL |
| 1521732600 | NULL | NULL |
| 1521732900 | 41266.90234375 | 0.13755633333333334 |
| 1521733200 | NULL | NULL |
| 1521733500 | NULL | NULL |
| 1521733800 | NULL | NULL |
| 1521734100 | NULL | NULL |
| 1521734400 | NULL | NULL |
| 1521734700 | NULL | NULL |
| 1521735000 | 14979.439453125 | 0.049931333333333335 |
| 1521735300 | 11812.119140625 | 0.03937366666666667 |
| 1521735600 | NULL | NULL |
| 1521735900 | 8738.2314453125 | 0.02912743333333333 |
| 1521736200 | NULL | NULL |
| 1521736500 | NULL | NULL |
| 1521736800 | NULL | NULL |
+------------+-----------------+----------------------+
20 rows in set (0.12 sec)我觉得我遇到了一个奇怪的情况,存储引擎是如何优化这些查询的。
发布于 2018-03-22 18:50:42
我认为像这样的事情应该会更有预见性:
SELECT mips.time * 1000 AS mips_time,
@prev_rtt := coalesce(m_sum.weighted_rtt, @prev_rtt) as rtt,
(coalesce(m_sum.weighted_rtt, @prev_rtt) - rtt_good) / (rtt_bad - rtt_good) as rtt_mips
FROM
mips
LEFT JOIN
(
SELECT m.time, sum(m.rtt*m.reqs)/sum(m.reqs) AS weighted_rtt
FROM metrics AS m
WHERE m.asn = '33095' and m.cty = 'us'
GROUP BY m.time
) AS m_sum ON mips.time = m_sum.time
ORDER BY mips.time asc;在我的经验中,(@prev_rtt - rtt_good) / (rtt_bad - rtt_good) as rtt_mips也应该在这个查询中工作,就像as rtt的前面表达式应该分配了@prev_rtt一样;但是,这是冒险进入“行为方式,但实际上不是由MySQL保证”的领域,因为MySQL不能保证select表达式的计算顺序。
发布于 2018-03-22 17:12:56
将查询更改为以下内容。您必须先初始化var,然后才能用它计算,否则它是空的。
select
mips.time,
@rtt := coalesce(sum(rtt*reqs)/sum(reqs), @rtt) as rtt,
(coalesce(rtt, @rtt) - rtt_good) / (rtt_bad - rtt_good) as rtt_mips
from
mips
left join
(
select * from metrics where asn = '33095' and cty = 'us'
) t1 on mips.time = t1.time
cross join ( select @rtt := 0 ) as init
group by time
order by time asc;https://stackoverflow.com/questions/49434510
复制相似问题