我的一个表中的SELECT DISTINCT值有一个很大的问题。
表1 :T1
pid box cassette seal added (timestamp)
---------------------------------------------------------------
1 A1212 A01A00001 P123456 2015-01-01 12:00:01
2 A1212 A01A00001 P123457 2015-01-01 12:00:01
3 A1214 A01A00004 C123458 2015-01-01 12:00:01
4 A1214 A01B00005 D123459 2015-01-01 12:00:01
5 A1214 A01B00006 D123460 2015-01-01 12:00:01
6 A1212 A01B00007 E123461 2015-01-01 12:00:01
7 A1212 A01B00007 E123462 2015-01-01 12:00:01 表2 :T2
id t1_pid box cassette seal error despatched
------------------------------------------------------------------------
1 3 A1214 A01A00004 C123458 false true
2 7 A1212 A01B00007 E123462 true false我需要SELECT所有的DISTINCT从Table T1的盒子,磁带,密封: 1.不在Table T2和,最重要的-只有与seal的pid是最高的/最后添加或2.是在Table T2但T2.error=true或T2.despached=false
结果应保留T1记录
1-因为记录2具有相同的盒子,所以盒式磁带但记录1的pid较低
3-因为只有t2.t1_pid=3和despatched=TRUE
6-因为记录7具有相同的盒子,所以盒式磁带但记录6具有较低的pid
结果中应该有记录7,因为有t2.t1_pid=7但有error=TRUE
结果表:
id box cassette seal
-------------------------------------------
2 A1212 A01A00001 P123457 /(rec. no 2)
4 A1214 A01B00005 D123459 /(rec. no 4)
5 A1214 A01B00006 D123460 /(rec. no 5)
6 A1212 A01B00007 E123462 /(rec. no 6) 我已经尝试了以下语法,如果密封数较高,则可以使用。我需要把if t1.pid的条件改得更高,但是我搞不清楚。
SELECT DISTINCT T1.pid, T1.box, T1.cassette, T1.seal
FROM T1 INNER JOIN
(SELECT T1.box, T1.cassette, max(T1.seal) as seal FROM
T1 LEFT OUTER JOIN T2 o ON T1.pid=o.t1_pid WHERE
(o.id IS NULL or (o.despatched=0 ))
GROUP BY T1.cassette, T1.box)
as b using (cassette, box, seal) 非常感谢您的帮助和宝贵的时间
发布于 2015-12-07 15:49:52
此任务与DISTINCT无关,因为我们在这里讨论的不是必须消除的重复记录。这更多的是关于聚合(即将结果沸腾成唯一的盒子/磁带数据)。
您为T1记录指定了两个条件:
最重要的是,
中
条件1:
where t1.pid not in (select t1_pid from t2)
and not exists
(
select *
from t1 as later
where later.box = t1.box
and later.cassette = t1.cassette
and later.pid > t1.pid
)条件2:
where t1.pid in
(
select t1_pid
from t2
where t2.error = true
or t2.despached = false
)然而,这还不够,因为我们仍然可以获得一个盒子和盒子的多个记录(一个匹配条件1,一个匹配条件2,或者多个匹配条件2)。在您的评论中添加了第三个条件:
也许你存储你的数据,这样一个盒子/盒子总是只与两个条件匹配一次,但至少在技术上是可能的,所以我们应该找到一种方法来处理它。最简单的是按盒子和盒子分组,以确保每个盒子和盒子只有一条结果记录。然后显示与其匹配的最小或最大印章。
select box, casette, max(seal)
from t1
where
(
t1.pid not in (select t1_pid from t2)
and not exists
(
select *
from t1 as later
where later.box = t1.box
and later.cassette = t1.cassette
and later.pid > t1.pid
)
)
or t1.pid in
(
select t1_pid
from t2
where t2.error = true
or t2.despached = false
)
group by box, casette;我没有在结果中显示I,因为我不知道您是如何获得它们的。您说它们是T1 ID,但它不是所选记录的ID,例如,您选择record 2 (pid 2),但在您的结果中,您显示ID为1的记录,原因我不明白。
发布于 2015-12-07 21:05:06
我创建了以下语法,我认为这些语法工作得很好,但如果它是针对具有1亿级记录的表T1、T2而优化的,我真的不会这么做
SELECT abc.pid, abc.box, abc.cassette, abc.seal FROM
(
SELECT tt.pid, tt.box, tt.cassette, tt.seal
FROM t1 tt
INNER JOIN
(
SELECT box, MAX(pid) AS pid
FROM t1
WHERE added>DATE(now() - INTERVAL 2 DAY) /* for example */
GROUP BY box, cassette
) groupedtt
ON tt.box = groupedtt.box
AND tt.pid = groupedtt.pid
ORDER BY box,cassette
) abc
LEFT OUTER JOIN t2 o ON abc.pid=o.pid
WHERE
(
o.pid IS NULL
or (
o.despatched=0 AND
o.added>DATE(NOW() - INTERVAL 2 DAY) /* for example */
)
) https://stackoverflow.com/questions/34087039
复制相似问题