我正在寻找将Longest Common Subsquence算法应用于SQL表以获得数据差异视图的研究论文或著作。关于如何解决表差异问题的其他建议也是受欢迎的。挑战在于,SQL表有一个讨厌的习惯,即变得相当大,并应用为文本处理而设计的简单算法,这可能导致程序永远不会结束……
因此,给定一个表Original
Key Content
1 This row is unchanged
2 This row is outdated
3 This row is wrong
4 This row is fine as it is和表New
Key Content
1 This row was added
2 This row is unchanged
3 This row is right
4 This row is fine as it is
5 This row contains important additions我需要找出Diff
+++ 1 This row was added
--- 2 This row is outdated
--- 3 This row is wrong
+++ 3 This row is right
+++ 5 This row contains important additions发布于 2010-07-23 20:15:27
如果将表导出为csv文件,则可以使用http://sourceforge.net/projects/csvdiff/
csvdiff是一个Perl脚本,用于比较/比较两个csv文件,并可以选择分隔符。差异将显示为:"Column XYZ in record 999“是不同的。在此之后,将显示此列的实际结果和预期结果。
发布于 2011-03-25 07:51:09
这可能对你想要的东西来说太简单了,而且这不是研究:-),而只是概念性的。我想您正在比较处理开销的不同方法(?)。
--这是你不想要的一半(A)
SELECT o.Key FROM tbl_ORIGINAL o INNER JOIN tbl_NEW n WHERE o.Content = n.Content--这是你不想要的另一半(B)
SELECT n.Key FROM tbl_ORIGINAL o INNER JOIN tbl_NEW n WHERE o.Content = n.Content--这是你想要的一半(C)
SELECT '+++' as diff, n.key, Content FROM tbl_New n WHERE n.KEY NOT IN( B )--这是你想要的另一半(D)
SELECT '---' as diff, o.key, Content FROM tbl_Original o WHERE o.Key NOT IN ( A )--将C与D相结合
( C )
Union
( D )
Order By diff, key改进..。
--例如,获取最小长度(1000是任意的--只需要一个退出)
declare @i int
set @i = 1
While i < 1000 and Exists (
Select Count(key), Left(content,@i) From Table Having Count(key) > 1 )
BEGIN
i = @i + 1
ENDhttps://stackoverflow.com/questions/3269671
复制相似问题