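Summary:
I'm running "Ubuntu 18.04.1 LTS" under Windows 10, downloaded from:
https://docs.microsoft.com/en-us/windows/wsl/install-manual
When I run "sort -k" on different columns of the same file, I get inconsistent results.
If I run the same file on a Cisco-based RHEL 7.7 (Maipo) server, all six sorts ("sort -k1" through "sort -k6") work on the specified column.
Have I hit a bug in Ubuntu, or does sort need to be run differently on Ubuntu and RHEL?
Sample sorts: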
LAPTOP-MOQUDB6E:/tmp $ lsb_release -d
Description: Ubuntu 18.04.1 LTS
LAPTOP-MOQUDB6E:/tmp $ cat members # {original file}
00013 Snow White Disney_Princess Europe Enchanted_Forest
00016 Wiley Coyote Roadrunner_Nemesis North_America La_Paz
00018 Jiminy Cricket Disney Europe Tuscon0y
00019 Speedy Gonzales Cats_Meow North_America Guadalajara
00017 Jasmine Disney_Princess Asia Desert_Sands
LAPTOP-MOQUDB6E:/tmp $ sort -k2 members # {"-k2": Col_2 sort works as expected}
00017 Jasmine Disney_Princess Asia Desert_Sands
00018 Jiminy Cricket Disney Europe Tuscon0y
00013 Snow White Disney_Princess Europe Enchanted_Forest
00019 Speedy Gonzales Cats_Meow North_America Guadalajara
00016 Wiley Coyote Roadrunner_Nemesis North_America La_Paz
LAPTOP-MOQUDB6E:/tmp $ sort -k5 members # {"-k5": Col_5 sort doesn't work}
00018 Jiminy Cricket Disney Europe Tuscon0y
00019 Speedy Gonzales Cats_Meow North_America Guadalajara
00017 Jasmine Disney_Princess Asia Desert_Sands
00013 Snow White Disney_Princess Europe Enchanted_Forest
00016 Wiley Coyote Roadrunner_Nemesis North_America La_Paz
LAPTOP-MOQUDB6E:/tmp $ sort -k6 members # {"-k6": anomaly (sort occurs on Col_5)}
00017 Jasmine Disney_Princess Asia Desert_Sands
00013 Snow White Disney_Princess Europe Enchanted_Forest
00018 Jiminy Cricket Disney Europe Tuscon0y
00019 Speedy Gonzales Cats_Meow North_America Guadalajara
00016 Wiley Coyote Roadrunner_Nemesis North_America La_Paz
Posted on 2020-04-12 12:15:22
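The difference is probably related to the default locale settings of the two systems, specifically the collation weight given to leading blanks.
First, it's important to note that by default sort splits fields at non-blank-to-blank transitions, not at blank-to-non-blank ones. So when you have a columnar file aligned with multiple space characters, those extra alignment characters are treated as part of the following field. Adding the --debug flag shows how this affects your results: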
$ LC_COLLATE=C sort --debug -k5 file
sort: using simple byte comparison
sort: leading blanks are significant in key 1; consider also specifying 'b'
00018 Jiminy Cricket Disney Europe Tuscon0y
_____________________________________
________________________________________________________________________
00019 Speedy Gonzales Cats_Meow North_America Guadalajara
_____________________________________
___________________________________________________________________________
00017 Jasmine Disney_Princess Asia Desert_Sands
________________________________
____________________________________________________________________________
00013 Snow White Disney_Princess Europe Enchanted_Forest
____________________________________
________________________________________________________________________________
00016 Wiley Coyote Roadrunner_Nemesis North_America La_Paz
_______________________
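______________________________________________________________________
Here you can see that it really is sorting on column 5, but because each key also contains the blanks that precede the column 5 value, the rows end up ordered by the amount of padding: in the C locale a space sorts before any letter, so the key with the most leading blanks comes first.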
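A minimal sketch that reproduces the effect outside the members file, assuming GNU coreutils sort and a hypothetical three-line input whose second column is preceded by different numbers of spaces. With the C locale forcing byte comparison, the padding decides the order; with -b (the option suggested by the debug message) the words decide it:

```shell
#!/bin/sh
# Hypothetical input: the second column is preceded by 3, 2 and 1 spaces.
input='1   Zulu
2  Mike
3 Alpha'

# Without -b, the key for -k2,2 starts right after field 1 and therefore
# includes the padding; a space (0x20) sorts before any letter in the C locale.
padded=$(printf '%s\n' "$input" | LC_ALL=C sort -k2,2)

# With -b, leading blanks are skipped, so the words themselves are compared.
stripped=$(printf '%s\n' "$input" | LC_ALL=C sort -b -k2,2)

printf '%s\n--\n%s\n' "$padded" "$stripped"
```

The first result comes out as Zulu, Mike, Alpha (reverse alphabetical, because the most-padded key sorts first); the second as Alpha, Mike, Zulu.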
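Likewise, when you use -k6 it really is sorting on column 6, but by coincidence this gives the same order as an alphabetical sort on the column 5 geographic region. That is simply because the padding in front of column 6 depends on the length of the column 5 value: Asia is short (lots of padding) and North_America is long (little padding).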
$ LC_COLLATE=C sort --debug -k6 file
sort: using simple byte comparison
sort: leading blanks are significant in key 1; consider also specifying 'b'
00017 Jasmine Disney_Princess Asia Desert_Sands
_______________________
____________________________________________________________________________
00013 Snow White Disney_Princess Europe Enchanted_Forest
_________________________
________________________________________________________________________________
00018 Jiminy Cricket Disney Europe Tuscon0y
_________________
________________________________________________________________________
00019 Speedy Gonzales Cats_Meow North_America Guadalajara
_____________
___________________________________________________________________________
00016 Wiley Coyote Roadrunner_Nemesis North_America La_Paz
________
______________________________________________________________________
The simplest solution is given in the debug output itself:
sort: leading blanks are significant in key 1; consider also specifying 'b'
That is:
$ LC_COLLATE=C sort --debug -b -k5 file
sort: using simple byte comparison
00017 Jasmine Disney_Princess Asia Desert_Sands
___________________________
____________________________________________________________________________
00013 Snow White Disney_Princess Europe Enchanted_Forest
_______________________________
________________________________________________________________________________
00018 Jiminy Cricket Disney Europe Tuscon0y
_______________________
________________________________________________________________________
00019 Speedy Gonzales Cats_Meow North_America Guadalajara
__________________________
___________________________________________________________________________
00016 Wiley Coyote Roadrunner_Nemesis North_America La_Paz
_____________________
______________________________________________________________________
Posted on 2020-04-13 23:55:26
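If the 6-column file has only a single space between the fields on each line, then "sort -k#" ("#": 1-6) sorts on columns 1-6 as expected on both Ubuntu and RHEL. The same is true for "sort -k#,#" (e.g. sort -k1,1, sort -k2,2, ... sort -k6,6).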
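Note that "sort -k2" and "sort -k2,2" are not the same key in GNU sort. Without an end position the key runs from the start of field 2 to the end of the line, so later fields break ties; "-k2,2" stops at the end of field 2 and leaves ties to sort's last-resort whole-line comparison. A minimal sketch with a hypothetical two-line, single-space input:

```shell
#!/bin/sh
# Hypothetical single-space-separated input; both rows share field 2 ("x").
input='a x 2
b x 1'

# -k2: key runs from field 2 to the end of the line, so field 3 breaks the tie.
to_eol=$(printf '%s\n' "$input" | LC_ALL=C sort -k2)

# -k2,2: key is field 2 only; the tie falls back to comparing whole lines.
only_f2=$(printf '%s\n' "$input" | LC_ALL=C sort -k2,2)

printf '%s\n--\n%s\n' "$to_eol" "$only_f2"
```

The first sort puts "b x 1" first (decided by field 3); the second puts "a x 2" first (decided by the whole line).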
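If the six-column file has multiple spaces between the fields on each line (e.g. the file has been aligned with "cat | column -t >"), then "sort -k#" ("#": 1-6) only works on columns 1-6 on RHEL.
On Ubuntu, I observed that "sort -k#" works for columns 1 and 2, does nothing for columns 3-5, and "sort -k6" ends up sorting on column 5.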
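For my utility, I ended up preprocessing the sort input file so that it has only a single space between fields, then retested on RHEL and Ubuntu to make sure I got the expected results. Thanks to everyone who looked into this anomaly between the two OS types.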
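A sketch of two workarounds side by side, assuming GNU coreutils and a hypothetical space-aligned excerpt modelled on the members file (the padding widths are made up). The first squeezes each run of spaces to one with tr -s ' ' before sorting; the post doesn't say which tool was used for the preprocessing, so tr is an assumption. The second leaves the alignment alone and passes -b, as sort's own debug hint suggests. Both should give the same row order on column 5:

```shell
#!/bin/sh
# Hypothetical space-aligned excerpt modelled on the members file
# (printf pads every column with spaces, similar to `column -t`).
aligned=$(printf '%-7s%-8s%-10s%-20s%-15s%s\n' \
  00013 Snow White Disney_Princess Europe Enchanted_Forest \
  00016 Wiley Coyote Roadrunner_Nemesis North_America La_Paz \
  00018 Jiminy Cricket Disney Europe Tuscon0y \
  00019 Speedy Gonzales Cats_Meow North_America Guadalajara)

# Workaround 1: squeeze each run of spaces to one, then sort on field 5 only.
squeezed=$(printf '%s\n' "$aligned" | tr -s ' ' | LC_ALL=C sort -k5,5)

# Workaround 2: keep the alignment and ignore leading blanks in the key (-b).
blank_skip=$(printf '%s\n' "$aligned" | LC_ALL=C sort -b -k5,5)

printf '%s\n\n%s\n' "$squeezed" "$blank_skip"
```

In both results the Europe rows (00013, 00018) come before the North_America rows (00016, 00019); rows with equal regions fall back to the whole-line comparison, which orders them by ID here.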
https://askubuntu.com/questions/1226396