我正在尝试编写一个脚本,该脚本查看行的一部分,执行sort -u或其他操作来查找唯一的匹配项,然后显示输出,并按行的原始顺序排序。换句话说,只有该行的该部分的第一个匹配项才会出现。
我设法使用cut做到了这一点,但我的输出只显示了数据的剪切部分。我怎么做才能让它得到整行代码呢?
这是我到目前为止所得到的:
cut -d, -f6 infile.txt | cut -c4-11 | grep -n . | sort -t: -k2,2 -u | sort -t: -k1n,1 | cut -d: -f2-我知道数据中不会有额外的:或,会破坏这个脚本。但这只输出唯一的数据。我怎样才能得到整条线?我更喜欢远离perl,但是awk还可以(尽管我对它不是很了解)。
示例:
如果输入文件是这样的(注意,ABCDEFGH不是真实的,我把它放在那里只是为了说明我的意思):
A....,....,...........,.....,....,...20130718......,.........,...........,......
B....,....,...........,.....,....,...20130714......,.........,...........,......
C....,....,...........,.....,....,...20130718......,.........,...........,......
D....,....,...........,.....,....,...20130719......,.........,...........,......
E....,....,...........,.....,....,...20130713......,.........,...........,......
F....,....,...........,.....,....,...20130714......,.........,...........,......
G....,....,...........,.....,....,...20130630......,.........,...........,......
H....,....,...........,.....,....,...20130718......,.........,...........,......我的程序输出:
20130718
20130714
20130719
20130713
20130630我想看看:
A....,....,...........,.....,....,...20130718......,.........,...........,......
B....,....,...........,.....,....,...20130714......,.........,...........,......
D....,....,...........,.....,....,...20130719......,.........,...........,......
E....,....,...........,.....,....,...20130713......,.........,...........,......
G....,....,...........,.....,....,...20130630......,.........,...........,......发布于 2013-07-19 05:04:06
是的,awk是你最好的选择。下面是一个神秘的例子:
awk -F, '!seen[substr($6,4,8)]++' infile.txt解释:
options:
-F, set the field separator to ,
condition:
substr($6,4,8) up to 8 characters starting at the fourth character
of the sixth field
seen[...]++ seen is an associative array (dictionary). Increment the
value associated with ..., and return the old value
!seen[...]++ if there was no old value, perform the action
action:
There is no action, only a condition, so the default action is
performed if the test succeeds. The default action is to print
the line. So the line will be printed if the relevant characters of
the sixth field haven't yet been seen.测试:
$ awk -F, '!seen[substr($6,4,8)]++' <<EOF
> A....,....,...........,.....,....,...20130718......,.........,...........,......
> B....,....,...........,.....,....,...20130714......,.........,...........,......
> C....,....,...........,.....,....,...20130718......,.........,...........,......
> D....,....,...........,.....,....,...20130719......,.........,...........,......
> E....,....,...........,.....,....,...20130713......,.........,...........,......
> F....,....,...........,.....,....,...20130714......,.........,...........,......
> G....,....,...........,.....,....,...20130630......,.........,...........,......
> H....,....,...........,.....,....,...20130718......,.........,...........,......
> EOF
A....,....,...........,.....,....,...20130718......,.........,...........,......
B....,....,...........,.....,....,...20130714......,.........,...........,......
D....,....,...........,.....,....,...20130719......,.........,...........,......
E....,....,...........,.....,....,...20130713......,.........,...........,......
G....,....,...........,.....,....,...20130630......,.........,...........,......
$https://stackoverflow.com/questions/17733498
复制相似问题