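In general, my question is how to use AWK to remove duplicate lines from a file, where "duplicate" includes the case in which certain columns are interchangeable.
Background to my question. Initially, I have a file like this: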
10/13-01:55:42.549318 [**] [1:1000003:0] Detect possible CnC comu [**] [Classification: Misc activity] [Priority: 3] {TCP} 10.0.0.3:1045 -> 103.105.0.1:80
10/13-01:55:42.549318 [**] [1:1000003:0] Detect possible CnC comu [**] [Classification: Misc activity] [Priority: 3] {TCP} 103.105.0.1:80 -> 10.0.0.3:1045
10/13-01:56:45.221877 [**] [1:1000003:0] Detect possible CnC comu [**] [Classification: Misc activity] [Priority: 3] {TCP} 10.0.0.3:1049 -> 103.105.0.1:80
10/13-01:56:57.150985 [**] [1:1000003:0] Detect possible CnC comu [**] [Classification: Misc activity] [Priority: 3] {TCP} 10.0.0.3:1051 -> 103.105.0.1:80
10/13-01:56:58.935176 [**] [1:1000003:0] Detect possible CnC comu [**] [Classification: Misc activity] [Priority: 3] {TCP} 10.0.0.3:1051 -> 103.105.0.1:80
10/13-01:57:13.494148 [**] [1:1000003:0] Detect possible CnC comu [**] [Classification: Misc activity] [Priority: 3] {TCP} 10.0.0.3:1054 -> 103.105.0.1:80
My goal is to arrive at the following formatted file:
10.0.0.3|1045|103.105.0.1|80|CnC
10.0.0.3|1049|103.105.0.1|80|CnC
10.0.0.3|1051|103.105.0.1|80|CnC
10.0.0.3|1054|103.105.0.1|80|CnC
My effort and progress so far: I used the following (very badly written) command to process it:
cat test.log | awk -F" " '{print $6 " " $15 " " $17}' | awk '{t = $1; $1 = $2; $2 = $3; $3 = t; print;}' | awk '{gsub(":", "| "); gsub(" ","|"); print}' | awk 'NR%2!=0'
After that, I have a file containing the following sample:
10.0.0.3|1045|103.105.0.1|80|CnC
10.0.0.3|1049|103.105.0.1|80|CnC
10.0.0.3|1051|103.105.0.1|80|CnC
10.0.0.3|1051|103.105.0.1|80|CnC
10.0.0.3|1054|103.105.0.1|80|CnC
103.105.0.1|80|10.0.0.3|1045|CnC
The first and last lines are considered duplicates because they match the following pattern:
A|a|B|b|M
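B|b|A|a|M
Help wanted: I would like to know whether there is a way to use AWK to remove such duplicate lines from a relatively large file, without needing my post-processing. Thanks!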
Answered 2017-03-11 12:36:12
Maybe you could skip that step and just process the raw data:
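#!/usr/bin/awk -f
# Print each connection once, treating A -> B and B -> A as the same pair.
BEGIN { OFS = "|" }
{
    ip1 = $(NF-2)    # "ip:port" before the arrow (third field from the end)
    ip2 = $NF        # "ip:port" after the arrow (last field)
}
# Only act when neither ordering of this pair has been seen before.
!(key1[ip1,ip2] + key1[ip2,ip1]) {
    split(ip1, combo1, ":")
    split(ip2, combo2, ":")
    key1[ip1,ip2]++    # mark both orderings as seen
    key1[ip2,ip1]++
    print combo1[1], combo1[2], combo2[1], combo2[2], $6    # $6 is "CnC" in the sample log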
}
https://stackoverflow.com/questions/42729385
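If the formatted `|`-separated file already exists, the same idea can probably be applied to it directly. The following is a minimal sketch and not part of the original answer: the function name `dedup_swapped` and the variables `left`, `right`, `key`, and `seen` are made up for illustration, and it assumes the five-field `A|a|B|b|M` layout shown in the question. It builds one canonical key per line by putting the two `ip|port` endpoints in string order, so both directions produce the same key, and it keeps the first line seen for each key.

```shell
#!/bin/sh
# dedup_swapped (hypothetical helper): drop lines that repeat an earlier
# line either exactly or with the two endpoints swapped.
# Assumes input lines of the form  A|a|B|b|M  (ip|port|ip|port|label).
dedup_swapped() {
    awk -F'|' '
    {
        left  = $1 FS $2    # first endpoint, e.g. "10.0.0.3|1045"
        right = $3 FS $4    # second endpoint, e.g. "103.105.0.1|80"
        # Canonical key: the smaller endpoint (string comparison) goes first,
        # so A|a|B|b and B|b|A|a map to the same key.
        key = (left < right) ? left FS right : right FS left
        key = key FS $5     # the label is part of the identity too
    }
    # Print the line only the first time its canonical key appears.
    !seen[key]++
    ' "$@"
}

# Usage (file name is an assumption):
#   dedup_swapped formatted.txt > deduped.txt
```

Because the first occurrence wins, the output keeps whichever direction appears first in the file. Memory use grows with the number of distinct pairs, since every key is held in the `seen` array.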