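I want to analyze a log file. It contains several operations, and each operation has a set of sub-operations. I want to extract the number of sub-operations grouped by operation. This would be easy in SQL, but I am stuck in bash.
Here is a simplified version of the file: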
[21:30:21.538Z #a9a.012 DEBUG - - ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]
otherlogs stuff ...
[21:31:21.538Z #a9a.012 DEBUG - - ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1, ingestion-4757-10-17-4.1.3, ingestion-4757-10-17-4.1.4]
otherlogs stuff ...
[21:31:21.690Z #a9a.012 DEBUG - - ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-18-3; Tasks: [ingestion-4757-10-18-3.1.137, ingestion-4757-10-18-3.1.139, ingestion-4757-10-18-3.1.138, ingestion-4757-10-18-3.1.140, ingestion-4757-10-18-3.1.136, ingestion-4757-10-18-3.1.141]
Each operation is the part before the first dot; the rest identifies the sub-operation.
I am looking for a result like the following, which I could, for example, store in a file:
operationName suboperationCount
ingestion-4757-10-18-3 3
ingestion-4757-10-18-4 4
ingestion-4757-10-18-3 6
I have been trying combinations such as cat xlogs.txt | grep 'ingestion' | uniq | wc -w > fileresult.txt
but that only returns a single overall count.
Thanks!
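Posted 2020-10-19 10:40:44
EDIT: After the OP's comment, we know that only the ids inside Tasks need to be counted, so in that case you could try the following, strictly assuming that each line of your Input_file contains only one Tasks string.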
awk '
{
sub(/.*Tasks/,"Tasks")
while(match($0,/ingestion-[0-9-]+/)){
arr[substr($0,RSTART,RLENGTH)]++
$0=substr($0,RSTART+RLENGTH)
}
}
END{
for(i in arr){
print i,arr[i]
}
}' Input_file
With awk, could you please try the following, written and tested with the shown samples.
awk '
{
while(match($0,/ingestion-[0-9-]+/)){
arr[substr($0,RSTART,RLENGTH)]++
$0=substr($0,RSTART+RLENGTH)
}
}
END{
for(i in arr){
print i,arr[i]
}
}' Input_file
Explanation: a detailed explanation of the above code.
awk ' ##Start the awk program.
{
while(match($0,/ingestion-[0-9-]+/)){ ##Loop for as long as match() finds the regex in the current line.
arr[substr($0,RSTART,RLENGTH)]++ ##Use the matched substring as the index of array arr and increase its value by 1.
$0=substr($0,RSTART+RLENGTH) ##Replace the current line with the rest of the line after the match.
}
}
END{ ##Start the END block of the program.
for(i in arr){ ##Traverse all elements of arr.
print i,arr[i] ##Print each index of the array and its value.
}
}' Input_file ##Name of the input file.
Posted 2020-10-19 10:36:38
You can use the following grep + uniq command:
grep -Eo '\bingestion-[0-9-]+' file.log | uniq -c
4 ingestion-4759-9-13-41
5 ingestion-4757-10-17-4
7 ingestion-4757-10-18-3
Posted 2020-10-19 10:42:44
$ grep -o 'ingestion[\.0-9-]*\.' file | uniq -c
3 ingestion-4759-9-13-41.1.
4 ingestion-4757-10-17-4.1.
6 ingestion-4757-10-18-3.1.
https://stackoverflow.com/questions/64425538
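Building on the awk approach above, here is a minimal sketch that also prints the header row the question asks for, counts only the ids inside the Tasks list, and sorts the rows, since the order of for (i in arr) in awk is unspecified. The function name count_tasks is hypothetical; xlogs.txt and fileresult.txt are the file names used in the question, and the sample data is just the three log lines shown there. It assumes each log line has at most one "Tasks: [" list.

```shell
#!/bin/sh
# Sample input: the three log lines from the question (assumed representative).
cat > xlogs.txt <<'EOF'
[21:30:21.538Z #a9a.012 DEBUG - - ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4759-9-13-41; Tasks: [ingestion-4759-9-13-41.1.43, ingestion-4759-9-13-41.1.44, ingestion-4759-9-13-41.1.41]
otherlogs stuff ...
[21:31:21.538Z #a9a.012 DEBUG - - ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-17-4; Tasks: [ingestion-4757-10-17-4.1.2, ingestion-4757-10-17-4.1.1, ingestion-4757-10-17-4.1.3, ingestion-4757-10-17-4.1.4]
otherlogs stuff ...
[21:31:21.690Z #a9a.012 DEBUG - - ] c.h.c.w.j.JobTrackingWorkerReporter: Reporting bulk completion: Partition: tenant-xla; Job: ingestion-4757-10-18-3; Tasks: [ingestion-4757-10-18-3.1.137, ingestion-4757-10-18-3.1.139, ingestion-4757-10-18-3.1.138, ingestion-4757-10-18-3.1.140, ingestion-4757-10-18-3.1.136, ingestion-4757-10-18-3.1.141]
EOF

# count_tasks (hypothetical helper): print a header, then one
# "operation count" row per operation, sorted by operation name.
count_tasks() {
    echo "operationName suboperationCount"
    awk '
    /Tasks: \[/ {
        sub(/.*Tasks: \[/, "")                     # drop everything before the task list
        while (match($0, /ingestion-[0-9-]+/)) {   # the id stops at the first dot
            count[substr($0, RSTART, RLENGTH)]++   # one more task for this operation
            $0 = substr($0, RSTART + RLENGTH)      # continue after the match
        }
    }
    END { for (job in count) print job, count[job] }
    ' "$1" | LC_ALL=C sort                         # for-in order is unspecified, so sort
}

count_tasks xlogs.txt > fileresult.txt
cat fileresult.txt
```

With the sample lines, fileresult.txt should contain the header followed by ingestion-4757-10-17-4 4, ingestion-4757-10-18-3 6 and ingestion-4759-9-13-41 3. Lines without a Tasks list, such as "otherlogs stuff ...", are skipped.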