Q: Count instances of a string, excluding occurrences whose timestamps are less than 1 minute apart

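Stack Overflow user

Asked 2018-07-12 14:57:07

2 answers | 74 views | 0 followers | 2 votes

I am a beginner at coding. I have a script that greps a file for a specific word. I need to count its occurrences, but the word I am looking for cascades, i.e. it repeats in bursts. So if it occurs again in less than 1 minute, I want to ignore that occurrence. However, if the occurrence is on a different device than the first one, it should not be ignored.

For example: file1.txt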

file .txt    2018.09.06 21:27:45.001 There is a error 12345
file .txt    2018.09.06 21:27:45.009 error 12345 is reported on device-1
file .txt    2018.09.06 21:27:45.500 There is a error 12345
file .txt    2018.09.06 21:27:45.601 error 12345 is reported on device-1
file .txt    2018.09.06 21:27:46.899 There is a error 12345
file .txt    2018.09.06 21:27:46.905 error 12345 is reported on device-1
file .txt    2018.09.06 21:27:49.203 There is a error 12345
file .txt    2018.09.06 21:27:49.491 error 12345 is reported on device-6
file .txt    2018.09.06 21:27:52.703 There is a error 12345
file .txt    2018.09.06 21:29:52.991 error 12345 is reported on device-6

The result is

grep -c 12345 file1.txt
10

Result I get = 10

Result I need = 3

How can I ignore repeated events based on the timestamp?

bash

shell

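2 Answers

Stack Overflow user

Accepted answer

Posted 2018-07-12 15:19:51

How much do you care about the "less than 1 minute apart" part? If it is enough to say "ignore multiple occurrences that happen within the same clock minute", this is fairly simple.

First, get a list of all the "error xyz is reported" lines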

$ grep "error 12345 is reported" tfile.txt
file .txt    2018.09.06 21:27:45.009 error 12345 is reported on device-1
file .txt    2018.09.06 21:27:45.601 error 12345 is reported on device-1
file .txt    2018.09.06 21:27:46.905 error 12345 is reported on device-1
file .txt    2018.09.06 21:27:49.491 error 12345 is reported on device-6
file .txt    2018.09.06 21:29:52.991 error 12345 is reported on device-6

Then use sed to reduce each line to HH:MM device-number format.

$ grep reported tfile.txt | sed 's/.*\(..:..\):.*reported on \(.*\)/\1 \2/'
21:27 device-1
21:27 device-1
21:27 device-1
21:27 device-6
21:29 device-6

Then find the unique entries

$ grep reported tfile.txt | sed 's/.*\(..:..\):.*reported on \(.*\)/\1 \2/' | uniq
21:27 device-1
21:27 device-6
21:29 device-6

Finally, count them

$ grep reported tfile.txt | sed 's/.*\(..:..\):.*reported on \(.*\)/\1 \2/' | uniq | wc -l
3
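
One caveat with the pipeline above: `uniq` only collapses adjacent duplicate lines, so if two devices interleave within the same minute, the same `HH:MM device` pair can be counted more than once. Below is a minimal sketch of a variant that swaps `uniq` for `sort -u`, assuming the same line layout as the sample. The function name `count_minute_buckets` is made up for illustration. Because the key carries no date, entries at the same HH:MM on different days would also merge, which may or may not be what you want.

```shell
# Hypothetical helper (not from the original answer): reads log lines on
# stdin and prints the number of distinct "HH:MM device" pairs.
count_minute_buckets() {
    grep 'reported' |
        sed 's/.*\(..:..\):.*reported on \(.*\)/\1 \2/' |
        sort -u |       # deduplicate even when duplicates are not adjacent
        wc -l |
        tr -d ' '       # some wc implementations pad the count with spaces
}

# Interleaved devices within one minute: uniq would report 3 here.
count_minute_buckets <<'EOF'
file .txt    2018.09.06 21:27:45.009 error 12345 is reported on device-1
file .txt    2018.09.06 21:27:46.000 error 12345 is reported on device-6
file .txt    2018.09.06 21:27:47.000 error 12345 is reported on device-1
EOF
# prints 2
```

On the sample data from the question the result is still 3, since the duplicates there happen to be adjacent.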

Votes: 2

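Stack Overflow user

Posted 2018-07-12 15:08:05

You need to parse the timestamps and do some date-time arithmetic (always somewhat tricky, but at least you are probably not crossing time zones, although you may periodically cross daylight saving time, and I don't think you have enough information to tell when that happens). This likely means a simple grep is not enough: you need to read each line in bash, parse it, and keep track of it.

You then need something like sed or awk to parse the lines. In any case, once you are doing that, you would probably want to use awk for the whole thing. I have never used awk to manage timestamps, although I have seen it in its man page, so it should be able to handle this. The rest is tracking device names against timestamps, which awk can manage well.

That said, I'd suggest that a higher-level language is in order here. Whether perl, python, or ruby, any of them should be able to handle this fairly easily.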
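
For reference, the parse-and-track approach described here can be sketched in portable awk. This is only an illustration under several assumptions: the timestamp is a `YYYY.MM.DD` field followed by an `HH:MM:SS.mmm` field, the device name is the last field on the line, and an occurrence is ignored when it arrives less than 60 seconds after the last *counted* occurrence on the same device. The function name `count_windowed` and its parameters are made up. Timestamps are treated as plain wall-clock values, so time zones and daylight saving changes are not handled. Unlike grouping by clock minute, this compares elapsed time, so events at 21:27:59 and 21:28:01 on one device count once.

```shell
# Hypothetical helper: reads log lines on stdin and prints how many
# occurrences remain after dropping repeats on the same device that
# arrive less than WINDOW seconds after the last counted one.
# Usage: count_windowed [PATTERN] [WINDOW_SECONDS] < logfile
count_windowed() {
    awk -v pat="${1:-is reported on}" -v win="${2:-60}" '
    # Julian day number of a Gregorian date, so day rollovers are handled.
    function jdn(y, m, d,    a, yy, mm, n) {
        a  = int((14 - m) / 12)
        yy = y + 4800 - a
        mm = m + 12 * a - 3
        n  = d + int((153 * mm + 2) / 5) + 365 * yy
        return n + int(yy / 4) - int(yy / 100) + int(yy / 400) - 32045
    }
    index($0, pat) {
        # Find the YYYY.MM.DD field; the HH:MM:SS.mmm field follows it.
        for (i = 1; i < NF; i++)
            if ($i ~ /^[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/) break
        if (i >= NF) next                  # no timestamp found, skip line
        split($i, dt, ".")
        split($(i + 1), tm, ":")
        ts  = jdn(dt[1], dt[2], dt[3]) * 86400 + tm[1] * 3600 + tm[2] * 60 + tm[3]
        dev = $NF                          # assumes the device is the last field
        if (!(dev in last) || ts - last[dev] >= win) {
            n++                            # count it and restart the window
            last[dev] = ts
        }
    }
    END { print n + 0 }
    '
}

count_windowed <<'EOF'
file .txt    2018.09.06 21:27:45.001 There is a error 12345
file .txt    2018.09.06 21:27:45.009 error 12345 is reported on device-1
file .txt    2018.09.06 21:27:45.601 error 12345 is reported on device-1
file .txt    2018.09.06 21:27:46.905 error 12345 is reported on device-1
file .txt    2018.09.06 21:27:49.491 error 12345 is reported on device-6
file .txt    2018.09.06 21:29:52.991 error 12345 is reported on device-6
EOF
# prints 3
```

If a long cascade should keep suppressing itself for as long as it continues, moving `last[dev] = ts` outside the `if` would compare against the last *seen* occurrence instead of the last counted one.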

Votes: -1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/51308840
