
How to extract log entries between two times without a date?

Stack Overflow user

Asked 2022-05-03 20:22:06

Answers: 3 | Views: 54 | Followers: 0 | Votes: 0

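I'm trying to write an automated script that takes the latest log entry and collects all log entries from the two hours before it, regardless of whether there are log entries throughout that time span. The problem I keep running into is that every example I've found has dates attached, and my logs don't. A sample of the log output: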

13:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
13:26:28.713687 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9287:9522, ack 13044, win 420, length 235
13:26:28.713766 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [.], ack 9522, win 24576, length 0
13:26:28.840650 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [.], seq 14286:15624, ack 9522, win 24576, length 1338
13:26:28.848949 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9522:9599, ack 14286, win 420, length 77
13:26:28.849002 IP term-IdeaPad-Flex.46364 > unn-37-19-198-173.datapacket.com.https: Flags [P.], seq 15624:15674, ack 9599, win 24576, length 50
13:26:28.849023 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9599:9743, ack 14286, win 420, length 144
13:26:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526

So the time is at the front of each line, with no date. date doesn't like this: it gives me a date: invalid date ‘+%s’ response and outputs nothing. What I currently have is:

#!/bin/bash

truncate -s 0 twoHour.log

NEW=$(tail -n1 $1 | cut -d ":" -f1)
# echo $NEW
New=$(date -d "$NEW" +%s)
OLD=$(($NEW-2))
New=$(date -d "$OLD" +%s)
# echo $OLD
START=$(egrep "$NEW\:\d\d\:\d\d" $1 | tail | date -d +%s)
END=$(egrep "$OLD\:\d\d\:\d\d" $1 | head | date -d +%s)

while read line; do

    # Extract the date for each line.
    # First strip off everything up to the first "[".
    # Then remove everything after the first "]".
    # Finally, straighten up the format with the cleandate function
    date="${date%%.*}"
    date=$( cleandate "$date" )

    # If the date falls between d1 and d2, print it
    if [[ $date -ge $START && $date -le $END ]]; then
         echo "$line"
    fi

done

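NEW and OLD are the hours being extracted. START and END are the boundaries between them, and everything in between is output line by line. $1 is the log file.

I've spent a few hours modifying bash/awk scripts and searching for ready-made scripts, and I still can't figure out how to make this work.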

bash

3 Answers

Stack Overflow user

Accepted answer

Posted 2022-05-03 23:00:29

sed can be used to extract the lines between two regular-expression addresses:

/^11:.*$/,/^13:26:28.849031 .*$/p

The first address can be refined further by taking the minutes into account, which turns the expression into /^11:(2[6-9]|[3-5][0-9]).*$/,/^13:26:28.849031 .*$/p.
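The minute alternation in that first address can be generated for any start minute. This is an illustrative sketch only: minute_regex is a hypothetical helper name, and it assumes a two-digit minute between 00 and 59.

```bash
#!/usr/bin/env bash
# minute_regex MM: print an ERE alternation that matches minutes MM..59.
# Hypothetical helper for illustration; assumes MM is two digits, 00-59.
minute_regex() {
  local msb=${1:0:1} lsb=${1:1:1}
  if (( msb < 5 )); then
    # same tens digit with units >= lsb, or any larger tens digit
    printf '(%s[%s-9]|[%s-5][0-9])\n' "$msb" "$lsb" "$(( msb + 1 ))"
  else
    # tens digit is already 5: only the units range is left
    printf '(5[%s-9])\n' "$lsb"
  fi
}

# minutes 26..59, as in the refined first address
minute_regex 26    # prints (2[6-9]|[3-5][0-9])
```

For example, sed -rn "/^11:$(minute_regex 26).*$/,/^13:26:28.849031 .*$/p" test.txt runs the refined expression against test.txt.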

last_line=$(tail -n1 test.txt)
end_time=$(cut -d ' ' -f1 <<<"$last_line")
end_hour="${end_time:0:2}"
min_msb="${end_time:3:1}"    # tens digit of the minutes
min_lsb="${end_time:4:1}"    # units digit of the minutes
# 10# forces base 10 (08 and 09 are invalid octal); %02d restores the leading zero
start_hour=$(printf '%02d' $((10#$end_hour - 2)))

if [ "$min_msb" -lt 5 ];then
  min_next=$(($min_msb+1))
else
  min_next=5
fi

sed -rn "/^$start_hour:($min_msb[$min_lsb-9]|[$min_next-5][0-9]).*$/,/^$end_time .*$/p" test.txt

If the time range crosses midnight, for example:

22:57:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
...
00:36:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526

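then the start hour has to wrap around to the previous day, which the script above does not handle.

UPDATE: Fixed the regex for sed's first address when the time crosses midnight.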

hour_range=2
last_line=$(tail -n1 test.txt)
#end_time=$(cut -d ' ' -f1 <<<"$last_line")
end_time="${last_line:0:8}"

start_time="$(date -d "$(date -d "$end_time" --iso=seconds) -$hour_range hour" '+%T')"
echo "Time range: $start_time - $end_time"

# 10# forces base 10, so hours such as 08 or 09 are not rejected as invalid octal
end_hour=$((10#${end_time:0:2}))
min_msb="${end_time:3:1}"
min_lsb="${end_time:4:1}"
start_hour="${start_time:0:2}"    # keep the leading zero (e.g. 09) so ^09: matches

if [ "$min_msb" -lt 5 ];then
  min_next=$(($min_msb+1))
else
  min_next=5
fi

start_hour_expr="$start_hour:($min_msb[$min_lsb-9]|[$min_next-5][0-9])"
# Crossed midnight: also accept any line in the hour after start_hour
if [ "$((10#$start_hour))" -gt "$end_hour" ];then
  start_hour_next=$(printf '%02d' $(( (10#$start_hour + 1) % 24 )))
  start_hour_expr="($start_hour_expr|$start_hour_next:[0-5][0-9])"
fi

echo "sed expression:"
echo -e "/^$start_hour_expr.*$/,/^$end_time.*$/p \n"

sed -rn "/^$start_hour_expr.*$/,/^$end_time.*$/p" test.txt
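The nested date call in the script above relies on GNU date parsing a bare time. As an alternative, the start time can be computed with plain bash arithmetic, which wraps around midnight explicitly. This is a sketch, not the answer's method: to_secs and from_secs are hypothetical helper names, and it assumes HH:MM:SS timestamps and a range under 24 hours.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for this sketch: time of day <-> seconds since midnight.
to_secs() {
  local t=${1%%.*}                      # drop fractional seconds, e.g. .849031
  # 10# forces base 10, so 08 and 09 are not treated as invalid octal numbers
  echo $(( 10#${t:0:2} * 3600 + 10#${t:3:2} * 60 + 10#${t:6:2} ))
}

from_secs() {
  # wrap into 0..86399 so a negative value rolls back past midnight
  local n=$(( ($1 % 86400 + 86400) % 86400 ))
  printf '%02d:%02d:%02d\n' $(( n / 3600 )) $(( n % 3600 / 60 )) $(( n % 60 ))
}

hour_range=2
end_time=00:36:28
start_time=$(from_secs $(( $(to_secs "$end_time") - hour_range * 3600 )))
echo "Time range: $start_time - $end_time"   # Time range: 22:36:28 - 00:36:28
```

The resulting start_time can be used in place of the date-based one; the regex-building steps stay the same.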

Given the following test.txt:

21:32:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
21:57:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
22:10:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:07:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
00:26:28.849023 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9599:9743, ack 14286, win 420, length 144
00:36:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526

sed expression:
/^(22:(3[6-9]|[4-5][0-9])|23:[0-5][0-9]).*$/,/^00:36:28.*$/p 

23:07:46.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
23:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9136:9287, ack 13044, win 420, length 151
00:26:28.849023 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9599:9743, ack 14286, win 420, length 144
00:36:28.849031 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364: Flags [P.], seq 9743:10269, ack 14286, win 420, length 526

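Votes: 0

Stack Overflow user

Posted 2022-05-04 00:43:41

Assumptions:

The offset (2 hours in the OP's example) is less than 24 hours
Every line begins with a timestamp in the format HH:MM:SS
The log may span multiple days

Plan:

Convert the offset (e.g., 2 hrs) to seconds; we'll call this offset_secs
Get the time from the last line of the file; we'll call this last_time
Convert that timestamp to epoch seconds; we'll call this last_epoch
Subtract offset_secs from last_epoch; we'll call this first_epoch
Convert first_epoch back to an HH:MM:SS string; we'll call this first_time
To handle files whose timestamps span multiple midnights, save the lines of interest in an array, and reset the array whenever a line falls outside the time range (for example, after another midnight crossing)
During awk END processing, print the array of lines to stdout

One idea using GNU awk: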

$ cat log.awk
BEGIN { FS="." }                                # set input field delimiter to "."

# first line of input is last line of log file; grab time and calculate the offset/start time

NR==1 { last_time   = $1
        last_epoch  = mktime( strftime("%Y %m %d") " " gensub(/:/," ","g",last_time))
        first_epoch = last_epoch - offset_secs
        first_time  = strftime("%H:%M:%S", first_epoch)

        if (first_time > last_time)
           spans_midnight=1
        next
      }

# for the rest of the input lines determine if the time falls within the last "offset_secs"

      { curr_time = $1
        if ( (  spans_midnight && curr_time >= first_time) ||
             (  spans_midnight && curr_time <= last_time)  ||
             ( !spans_midnight && curr_time >= first_time && curr_time <= last_time) )
           lines[++cnt]=$0
        else {                                  # outside the time range so ...
           delete lines                         # delete anything saved up to this point and ...
           cnt=0                                # reset the array index
        }
      }
END   { for (i=1;i<=cnt;i++)                    # print the lines that occurred within the last "offset_secs"
            print lines[i]
      }

Note: for more details on the mktime() and strftime() functions, see GNU awk: Time Functions.

Test #1: 2-hour offset; does not span midnight; file spans midnight

$ cat sample.log
22:22:00.896232 IP 104.16.42.63.https  ignore this line
06:22:00.896232 IP 104.16.42.63.https  ignore this line; crossed midnight
07:22:00.896232 IP 104.16.42.63.https  ignore this line
09:23:00.896232 IP 104.16.42.63.https  ignore this line
09:51:49.896232 IP 104.16.42.63.https  ignore this line
09:51:50.896232 IP 104.16.42.63.https  keep this line
10:24:37.896232 IP 104.16.42.63.https  keep this line
11:51:50.896232 IP 104.16.42.63.https  keep this line

$ offset_secs=$((2*60*60))                   # 2 hours

$ awk -v offset_secs="${offset_secs}" -f log.awk <(tail -1 sample.log) sample.log
09:51:50.896232 IP 104.16.42.63.https  keep this line
10:24:37.896232 IP 104.16.42.63.https  keep this line
11:51:50.896232 IP 104.16.42.63.https  keep this line

Test #2: 4-hour offset; spans midnight; file spans multiple midnights

$ cat sample.log
20:22:00.896232 IP 104.16.42.63.https  ignore this line
23:22:00.896232 IP 104.16.42.63.https  ignore this line
01:22:00.896232 IP 104.16.42.63.https  ignore this line; crossed midnight
23:22:00.896232 IP 104.16.42.63.https  ignore this line
01:22:00.896232 IP 104.16.42.63.https  ignore this line; crossed midnight
06:22:00.896232 IP 104.16.42.63.https  ignore this line
07:22:00.896232 IP 104.16.42.63.https  ignore this line
09:23:00.896232 IP 104.16.42.63.https  ignore this line
22:51:49.896232 IP 104.16.42.63.https  ignore this line
22:51:50.896232 IP 104.16.42.63.https  keep this line
23:07:37.896232 IP 104.16.42.63.https  keep this line
00:51:50.896232 IP 104.16.42.63.https  keep this line; crossed midnight
01:24:37.896232 IP 104.16.42.63.https  keep this line
02:51:50.896232 IP 104.16.42.63.https  keep this line

$ offset_secs=$((4*60*60))                   # 4 hours

$ awk -v offset_secs="${offset_secs}" -f log.awk <(tail -1 sample.log) sample.log
22:51:50.896232 IP 104.16.42.63.https  keep this line
23:07:37.896232 IP 104.16.42.63.https  keep this line
00:51:50.896232 IP 104.16.42.63.https  keep this line; crossed midnight
01:24:37.896232 IP 104.16.42.63.https  keep this line
02:51:50.896232 IP 104.16.42.63.https  keep this line
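For environments without GNU awk, the same plan can be sketched in pure bash. This is a hypothetical port (last_window is an illustrative name, not part of the answer); like log.awk, it assumes every line starts with HH:MM:SS and the offset is under 24 hours, and it compares times as fixed-width strings.

```bash
#!/usr/bin/env bash
# last_window FILE OFFSET_SECS
# Hypothetical pure-bash port of the plan above (a sketch, not the answer's awk).
# Assumes every line starts with HH:MM:SS and OFFSET_SECS is under 86400.
last_window() {
  local file=$1 offset=$2 line t last last_s first_s first spans=0 inside
  local -a keep=()

  last=$(tail -n 1 "$file")
  last=${last%%.*}                                  # last_time, e.g. 11:51:50
  last_s=$(( 10#${last:0:2} * 3600 + 10#${last:3:2} * 60 + 10#${last:6:2} ))
  first_s=$(( ((last_s - offset) % 86400 + 86400) % 86400 ))
  printf -v first '%02d:%02d:%02d' \
    $(( first_s / 3600 )) $(( first_s % 3600 / 60 )) $(( first_s % 60 ))
  [[ $first > $last ]] && spans=1                   # window spans midnight

  while IFS= read -r line || [[ -n $line ]]; do
    t=${line%%.*}
    inside=0
    if (( spans )); then
      [[ $t < $first && $t > $last ]] || inside=1   # t >= first OR t <= last
    else
      [[ $t < $first || $t > $last ]] || inside=1   # first <= t <= last
    fi
    if (( inside )); then
      keep+=("$line")
    else
      keep=()                                       # outside the window: start over
    fi
  done < "$file"

  if (( ${#keep[@]} )); then printf '%s\n' "${keep[@]}"; fi
}

# Equivalent of Test #1: last_window sample.log $((2*60*60))
```

Resetting the array on every out-of-range line mirrors log.awk, so only the final contiguous run of in-range lines is printed.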

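Votes: 1

Stack Overflow user

Posted 2022-05-04 14:20:33

egrep's regex features are limited: you can use [0-9] or [[:digit:]], but not \d. If you want \d, you can use Perl-style regular expressions with grep -P.

You can also tell grep to output only the matched part of each line with -o.

It's worth noting that egrep and grep -E are synonyms; I'd recommend using grep -E explicitly, but that's just my preference.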

  -E, --extended-regexp     PATTERN is an extended regular expression (ERE)
  -P, --perl-regexp         PATTERN is a Perl regular expression
  -o, --only-matching       show only the part of a line matching PATTERN
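A quick demonstration on one line of the question's log, assuming GNU grep. The -P line additionally assumes a grep built with PCRE support, so it falls back to a message if that is missing.

```bash
#!/usr/bin/env bash
line='13:26:28.709883 IP unn-37-19-198-173.datapacket.com.https > term-IdeaPad-Flex.46364'

# ERE (grep -E / egrep): spell digits as [0-9] or [[:digit:]]
grep -E -o '^13:[0-9]{2}:[0-9]{2}' <<<"$line"               # 13:26:28
grep -E -o '^13:[[:digit:]]{2}:[[:digit:]]{2}' <<<"$line"   # 13:26:28

# PCRE (grep -P): \d is available
grep -P -o '^13:\d\d:\d\d' <<<"$line" || echo 'grep -P not available' >&2
```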

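For tail and head, you seem to want a single line from each: the last line and the first line, respectively. By default they output 10 lines; this can be controlled with -n 1.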

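The date command fails because -d takes the next argument, +%s, as the date string to parse, so date reports an invalid date and never reads the piped input. You can specify -f - to make date read its input from STDIN (see "Piping a string to GNU date for conversion - how to make it read from stdin?").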
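A minimal sketch of both working invocations, assuming GNU coreutils date (-d and -f are GNU options). When only a time is given, GNU date fills in today's date, so both lines print the same epoch value.

```bash
#!/usr/bin/env bash
# Assumes GNU coreutils date.
t='13:26:28.709883'

# Pass the time string itself to -d ...
date -d "$t" +%s
# ... or let date read one date string per line from stdin with -f -
echo "$t" | date +%s -f -
```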

With those changes, the following should get you on your way.

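# START comes from the earlier hour (OLD) and END from the later one (NEW), so START <= END in the loop
START=$(grep -E -o "^$OLD:[0-9]{2}:[0-9]{2}\.[0-9]+" "$1" | head -n 1 | date +%s -f -)
END=$(grep -E -o "^$NEW:[0-9]{2}:[0-9]{2}\.[0-9]+" "$1" | tail -n 1 | date +%s -f -)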
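A sketch that plugs these grep/head/tail/date fixes into the question's loop. It assumes GNU grep and GNU date, at least one entry in hour OLD, and no midnight crossing. extract_window is a hypothetical wrapper name, START is taken from the earlier hour so the range test works, and the question's undefined cleandate helper is replaced by stripping the fractional seconds.

```bash
#!/usr/bin/env bash
# extract_window LOGFILE  (hypothetical wrapper around the question's logic)
extract_window() {
  local log=$1 NEW OLD START END line d
  NEW=$(tail -n 1 "$log" | cut -d ':' -f1)
  OLD=$(printf '%02d' $(( 10#$NEW - 2 )))     # keep the leading zero, e.g. 09
  # START = first entry of hour OLD, END = last entry of hour NEW
  START=$(grep -E -o "^$OLD:[0-9]{2}:[0-9]{2}\.[0-9]+" "$log" | head -n 1 | date +%s -f -)
  END=$(grep -E -o "^$NEW:[0-9]{2}:[0-9]{2}\.[0-9]+" "$log" | tail -n 1 | date +%s -f -)

  while IFS= read -r line || [[ -n $line ]]; do
    # stands in for cleandate: HH:MM:SS prefix -> epoch seconds (today)
    d=$(date -d "${line%%.*}" +%s)
    if (( d >= START && d <= END )); then
      echo "$line"
    fi
  done < "$log"
}

# Usage: extract_window log > twoHour.log
```

Calling date once per line is slow on large logs; it is kept here only to stay close to the question's structure.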

Tip: use bash -x when troubleshooting bash scripts to get a better view of what is happening.

[root@91192da89fc4 temp]# bash -x date-orig.sh log
+ truncate -s 0 twoHour.log
++ tail -n1 log
++ cut -d : -f1
+ NEW=13
++ date -d 13 +%s
+ New=1651582800
+ OLD=11
++ date -d 11 +%s
+ New=1651575600
++ egrep '13\:\d\d\:\d\d' log
++ tail
++ date -d +%s
date: invalid date '+%s'
+ START=
++ egrep '11\:\d\d\:\d\d' log
++ head
++ date -d +%s
date: invalid date '+%s'
+ END=
+ read line
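To trace only part of a script instead of the whole run, set -x and set +x can bracket the suspicious section, and PS4 sets the prefix of each trace line. A small sketch; NEW and OLD mirror the question's variables, and this PS4 format is just one possible choice.

```bash
#!/usr/bin/env bash
# Show the script name and line number in front of every traced command
PS4='+ ${BASH_SOURCE:-$0}:${LINENO}: '

NEW=13
set -x                 # start tracing here ...
OLD=$(( NEW - 2 ))
set +x                 # ... and stop here (this command is traced too)
echo "OLD=$OLD"
```

The trace is written to stderr, so it does not mix with the script's regular output.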

Votes: 0

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/72105051
