I have several files (Gigaspaces logs) from a Java application running on multiple hosts, and I need to merge these files based on their date/time values.
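Because each log file is already sorted, I need to read the first record from each file into an array, determine which record has the smallest key, write that record to the result file, read the next record from the same file, and repeat.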
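The procedure described above is a k-way merge. Below is a minimal Ruby sketch of it, assuming each file has already been turned into a sequence of `[key, payload]` pairs sorted by key, and that keys compare correctly as strings. The helper names `merge_sorted` and `next_or_nil` are hypothetical, not part of any library.

```ruby
# Hypothetical helper: return the next element of an external enumerator,
# or nil once the enumerator is exhausted.
def next_or_nil(enum)
  enum.next
rescue StopIteration
  nil
end

# Hypothetical helper: k-way merge of sources that are each already sorted
# by key. Each source yields [key, payload] pairs; keys are compared as strings.
def merge_sorted(sources)
  iters = sources.map(&:each)                 # one external enumerator per source
  heads = iters.map { |it| next_or_nil(it) }  # current first record of each source
  result = []
  loop do
    # Among the sources that still have a record, pick the one with the smallest key.
    i = heads.each_index.select { |j| heads[j] }.min_by { |j| heads[j].first }
    break if i.nil?                           # every source is exhausted
    result << heads[i]
    heads[i] = next_or_nil(iters[i])          # read the next record from the same source
  end
  result
end

a = [["2015-04-04 02:33:42", "f1 a"], ["2015-04-06 02:33:42", "f1 b"]]
b = [["2015-04-05 02:33:42", "f0 a"]]
p merge_sorted([a, b]).map(&:last)  # prints ["f1 a", "f0 a", "f1 b"]
```

Only the current head of each source is inspected, so the sources can be arrays or enumerators; the merged records are collected into an array here purely for simplicity. Scanning all heads costs O(k) per record for k files, which is fine for a handful of hosts; with many files a heap would be the usual choice, but Ruby's core library does not ship one.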
Definition of a record: the first line has a key, and none of the lines that follow it have a key. For example:
2015-04-05 02:33:42,135 GSC SEVERE [com.gigaspaces.lrmi] - LRMI Transport Protocol caught server exception caused by [/10.0.1.2:46949] client.; Caused by: java.lang.IllegalArgumentException
at java.nio.ByteBuffer.allocate(ByteBuffer.java:311)
at com.gigaspaces.lrmi.SmartByteBufferCache.get(SmartByteBufferCache.java:50)
at com.gigaspaces.lrmi.nio.Reader.readBytesFromChannelNoneBlocking(Reader.java:410)
at com.gigaspaces.lrmi.nio.Reader.readBytesNonBlocking(Reader.java:644)
at com.gigaspaces.lrmi.nio.Reader.bytesToStream(Reader.java:509)
at com.gigaspaces.lrmi.nio.Reader.readRequest(Reader.java:112)
at com.gigaspaces.lrmi.nio.ChannelEntry.readRequest(ChannelEntry.java:121)
at com.gigaspaces.lrmi.nio.Pivot.handleReadRequest(Pivot.java:445)
at com.gigaspaces.lrmi.nio.selector.handler.ReadSelectorThread.handleRead(ReadSelectorThread.java:81)
at com.gigaspaces.lrmi.nio.selector.handler.ReadSelectorThread.handleConnection(ReadSelectorThread.java:45)
at com.gigaspaces.lrmi.nio.selector.handler.AbstractSelectorThread.doSelect(AbstractSelectorThread.java:74)
at com.gigaspaces.lrmi.nio.selector.handler.AbstractSelectorThread.run(AbstractSelectorThread.java:50)
at java.lang.Thread.run(Thread.java:662)

Ideally, the result file should contain the key, the directory/filename.log, and the rest of the record.
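Under the record definition above (one key line followed by zero or more key-less lines), one way to group a file's lines into records is `Enumerable#slice_before` with a pattern for the leading timestamp. This is a sketch: the key regex, the helper names `read_records` and `format_record`, and the exact output layout ("key filename rest-of-record") are assumptions chosen to match the question's wording.

```ruby
# Assumed key format: "yyyy-mm-dd hh:mm:ss,mmm" at the very start of a line.
KEY_RE = /\A\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}/

# Hypothetical helper: split an array of lines into records.
# Each record is [key, fname, lines_of_record]; any lines before the first
# key line are dropped because they have no key.
def read_records(lines, fname)
  lines.slice_before { |l| l.match?(KEY_RE) }
       .select { |rec| rec.first.match?(KEY_RE) }
       .map { |rec| [rec.first[KEY_RE], fname, rec] }
end

# Hypothetical formatting of one record for the result file: key, file name,
# the remainder of the key line, then the continuation lines unchanged.
def format_record(key, fname, rec)
  first_rest = rec.first.sub(KEY_RE, "").chomp
  (["#{key} #{fname}#{first_rest}"] + rec.drop(1).map(&:chomp)).join("\n")
end
```

For a real file this would be called as `read_records(IO.readlines(path), path)`; each record's continuation lines (the stack trace) stay attached to their key line.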
Posted on 2015-09-20 01:56:32
Code
Read every line that begins with a date string, from all of the files, into an array, then sort the array by the date string:
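require 'date'

# Collect [date/time string, file name, rest of line] for every key line
# in the given files, then sort all of them by the date/time string.
def get_key_rows(*fnames)
  fnames.flat_map do |fname|
    IO.foreach(fname).with_object([]) do |s, arr|
      # A key line starts with "yyyy-mm-dd hh:mm:ss"; other lines fail to parse.
      dt = DateTime.strptime(s[0, 19], '%Y-%m-%d %H:%M:%S') rescue nil
      arr << [s[0, 19], fname, s[19..-1].rstrip] if dt
    end
  end.sort_by(&:first)
end

This method returns an array of three-element arrays. Each three-element array corresponds to one key line in one of the files and contains the date/time string, the file name, and the remainder of the line following the date/time string. Note that the key lines do not need to be sorted within each file. The method uses Enumerable#flat_map, IO::foreach, Enumerator#with_object, DateTime::strptime and Enumerable#sort_by.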
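Regarding sort_by, note that the rows can be sorted on the date/time strings rather than on the corresponding DateTime objects, because the date/time strings have the fixed-width, zero-padded form 'yyyy-mm-dd hh:mm:ss', so string order is the same as chronological order.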
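A quick check of that claim, using timestamps from the example below: sorting the strings directly gives the same order as sorting by the parsed DateTime values, while timestamps without zero padding would not sort chronologically.

```ruby
require 'date'

# Fixed-width, zero-padded timestamps taken from the example files.
stamps = ["2015-04-05 02:33:42", "2015-04-04 02:23:42",
          "2015-04-05 02:33:43", "2015-04-04 02:33:42"]

by_string = stamps.sort    # plain string comparison
by_time   = stamps.sort_by { |s| DateTime.strptime(s, '%Y-%m-%d %H:%M:%S') }
p by_string == by_time     # prints true

# Without zero padding the two orders diverge: "1" < "9" as characters,
# so April 10 sorts before April 9.
p ["2015-4-9", "2015-4-10"].sort   # prints ["2015-4-10", "2015-4-9"]
```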
Example
Let's create some files:
IO.write("f0", "2015-04-05 02:33:42,135 more stuff in f0\n" +
"more in f0\n" +
"2015-04-05 04:33:42,135 more stuff in f0\n" +
"even more in f0")
#=> 108
IO.write("f1", "2015-04-04 02:33:42,135 more stuff in f1\n" +
"2015-04-06 02:33:42,135 more stuff in f1\n" +
"more in f1")
#=> 92
IO.write("f2", "something in f2\n" +
"2015-04-05 02:33:43,135 more stuff in f2\n" +
"even more in f2\n" +
"2015-04-04 02:23:42,135 more stuff in f2")
#=> 113
get_key_rows('f0', 'f1', 'f2')
#=> [["2015-04-04 02:23:42", "f2", ",135 more stuff in f2"],
# ["2015-04-04 02:33:42", "f1", ",135 more stuff in f1"],
# ["2015-04-05 02:33:42", "f0", ",135 more stuff in f0"],
# ["2015-04-05 02:33:43", "f2", ",135 more stuff in f2"],
# ["2015-04-05 04:33:42", "f0", ",135 more stuff in f0"],
#     ["2015-04-06 02:33:42", "f1", ",135 more stuff in f1"]]

https://stackoverflow.com/questions/32673451
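One caveat with get_key_rows as written: it keys on the first 19 characters only, so the milliseconds (",135") end up in the third element and play no part in the ordering, and Ruby documents Enumerable#sort_by as not guaranteed to be stable. Several log lines from the same second can therefore come out in any order. The variant below is a sketch under stated assumptions: the name `get_key_rows_ms` is hypothetical, it assumes every key carries a three-digit millisecond field, and it swaps the DateTime.strptime check for a regex that validates only the shape of the key, not the calendar date.

```ruby
# Assumed key format: "yyyy-mm-dd hh:mm:ss,mmm" (23 characters).
# The regex checks only the shape of the key, not that it is a valid date.
KEY_MS_RE = /\A\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\z/

# Hypothetical variant of get_key_rows that keeps the milliseconds in the key
# and breaks ties between equal keys by the order in which rows were read.
def get_key_rows_ms(*fnames)
  rows = fnames.flat_map do |fname|
    IO.foreach(fname).with_object([]) do |s, arr|
      key = s[0, 23]
      next unless key.match?(KEY_MS_RE)
      arr << [key, fname, s[23..-1].rstrip]
    end
  end
  # sort_by is not guaranteed to be stable, so the read position is part of the sort key.
  rows.each_with_index.sort_by { |row, i| [row.first, i] }.map(&:first)
end
```

Including the read position in the sort key means rows with identical timestamps keep the order of the file arguments and, within a file, their line order.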