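I am trying to switch to using only Python. Something I use very extensively in C# is LINQ. In this exercise, the goal is to get a collection of key-value pairs where the key is each month and the value is the number of messages for that month. How would I do something like this in Python, or is there perhaps a better approach?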
class MainClass
{
    public static void Main (string[] args)
    {
        string[] months = { "jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec" };
        var log = LineReader ();
        Dictionary<string, int> cumulativeMonths = new Dictionary<string, int> ();
        months.ToList ()
            .ForEach (f => {
                cumulativeMonths.Add(f, log.GroupBy(g => g.Split(' ').First().ToLower())
                    .Where(w => w.Key == f).ToList().Count());
            });
    }

    public static IEnumerable<string> LineReader()
    {
        Console.WriteLine ("Hello World!");
        using (StreamReader sr = new StreamReader (File.OpenRead ("/var/log/messages"))) {
            while (!sr.EndOfStream) {
                yield return sr.ReadLine ();
            }
        }
    }
}
Test input:
Feb 18 02:51:36 laptop rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="2952" x-info="http://www.rsyslog.com"] start
Feb 18 02:51:36 laptop kernel: Adaptec aacraid driver 1.2-0[30300]-ms
Feb 18 02:51:36 laptop kernel: megaraid cmm: 2.20.2.7 (Release Date: Sun Jul 16 00:01:03 EST 2006)
Feb 18 02:51:36 laptop kernel: megaraid: 2.20.5.1 (Release Date: Thu Nov 16 15:32:35 EST 2006)
Feb 18 02:51:36 laptop kernel: megasas: 06.805.06.00-rc1 Thu. Sep. 4 17:00:00 PDT 2014
Feb 18 02:51:36 laptop kernel: qla2xxx [0000:00:00.0]-0005: : QLogic Fibre Channel HBA Driver: 8.07.00.16-k.
Feb 18 02:51:36 laptop kernel: Emulex LightPulse Fibre Channel SCSI driver 10.4.8000.0.
Feb 18 02:51:36 laptop kernel: Copyright(c) 2004-2014 Emulex. All rights reserved.
Feb 18 02:51:36 laptop kernel: aic94xx: Adaptec aic94xx SAS/SATA driver version 1.0.3 loaded
Feb 18 02:51:36 laptop kernel: ACPI: bus type USB registered
The test output would be a dictionary: {Jan: 64562, Feb: 38762} ....
Posted on 2015-05-09 07:16:35
This is simpler than what you were doing, and it is easy in Python too:
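with open('/var/log/messages', 'r') as f:
    cumulative_months = {}
    for line in f:
        key = line.split()[0].lower()
        cumulative_months[key] = cumulative_months.get(key, 0) + 1
with is similar to C#'s using: it closes the file when the block is exited. A Python file object can be used as an iterator. It reads and returns one line at a time until it reaches EOF. (It actually reads ahead slightly more than one line at a time; see the documentation.)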
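To try the same counting loop without reading /var/log/messages, it can be wrapped in a small function that accepts any iterable of lines. This is only a sketch: the function name count_by_month, the blank-line guard, and the third sample line are assumptions added for illustration and are not part of the original answer.

```python
import io


def count_by_month(lines):
    """Count lines per month, keyed by the lowercased first token of each line."""
    counts = {}
    for line in lines:
        parts = line.split()
        if not parts:  # assumption: skip blank lines, which have no first token
            continue
        key = parts[0].lower()
        # dict.get returns 0 the first time a month is seen
        counts[key] = counts.get(key, 0) + 1
    return counts


# io.StringIO stands in for the open file; like a file object,
# it yields one line per iteration.
sample = io.StringIO(
    "Feb 18 02:51:36 laptop kernel: ACPI: bus type USB registered\n"
    "Feb 18 02:51:36 laptop kernel: megaraid cmm: 2.20.2.7\n"
    "Mar 01 00:00:01 laptop kernel: hypothetical entry\n"
)
print(count_by_month(sample))  # {'feb': 2, 'mar': 1}
```

With a real log, the call would be count_by_month(f) inside the with block.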
Alternatively, as m.wasowski pointed out, you can use the collections.Counter class for this kind of task, which makes things easier and faster.
Posted on 2015-05-09 07:14:37
You can use a collections.Counter dictionary:
from collections import Counter
with open('yourfile') as f:
    count = Counter(line.split()[0] for line in f)
Apologies for any mistakes, this was written on a phone :)
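If the result should have one entry per month, as in the question's cumulativeMonths dictionary, the Counter can be combined with a fixed month list. A minimal sketch, assuming lowercased keys as in the question; MONTHS, month_totals, and the sample lines are hypothetical names and data for illustration only.

```python
from collections import Counter

# Same month abbreviations as the C# array in the question.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def month_totals(lines):
    """Return a dict with every month as a key and its message count as the value."""
    # Skip whitespace-only lines so split()[0] cannot raise IndexError.
    counts = Counter(line.split()[0].lower() for line in lines if line.strip())
    # A Counter returns 0 for keys it has never seen, so months
    # without messages still appear in the result.
    return {month: counts[month] for month in MONTHS}


totals = month_totals([
    "Feb 18 02:51:36 laptop kernel: ACPI: bus type USB registered",
    "Feb 18 02:51:36 laptop kernel: megaraid cmm: 2.20.2.7",
])
print(totals["feb"], totals["jan"])  # 2 0
```

With a file, the same function can be called as month_totals(f) inside a with open(...) block, since the file object yields lines one at a time.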
Posted on 2015-05-09 07:29:00
Yes, this is what I came up with myself. I was wondering whether there is a more elegant (one-line) way to solve this:
fh = open("/var/log/messages", encoding="ISO-8859-1")
fh.seek(0)
febMessages = [x for x in fh if x.split(' ')[0].lower() == 'feb']
len(febMessages)
https://stackoverflow.com/questions/30134052