
Advanced parsing with Python (multiple lines)

Stack Overflow user
Asked on 2017-01-29 10:37:22
Answers: 1, Views: 166, Followers: 0, Votes: 2

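I have run into a parsing problem on Python 2.7. Let me explain:

I am parsing events from the Incapsula API. The goal is to make them readable in an Excel sheet so that I can produce statistics and charts.

The signature field gives the type of each event/attack together with a number. That number is the number of attacks, so I decided to replicate each line once per attack, taking the count as the sum of the numbers that follow the "signature=" field.

As in this capture: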

 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3}

So far everything works as expected, and I get the correct number of attacks.
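The counting step described above can be sketched as a small function. This is only an illustration, not the asker's code: the names `SIGNATURE_RE` and `attack_count` are hypothetical, the sample lines are shortened, and returning 0 for a line with no signature field is an assumption.

```python
import re

# Captures the body of the signature field, e.g. "api.threats.sql_injection=3, ..."
SIGNATURE_RE = re.compile(r'signature=\{([^}]*)\}')

def attack_count(line):
    """Return the sum of the per-threat counts in one log line's signature field."""
    match = SIGNATURE_RE.search(line)
    if match is None:
        return 0  # assumption: a line without a signature field counts as zero
    # Only take numbers that follow '=', so digits inside a threat name are ignored.
    return sum(int(n) for n in re.findall(r'=\s*(\d+)', match.group(1)))

# Shortened sample lines in the same shape as the captures in the question.
single = "visit_id=1, dest_id=1551642, signature={api.threats.sql_injection=3}"
multi = ("visit_id=2, dest_id=1551642, signature={api.threats.bot_access_control=1, "
         "api.threats.illegal_resource_access=3,}")

print(attack_count(single))  # 3
print(attack_count(multi))   # 4
```

Restricting the search to the signature body keeps other numeric fields such as `dest_id` out of the sum.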

For some rare events, the signature field holds multiple values, as in the following capture:

 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
 visit_id=324001290181618591, src_country=Ukraine, event_timestamp=1484493309742, src_ip=91.223.133.30, dest_name=www.xxx.com, dest_id=1551642, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
 visit_id=86001060468746692, src_country=Netherlands, event_timestamp=1483867285054, src_ip=178.22.232.53, dest_name=www.yyy.com, dest_id=1551642, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
 visit_id=86001060468746692, src_country=Netherlands, event_timestamp=1483867285054, src_ip=178.22.232.53, dest_name=www.yyy.com, dest_id=1551642, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
 visit_id=86001060468746692, src_country=Netherlands, event_timestamp=1483867285054, src_ip=178.22.232.53, dest_name=www.yyy.com, dest_id=1551642, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
 visit_id=86001060468746692, src_country=Netherlands, event_timestamp=1483867285054, src_ip=178.22.232.53, dest_name=www.yyy.com, dest_id=1551642, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}

For those rare lines I still get the correct attack count, but I would like to rearrange the signature field from this:

signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}

To this:

signature={api.threats.sql_injection}
signature={api.threats.sql_injection}
signature={api.threats.sql_injection}
signature={api.threats.bot_access_control}
signature={api.threats.illegal_resource_access}
signature={api.threats.cross_site_scripting}
signature={api.threats.bot_access_control}
signature={api.threats.illegal_resource_access}
signature={api.threats.illegal_resource_access}
signature={api.threats.illegal_resource_access}

(The first six lines are the first event repeated 6 times (3+1+1+1 = 6), and the last four lines are the second event repeated 4 times (1+3 = 4).)

My current source code:

import re
from itertools import izip

# Count the number of attacks per line (Python 2.7)
f = open('monthlyLogShort.txt', 'r')
g = open('count.txt', 'w')
kensu = f.readlines()
f.close()
for line in kensu:
    st = line.find('signature=')
    end = line.find('}')
    unprecise = line[st:end+1]
    #count = int(re.search(r'\d+', unprecise).group())
    # Sum every number found in the signature field
    count = sum(map(int, re.findall(r'[0-9]+', unprecise)))
    print >> g, count

g.close()

# Replicate each line according to its number of attacks
h = open('flog.txt', 'w')

with open('monthlyLogShort.txt') as textfile1, open('count.txt') as textfile2:
    for x, y in izip(textfile1, textfile2):
        x = x.strip()
        y = y.strip()
        # Emit one copy per line; x * int(y) would join the copies end to end
        print >> h, '\n'.join([x] * int(y))
h.close()
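The two passes above could also be folded into one, without the intermediate count.txt file. The following is a minimal sketch under a few assumptions: it targets Python 3 rather than the Python 2.7 used in the question, the function name `replicate_lines` is hypothetical, lines without a signature field are skipped, and an in-memory buffer stands in for the output file.

```python
import io
import re

def replicate_lines(lines, out):
    """Write each log line to `out` once per attack counted in its signature field."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        start = line.find('signature=')
        if start == -1:
            continue  # assumption: lines without a signature field are skipped
        end = line.find('}', start)
        # Sum the numbers that follow '=' inside the signature field.
        count = sum(int(n) for n in re.findall(r'=\s*(\d+)', line[start:end + 1]))
        for _ in range(count):
            out.write(line + '\n')

# Hypothetical, shortened sample records.
sample = [
    "record=0, signature={api.threats.sql_injection=2}\n",
    "record=1, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}\n",
]
buffer = io.StringIO()  # stands in for open('flog.txt', 'w')
replicate_lines(sample, buffer)
result = buffer.getvalue().splitlines()
print(len(result))  # 6 lines: 2 copies of record 0 and 4 copies of record 1
```

Passing any iterable of lines and any writable object keeps the function easy to try out before pointing it at the real log files.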

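1 Answer

Stack Overflow user

Accepted answer

Answered on 2017-01-29 22:38:47

If I read your requirements correctly, you are trying to emit one line for each occurrence of a threat while preserving the rest of the record. This solution does not output the counts directly; instead, it transforms the data so that each line holds a single threat.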

Code:

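sig_str = 'signature={'
for line in kensu:
    # Split the line into the part before the signature and the signature body
    record, signature = line.split(sig_str)
    threats = signature.split('}')[0]
    for counts in threats.split(','):
        # Skip the empty piece left behind by a trailing comma
        if '=' in counts:
            threat, count = counts.split('=')
            # Emit one line per occurrence of this threat (Python 2 print statement)
            for i in range(int(count)):
                print '%s%s%s}' % (record, sig_str, threat.strip())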

Sample data:

kensu = [x.strip() for x in """
    record=0, signature={api.threats.sql_injection=1}
    record=1, signature={api.threats.sql_injection=3, api.threats.bot_access_control=1, api.threats.illegal_resource_access=1, api.threats.cross_site_scripting=1,}
    record=2, signature={api.threats.bot_access_control=1, api.threats.illegal_resource_access=3,}
""".split('\n')[1:-1]]

Output:

record=0, signature={api.threats.sql_injection}
record=1, signature={api.threats.sql_injection}
record=1, signature={api.threats.sql_injection}
record=1, signature={api.threats.sql_injection}
record=1, signature={api.threats.bot_access_control}
record=1, signature={api.threats.illegal_resource_access}
record=1, signature={api.threats.cross_site_scripting}
record=2, signature={api.threats.bot_access_control}
record=2, signature={api.threats.illegal_resource_access}
record=2, signature={api.threats.illegal_resource_access}
record=2, signature={api.threats.illegal_resource_access}
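
The same transformation can be wrapped in a reusable function that returns the expanded lines instead of printing them. This is a sketch, not part of the original answer: `PAIR_RE` and `expand_threats` are hypothetical names, the code is written for Python 3, and returning an empty list for a line without a signature field is an assumption.

```python
import re

# One "threat.name=count" pair inside the signature body.
PAIR_RE = re.compile(r'([\w.]+)\s*=\s*(\d+)')

def expand_threats(line, sig_str='signature={'):
    """Return one copy of the record per threat occurrence, each with a single threat."""
    record, sep, signature = line.partition(sig_str)
    if not sep:
        return []  # assumption: lines without a signature field yield nothing
    threats = signature.split('}')[0]
    expanded = []
    for threat, count in PAIR_RE.findall(threats):
        # Repeat the record once per occurrence of this threat.
        expanded.extend([record + sig_str + threat + '}'] * int(count))
    return expanded

line = ("record=2, signature={api.threats.bot_access_control=1, "
        "api.threats.illegal_resource_access=3,}")
rows = expand_threats(line)
for row in rows:
    print(row)
```

For the sample line this yields four rows, matching the `record=2` rows in the output above. Using `partition` instead of `split` avoids an unpacking error on lines that lack the signature field.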
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/41919948
