I am trying to convert my data into a list of dictionaries, such as:
example_dict = {"host": "146.204.224.152",
                "user_name": "feest6811",  # note: sometimes the user name is missing! In this case, use '-' as the value for the username.
                "time": "21/Jun/2019:15:45:24 -0700",
                "request": "POST /incentivize HTTP/1.1"}  # note: not everything is a POST
My data:
86.187.99.249 - tillman6650 [21/Jun/2019:15:46:03 -0700] "POST /efficient/unleash HTTP/1.1" 405 22390
76.72.133.93 - carroll1056 [21/Jun/2019:15:46:05 -0700] "POST /morph/optimize/plug-and-play HTTP/2.0" 400 27172
73.162.151.229 - dubuque3528 [21/Jun/2019:15:46:08 -0700] "DELETE /transition/holistic/e-business HTTP/2.0" 301 13923
13.112.8.80 - rau5026 [21/Jun/2019:15:46:09 -0700] "HEAD /ubiquitous/transparent HTTP/1.1" 200 16928
159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
136.195.158.6 - feeney9464 [21/Jun/2019:15:46:11 -0700] "HEAD /open-source/markets HTTP/2.0" 204 21149
219.194.113.255 - - [21/Jun/2019:15:46:12 -0700] "PATCH /next-generation/niches/mindshare HTTP/1.0" 503 20246
59.101.239.174 - brekke3293 [21/Jun/2019:15:46:13 -0700] "DELETE /ubiquitous/seize/web-enabled HTTP/2.0" 302 14017
My code:
import re

pattern = r"""
(?P<host>.*)                  # user host
(-\ )                         # separator
(?P<user_name>\w*)            # user name
(\ \[)                        # separator: space and opening bracket
(?P<time>\S*\ -0700)          # time
(\]\ )                        # separator: closing bracket and space
(?P<request>.*")              # request, including the quotes
"""
for user in re.finditer(pattern, logdata, re.VERBOSE):
    print(user.groupdict())
Output:
{'host': '86.187.99.249 ', 'user_name': 'tillman6650', 'time': '21/Jun/2019:15:46:03 -0700', 'request': '"POST /efficient/unleash HTTP/1.1"'}
{'host': '76.72.133.93 ', 'user_name': 'carroll1056', 'time': '21/Jun/2019:15:46:05 -0700', 'request': '"POST /morph/optimize/plug-and-play HTTP/2.0"'}
{'host': '73.162.151.229 ', 'user_name': 'dubuque3528', 'time': '21/Jun/2019:15:46:08 -0700', 'request': '"DELETE /transition/holistic/e-business HTTP/2.0"'}
{'host': '13.112.8.80 ', 'user_name': 'rau5026', 'time': '21/Jun/2019:15:46:09 -0700', 'request': '"HEAD /ubiquitous/transparent HTTP/1.1"'}
{'host': '136.195.158.6 ', 'user_name': 'feeney9464', 'time': '21/Jun/2019:15:46:11 -0700', 'request': '"HEAD /open-source/markets HTTP/2.0"'}
{'host': '59.101.239.174 ', 'user_name': 'brekke3293', 'time': '21/Jun/2019:15:46:13 -0700', 'request': '"DELETE /ubiquitous/seize/web-enabled HTTP/2.0"'}
In the given data, some user names are "-", and my code simply skips those lines. I need to include those lines as well, using '-' as the value for the user name.
Posted on 2021-02-17 00:43:39
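You can change the current user_name regular expression to
(?P<user_name>[\w\-]*)
Inside a character class, - between two characters denotes a range (for example, [0-9] matches any digit from 0 to 9), so escaping it with \ makes it match a literal hyphen. With this change, a user name of '-' is captured instead of the line being skipped.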
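For illustration, here is a rough sketch of that change applied to two lines from the question's sample data. It assumes the log text is held in a string named logdata, as in the question. The tighter host, time, and request groups are my own assumptions rather than part of the fix above; they are only there to drop the trailing space and the surrounding quotes so the result looks like example_dict. The hyphen is placed last in the character class, where it is literal without a backslash.

```python
import re

# Two lines from the question's sample data; the second has "-" as the user name.
logdata = """86.187.99.249 - tillman6650 [21/Jun/2019:15:46:03 -0700] "POST /efficient/unleash HTTP/1.1" 405 22390
159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845"""

pattern = r"""
(?P<host>\S+)               # host: non-space run (assumption: avoids the trailing space)
\ -\                        # separator: space, hyphen, space
(?P<user_name>[\w-]+)       # user name, or a literal hyphen when it is missing
\ \[                        # space and opening bracket
(?P<time>[^\]]+)            # everything up to the closing bracket
\]\ "                       # closing bracket, space, opening quote
(?P<request>[^"]+)          # request text without the surrounding quotes
"                           # closing quote
"""

# Build the list of dictionaries, one per matched log line.
entries = [m.groupdict() for m in re.finditer(pattern, logdata, re.VERBOSE)]

for entry in entries:
    print(entry)
```

The second dictionary printed should have '-' as its user_name, which is the case the original pattern skipped.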
https://stackoverflow.com/questions/66234188