
re is skipping a few lines

Stack Overflow user
Asked on 2021-02-17 00:36:28

I am trying to convert my data into a list of dictionaries, where each dictionary looks like this:

example_dict = {"host":"146.204.224.152", 
                "user_name":"feest6811", #note: sometimes the user name is missing! In this case, use '-' as the value for the username.
                "time":"21/Jun/2019:15:45:24 -0700",
                "request":"POST /incentivize HTTP/1.1"} #note: not everything is a POST

My data:

86.187.99.249 - tillman6650 [21/Jun/2019:15:46:03 -0700] "POST /efficient/unleash HTTP/1.1" 405 22390
76.72.133.93 - carroll1056 [21/Jun/2019:15:46:05 -0700] "POST /morph/optimize/plug-and-play HTTP/2.0" 400 27172
73.162.151.229 - dubuque3528 [21/Jun/2019:15:46:08 -0700] "DELETE /transition/holistic/e-business HTTP/2.0" 301 13923
13.112.8.80 - rau5026 [21/Jun/2019:15:46:09 -0700] "HEAD /ubiquitous/transparent HTTP/1.1" 200 16928
159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
136.195.158.6 - feeney9464 [21/Jun/2019:15:46:11 -0700] "HEAD /open-source/markets HTTP/2.0" 204 21149
219.194.113.255 - - [21/Jun/2019:15:46:12 -0700] "PATCH /next-generation/niches/mindshare HTTP/1.0" 503 20246
59.101.239.174 - brekke3293 [21/Jun/2019:15:46:13 -0700] "DELETE /ubiquitous/seize/web-enabled HTTP/2.0" 302 14017

My code:

import re

pattern = r"""
(?P<host>.*)           # User host
(-\ )                  # Separator
(?P<user_name>\w*)     # User name
(\ \[)                 # Separator: space and opening bracket
(?P<time>\S*\ -0700)   # Time
(\]\ )                 # Separator: closing bracket and space
(?P<request>.*")       # Request
"""
# logdata holds the log text shown above
for user in re.finditer(pattern, logdata, re.VERBOSE):
    print(user.groupdict())

Output:

{'host': '86.187.99.249 ', 'user_name': 'tillman6650', 'time': '21/Jun/2019:15:46:03 -0700', 'request': '"POST /efficient/unleash HTTP/1.1"'}
{'host': '76.72.133.93 ', 'user_name': 'carroll1056', 'time': '21/Jun/2019:15:46:05 -0700', 'request': '"POST /morph/optimize/plug-and-play HTTP/2.0"'}
{'host': '73.162.151.229 ', 'user_name': 'dubuque3528', 'time': '21/Jun/2019:15:46:08 -0700', 'request': '"DELETE /transition/holistic/e-business HTTP/2.0"'}
{'host': '13.112.8.80 ', 'user_name': 'rau5026', 'time': '21/Jun/2019:15:46:09 -0700', 'request': '"HEAD /ubiquitous/transparent HTTP/1.1"'}
{'host': '136.195.158.6 ', 'user_name': 'feeney9464', 'time': '21/Jun/2019:15:46:11 -0700', 'request': '"HEAD /open-source/markets HTTP/2.0"'}
{'host': '59.101.239.174 ', 'user_name': 'brekke3293', 'time': '21/Jun/2019:15:46:13 -0700', 'request': '"DELETE /ubiquitous/seize/web-enabled HTTP/2.0"'}

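In the given data, some user names are "-", and my code simply skips those lines. I need to include those lines as well, using '-' as the value for the user name.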


1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-02-17 00:43:39

You can change your current user name regex to:

(?P<user_name>[\w\-]*)

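Inside a character class, - normally denotes a range (for example, 0-9 matches any digit from 0 to 9), so you need to escape it with \ for it to match a literal hyphen.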
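
As a rough illustration of the change above, here is a minimal sketch that runs a pattern with the widened user name class over two lines copied from the question. The tighter host, time, and request sub-patterns (`\S+`, `[^\]]+`, `[^"]*`) are assumptions added here, not part of the answer; they only serve to drop the trailing space from the host and the quotes from the request, as in the asker's target dictionary.

```python
import re

# Two sample lines copied from the question; the second has "-" as the user name.
logdata = """86.187.99.249 - tillman6650 [21/Jun/2019:15:46:03 -0700] "POST /efficient/unleash HTTP/1.1" 405 22390
159.253.153.40 - - [21/Jun/2019:15:46:10 -0700] "POST /e-business HTTP/1.0" 504 19845
"""

# Assumption: sub-patterns other than user_name are tightened for illustration.
pattern = r"""
(?P<host>\S+)             # host: run of non-space characters
\ -\                      # separator: space, hyphen, space
(?P<user_name>[\w\-]+)    # user name: word characters or a literal hyphen
\ \[                      # space and opening bracket
(?P<time>[^\]]+)          # time: everything up to the closing bracket
\]\                       # closing bracket and space
"(?P<request>[^"]*)"      # request: text between the double quotes
"""

# One dictionary per matched log line.
records = [m.groupdict() for m in re.finditer(pattern, logdata, re.VERBOSE)]

for record in records:
    print(record)
```

With this pattern, the line whose user name is missing should no longer be skipped, and its `user_name` value comes out as `'-'`.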

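The original content of this page is provided by Stack Overflow; translation support is provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/66234188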
