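I am trying to build a dictionary from a given string, which can be in the format
key1:value1 key2:value2
However, extracting the values is a problem, because sometimes the input may look like this:
key1: value1
key1: "value has space"
The identifier for a key is something: (a word followed by a colon).
I tried the following: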
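def tokenize(msg):
    # Works when msg is an iterable of tokens (e.g. the result of str.split());
    # iterating over a plain string would yield single characters instead.
    legit_args = [i for i in msg if ":" in i]
    print(legit_args)
    dline = dict(item.split(":") for item in legit_args)
    return dline
The above only works for values without spaces.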
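A minimal sketch of that limitation, assuming the input string is split on whitespace before filtering. The helper name `tokenize_naive` is hypothetical and not part of the original post:

```python
def tokenize_naive(msg):
    # Hypothetical helper: split on whitespace, keep tokens containing a colon.
    tokens = [t for t in msg.split() if ":" in t]
    # maxsplit=1 so each token yields exactly one key and one value.
    return dict(t.split(":", 1) for t in tokens)


# Values without spaces are parsed as expected.
print(tokenize_naive("key1:value1 key2:value2"))
# {'key1': 'value1', 'key2': 'value2'}

# A quoted value with spaces is broken apart by the whitespace split,
# so the value is lost entirely.
print(tokenize_naive('key1: "value has space"'))
# {'key1': ''}
```

Because the whitespace split happens before any quote handling, the quoted value is already in pieces by the time the colon check runs.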
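Then I tried the following:
import shlex

def tokenize2(msg):
    try:
        #return {k: v for k, v in re.findall(r'(?=\S|^)(.+?): (\S+)', msg)}
        return dict(token.split(':') for token in shlex.split(msg))
    except ValueError:
        # Raised by dict() for a token without exactly one colon,
        # or by shlex.split() for an unclosed quote.
        return {}
This works fine for input like key:"something given like this", but it still needs some changes to work in general. Here is the problem: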
>>> msg = 'key1: "this is value1 " key2:this is value2 key3: this is value3'
>>> import shlex
>>> dict(token.split(':') for token in shlex.split(msg))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: dictionary update sequence element #1 has length 1; 2 is required
>>> shlex.split(msg)  # the problem is here, I think
['key1:', 'this is value1 ', 'key2:this', 'is', 'value2', 'key3:', 'this', 'is', 'value3']
Posted on 2021-12-01 04:05:58
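For reference, the ValueError from the question can be traced token by token. The sketch below is only an illustration (the helper name `token_parts` is hypothetical): it shows how many pieces each `shlex.split` token yields when split on a colon, since `dict()` needs exactly two pieces per element.

```python
import shlex


def token_parts(msg):
    # Hypothetical helper: pair each shlex token with the number of
    # pieces produced by splitting it on ':'.
    return [(token, len(token.split(":"))) for token in shlex.split(msg)]


msg = 'key1: "this is value1 " key2:this is value2 key3: this is value3'

for index, (token, count) in enumerate(token_parts(msg)):
    status = "ok" if count == 2 else "breaks dict()"
    print(index, repr(token), count, status)
```

Element #1 is the quoted value `'this is value1 '`, which contains no colon and therefore splits into a single piece, matching the error message. shlex keeps quoted text together, but it still separates a key from its value whenever whitespace follows the colon.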
Could you try something like this?
import re
s = "key1: \"this is value1 \" key2:this is value2 key3: this is value3"
d = {}
for m in re.findall(r'\w+:\s*(?:\w+(?:\s+\w+)*(?=\s|$)|"[^"]+")', s):
    # Split each match into key and value at the colon
    key, val = re.split(r':\s*', m)
    d[key] = val.strip('"')
print(d)
Output (key order may differ depending on the Python version):
{'key3': 'this is value3', 'key2': 'this is value2', 'key1': 'this is value1 '}
Explanation of the regex:
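- \w+:\s* matches a word followed by a colon and optional (zero or more) whitespace.
- (?: ... ) forms a non-capturing group.
- \w+(?:\s+\w+)*(?=\s|$) matches one or more words followed by whitespace or the end of the string.
- | separates the alternatives of the pattern.
- "[^"]+" matches a string enclosed in double quotes.
Edit
If you want to handle fancy quotes (also known as curly quotes or smart quotes), please try: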
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
s = "key1: \"this is value1 \" key2:this is value2 key3: this is value3 title: “incorrect title” title2: “incorrect title2” key4:10.20.30.40"
d = {}
for m in re.findall(r'\w+:\s*(?:[\w.]+(?:\s+[\w.]+)*(?=\s|$)|"[^"]+"|“.+?”)', s):
    key, val = re.split(r':\s*', m)
    # Normalize curly quotes to straight quotes, then strip them
    d[key] = val.replace('“', '"').replace('”', '"').strip('"')
print(d)
Output:
{'title': 'incorrect title', 'key3': 'this is value3', 'key2': 'this is value2', 'key1': 'this is value1 ', 'key4': '10.20.30.40', 'title2': 'incorrect title2'}
Edit2
The following code now allows colons in the values:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import re
s = "key1: \"this is value1 \" key2:this is value2 key3: this is value3 title: “incorrect title” title2: “incorrect title2” key4:10.20.30.40 key5:\"value having:colon\""
d = {}
for m in re.findall(r'\w+:\s*(?:[\w.]+(?:\s+[\w.]+)*(?=\s|$)|"[^"]+"|“.+?”)', s):
    # Split only at the first colon so colons inside the value survive
    key, val = re.split(r':\s*', m, 1)
    d[key] = val.replace('“', '"').replace('”', '"').strip('"')
print(d)
Output:
{'title': 'incorrect title', 'key3': 'this is value3', 'key2': 'this is value2', 'key1': 'this is value1 ', 'key5': 'value having:colon', 'key4': '10.20.30.40', 'title2': 'incorrect title2'}
The change is in the following line:
key, val = re.split(r':\s*', m, 1)
The third argument, 1, is passed as maxsplit to limit the number of splits to one, so only the first colon separates the key from the value.
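As a rough sketch, the final version can be wrapped in a reusable function with the pattern written in `re.VERBOSE` form so each part is commented. The names `PAIR_RE` and `parse_pairs` are hypothetical, and `maxsplit` is passed by keyword here purely for readability:

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE
# (whitespace and '#' comments inside the pattern are ignored).
PAIR_RE = re.compile(r'''
    \w+:\s*                            # key, colon, optional whitespace
    (?:
        [\w.]+(?:\s+[\w.]+)*(?=\s|$)   # unquoted words (dots allowed)
      | "[^"]+"                        # value in straight double quotes
      | “.+?”                          # value in curly quotes
    )
''', re.VERBOSE)


def parse_pairs(text):
    # Hypothetical helper: return a dict of the key/value pairs in text.
    result = {}
    for match in PAIR_RE.findall(text):
        # Split at the first colon only, so colons in the value are kept.
        key, value = re.split(r':\s*', match, maxsplit=1)
        result[key] = value.replace('“', '"').replace('”', '"').strip('"')
    return result


print(parse_pairs('key2:this is value2 key5:"value having:colon"'))
# {'key2': 'this is value2', 'key5': 'value having:colon'}
```

The unquoted alternative stops before the next key because the lookahead `(?=\s|$)` fails when a word is followed directly by a colon, which forces the match to back off by one word.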
https://stackoverflow.com/questions/70178788