
Q: Create a dictionary from a colon-separated key-value string

Stack Overflow user

Asked 2021-12-01 03:27:07

Answers 1 · Views 319 · Follows 0 · Votes 4

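Trying to create a dictionary from a given string, which can be of the format

key1:value1 key2:value2

However, picking out the value is a problem, because sometimes the value may have

1. whitespace after the colon:

key1: value1

2. quotes:

key1: "value has space"

The identifier for a key is something: (the key followed by a colon).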

I tried the following:

def tokenize(msg):
    # msg is assumed to be a list of whitespace-separated tokens (e.g. from str.split())
    legit_args = [i for i in msg if ":" in i]
    print(legit_args)
    dline = dict(item.split(":") for item in legit_args)
    return dline

The above only works for values without spaces.
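To see why, here is the question's tokenize() without the debug print, fed the result of str.split() (an assumption about how msg is produced). A space after the colon, or inside a quoted value, moves part of the value into a separate token that contains no colon, so that part is silently discarded.

```python
def tokenize(msg):
    # Keep only tokens that contain a colon, then split each into key and value.
    legit_args = [i for i in msg if ":" in i]
    return dict(item.split(":") for item in legit_args)


# Assumption: msg is a list of whitespace-separated tokens.
ok = tokenize("key1:value1 key2:value2".split())
print(ok)  # {'key1': 'value1', 'key2': 'value2'}

# Tokens are ['key1:', 'value1', 'key2:"value', 'has', 'space"'];
# 'value1', 'has' and 'space"' contain no colon and are dropped.
broken = tokenize('key1: value1 key2:"value has space"'.split())
print(broken)  # {'key1': '', 'key2': '"value'}
```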

Then I tried the following:

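import shlex

def tokenize2(msg):
    try:
        # return {k: v for k, v in re.findall(r'(?=\S|^)(.+?): (\S+)', msg)}
        return dict(token.split(':') for token in shlex.split(msg))
    except ValueError:
        # dict() raises ValueError when a token does not split into exactly two
        # parts; shlex.split() raises ValueError on an unclosed quote
        return {}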

This works fine for key:"something given like this", but it still needs some changes to work in general. Here is the problem:

>>> msg = 'key1: "this is value1 "   key2:this is value2 key3: this is value3'
>>> import shlex
>>> dict(token.split(':') for token in shlex.split(msg))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: dictionary update sequence element #1 has length 1; 2 is required
>>> shlex.split(msg)  # problem is here i think
['key1:', 'this is value1 ', 'key2:this', 'is', 'value2', 'key3:', 'this', 'is', 'value3']
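The error comes from shlex.split() splitting on every space outside quotes: 'key1:' and its quoted value become separate tokens, and an unquoted value such as "this is value2" is cut into several tokens without a colon. One possible repair, sketched here as a hypothetical helper (tokenize_shlex is not from the original thread), keeps shlex for the quote handling and then groups the tokens: a token that starts with word: opens a new key, and every other token is appended to the current value.

```python
import re
import shlex


def tokenize_shlex(msg):
    """Group shlex tokens into key/value pairs (hypothetical helper)."""
    result = {}
    key = None
    parts = []
    for token in shlex.split(msg):
        head, sep, tail = token.partition(":")
        if sep and re.fullmatch(r"\w+", head):
            # A "word:" token starts a new key; save the previous pair first.
            if key is not None:
                result[key] = " ".join(parts)
            key = head
            parts = [tail] if tail else []
        elif key is not None:
            # Any other token is another piece of the current value.
            parts.append(token)
    if key is not None:
        result[key] = " ".join(parts)
    return result


msg = 'key1: "this is value1 "   key2:this is value2 key3: this is value3'
print(tokenize_shlex(msg))
# {'key1': 'this is value1 ', 'key2': 'this is value2', 'key3': 'this is value3'}
```

Limitations of this sketch: runs of spaces inside unquoted values collapse to a single space, and an unquoted token such as a:b inside a value would be read as a new key.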

Tags: python, regex

1 Answer

Stack Overflow user

Accepted answer

Answered 2021-12-01 04:05:58

Would you please try something like:

import re

s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3"
d = {}
for m in re.findall(r'\w+:\s*(?:\w+(?:\s+\w+)*(?=\s|$)|"[^"]+")', s):
    key, val = re.split(r':\s*', m)
    d[key] = val.strip('"')
print(d)

Output:

{'key3': 'this is value3', 'key2': 'this is value2', 'key1': 'this is value1 '}
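A variant of the same pattern, sketched here as an alternative rather than as the answer's method: putting the key and each value alternative in capturing groups lets re.finditer() return them directly, so no second re.split() or strip('"') is needed. The quoted alternative is tried first and captures the text without its quotes; the names PAIR_RE and parse_pairs are hypothetical.

```python
import re

# Group 1: key; group 2: quoted value (without quotes); group 3: unquoted words.
PAIR_RE = re.compile(r'(\w+):\s*(?:"([^"]+)"|(\w+(?:\s+\w+)*)(?=\s|$))')


def parse_pairs(s):
    d = {}
    for m in PAIR_RE.finditer(s):
        key, quoted, bare = m.groups()
        # Exactly one of the two value groups participates in each match.
        d[key] = quoted if quoted is not None else bare
    return d


s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3"
print(parse_pairs(s))
# Python 3.7+: {'key1': 'this is value1 ', 'key2': 'this is value2', 'key3': 'this is value3'}
```

Because the value comes from a group instead of from splitting the match, a colon inside a quoted value needs no special handling here.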

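Explanation of the regex:

\w+:\s* matches a word followed by a colon and possibly (zero or more) whitespace characters.
(?: ... ) forms a non-capturing group.
\w+(?:\s+\w+)*(?=\s|$) matches one or more whitespace-separated words that must be followed by whitespace or the end of the string; this lookahead keeps the next key (e.g. key3, which is followed by a colon) out of the value.
The pipe character | separates the alternative patterns.
"[^"]+" matches a string enclosed in double quotes.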

Edit

If you want to handle fancy quotes (also called curly quotes or smart quotes), please try:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re

s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3 title: “incorrect title” title2: “incorrect title2” key4:10.20.30.40"
d = {}
for m in re.findall(r'\w+:\s*(?:[\w.]+(?:\s+[\w.]+)*(?=\s|$)|"[^"]+"|“.+?”)', s):
    key, val = re.split(r':\s*', m)
    d[key] = val.replace('“', '"').replace('”', '"').strip('"')
print(d)

Output:

{'title': 'incorrect title', 'key3': 'this is value3', 'key2': 'this is value2', 'key1': 'this is value1 ', 'key4': '10.20.30.40', 'title2': 'incorrect title2'}
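A possible simplification, assuming Python 3 str input: translate the curly quotes to straight ones once with str.translate() before matching, so the pattern needs only one quoted alternative, and read key and value from capturing groups. The names CURLY_TO_STRAIGHT, PAIR_RE and parse_pairs_normalized are hypothetical, and this is a sketch rather than a drop-in replacement for the code above.

```python
import re

# Map curly double quotes to straight ones before matching.
CURLY_TO_STRAIGHT = str.maketrans('“”', '""')

# Group 1: key; group 2: quoted value; group 3: unquoted words or dotted values.
PAIR_RE = re.compile(r'(\w+):\s*(?:"([^"]*)"|([\w.]+(?:\s+[\w.]+)*)(?=\s|$))')


def parse_pairs_normalized(s):
    d = {}
    for m in PAIR_RE.finditer(s.translate(CURLY_TO_STRAIGHT)):
        key, quoted, bare = m.groups()
        d[key] = quoted if quoted is not None else bare
    return d


s = ('key1: "this is value1 "   key2:this is value2 key3: this is value3 '
     'title: “incorrect title” title2: “incorrect title2” key4:10.20.30.40')
print(parse_pairs_normalized(s))
```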

Edit 2

The code below now also allows colons inside values:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re

s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3 title: “incorrect title” title2: “incorrect title2” key4:10.20.30.40 key5:\"value having:colon\""
d = {}
for m in re.findall(r'\w+:\s*(?:[\w.]+(?:\s+[\w.]+)*(?=\s|$)|"[^"]+"|“.+?”)', s):
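    key, val = re.split(r':\s*', m, maxsplit=1)
    d[key] = val.replace('“', '"').replace('”', '"').strip('"')
print(d)

Output:

{'title': 'incorrect title', 'key3': 'this is value3', 'key2': 'this is value2', 'key1': 'this is value1 ', 'key5': 'value having:colon', 'key4': '10.20.30.40', 'title2': 'incorrect title2'}

The change that makes this work is:

key, val = re.split(r':\s*', m, maxsplit=1)

Setting maxsplit=1 limits the split to one, so the match is split only at the first colon (the one after the key) and any colons inside the value are kept. maxsplit is passed as a keyword here because passing it as a positional third argument is deprecated since Python 3.13.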
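A quick check of why maxsplit matters, plus an alternative using str.partition(), which always splits at the first occurrence only. Since every match starts with \w+ followed by a colon, the first colon is always the key separator, and lstrip() removes the whitespace that :\s* would have consumed. This is an illustration, not part of the answer's code.

```python
import re

m = 'key5:"value having:colon"'

# Without maxsplit every colon is a split point, giving three pieces,
# so unpacking into (key, val) would raise ValueError.
print(re.split(r':\s*', m))  # ['key5', '"value having', 'colon"']

# maxsplit=1 splits only at the first colon.
key, val = re.split(r':\s*', m, maxsplit=1)
print(key, val)  # key5 "value having:colon"

# str.partition() also splits only at the first colon; lstrip() drops the
# optional whitespace after it.
key2, _, val2 = 'key3: this is value3'.partition(':')
print(key2, val2.lstrip())  # key3 this is value3
```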

Votes: 2

Page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/70178788
