
Creating a dictionary from a colon-separated key-value string

Stack Overflow user
Asked on 2021-12-01 03:27:07
Answers 1, Views 319, Followers 0, Votes 4

I am trying to create a dictionary from a given string, which can be in the format

```text
key1:value1 key2:value2
```

However, extracting the value is a problem, because sometimes it may contain:

  1. whitespace, e.g. key1: value1
  2. quotes, e.g. key1: "value has space"

A key is identified as something: (a word followed by a colon).

I tried the following:

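```python
def tokenize(msg):
    # Split the raw string on whitespace and keep only tokens containing a colon
    legit_args = [token for token in msg.split() if ":" in token]
    print(legit_args)
    # Each "key:value" token becomes one (key, value) pair
    dline = dict(item.split(":") for item in legit_args)
    return dline
```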

The above only works for values that contain no spaces.
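To see why, here is the same whitespace-based idea applied to a made-up sample string that contains both problem cases (a short illustration, not the asker's exact input):

```python
text = 'key1: value1 key2:value has space'

# Splitting on whitespace separates "key1:" from its value and
# breaks "value has space" into three tokens
tokens = text.split()
print(tokens)  # ['key1:', 'value1', 'key2:value', 'has', 'space']

# Keeping only the tokens that contain a colon loses those parts
pairs = dict(t.split(':', 1) for t in tokens if ':' in t)
print(pairs)   # {'key1': '', 'key2': 'value'}
```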

Then I tried the following:

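```python
import shlex

def tokenize2(msg):
    try:
        #return {k: v for k, v in re.findall(r'(?=\S|^)(.+?): (\S+)', msg)}
        # shlex.split keeps quoted text together as a single token
        return dict(token.split(':') for token in shlex.split(msg))
    except ValueError:
        # Raised when a token is not exactly one "key:value" pair
        # (or when shlex finds an unclosed quote)
        return {}
```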

This works fine for key:"something given like this", but it still needs some changes to work in general. Here is where the problem is:

```python
>>> msg = 'key1: "this is value1 "   key2:this is value2 key3: this is value3'
>>> import shlex
>>> dict(token.split(':') for token in shlex.split(msg))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: dictionary update sequence element #1 has length 1; 2 is required
>>> shlex.split(msg)  # the problem is here, I think
['key1:', 'this is value1 ', 'key2:this', 'is', 'value2', 'key3:', 'this', 'is', 'value3']
```
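The shlex output shows the issue: unquoted values are split into separate words, and a bare key1: token does not carry its value. One way to keep the shlex idea is to group tokens instead of splitting each one. A minimal sketch, assuming that any token starting with word: opens a new key (parse_with_shlex and KEY_TOKEN are hypothetical names):

```python
import re
import shlex

# Hypothetical rule: a token that starts with "word:" opens a new key
KEY_TOKEN = re.compile(r'(\w+):(.*)', re.DOTALL)

def parse_with_shlex(msg):
    result = {}
    key, parts = None, []
    for token in shlex.split(msg):  # quoted text stays a single token
        m = KEY_TOKEN.fullmatch(token)
        if m:
            if key is not None:
                result[key] = ' '.join(parts)  # close the previous key
            key = m.group(1)
            parts = [m.group(2)] if m.group(2) else []
        elif key is not None:
            parts.append(token)  # continuation of the current value
    if key is not None:
        result[key] = ' '.join(parts)
    return result

msg = 'key1: "this is value1 "   key2:this is value2 key3: this is value3'
print(parse_with_shlex(msg))
# {'key1': 'this is value1 ', 'key2': 'this is value2', 'key3': 'this is value3'}
```

Trade-offs of this sketch: runs of spaces inside unquoted values collapse to one space, tokens before the first key are dropped, and a quoted value that itself starts with word: would be misread as a new key.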

1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-12-01 04:05:58

Could you try something like this?

```python
import re

s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3"
d = {}
for m in re.findall(r'\w+:\s*(?:\w+(?:\s+\w+)*(?=\s|$)|"[^"]+")', s):
    key, val = re.split(r':\s*', m)
    d[key] = val.strip('"')
print(d)
```

Output:

```text
{'key1': 'this is value1 ', 'key2': 'this is value2', 'key3': 'this is value3'}
```

Explanation of the regex:

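  • \w+:\s* matches a word followed by a colon and zero or more whitespace characters.
  • (?: ... ) is a non-capturing group that holds the alternatives for the value.
  • \w+(?:\s+\w+)*(?=\s|$) matches one or more whitespace-separated words, provided they are followed by whitespace or the end of the string. The lookahead is what keeps the next key's name (e.g. key3, which is followed by a colon) out of the current value.
  • The pipe character | separates the alternatives in the pattern.
  • "[^"]+" matches a string enclosed in double quotes.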
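The same pattern can return the key and value directly if the value alternatives are turned into capturing groups, which removes the second re.split call and the strip('"'). A sketch of that variant (PAIR and parse_pairs are illustrative names, not part of the answer):

```python
import re

# Same alternatives as the answer's regex, but key, bare value and
# quoted value are each captured in their own group
PAIR = re.compile(r'(\w+):\s*(?:(\w+(?:\s+\w+)*)(?=\s|$)|"([^"]+)")')

def parse_pairs(s):
    d = {}
    for m in PAIR.finditer(s):
        key, bare, quoted = m.groups()
        # Exactly one of the two value groups takes part in each match
        d[key] = quoted if quoted is not None else bare
    return d

s = 'key1: "this is value1 "   key2:this is value2 key3: this is value3'
print(parse_pairs(s))
# {'key1': 'this is value1 ', 'key2': 'this is value2', 'key3': 'this is value3'}
```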

Edit

If you want to handle fancy quotes (also called curly quotes or smart quotes), try:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import re

s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3 title: “incorrect title” title2: “incorrect title2” key4:10.20.30.40"
d = {}
for m in re.findall(r'\w+:\s*(?:[\w.]+(?:\s+[\w.]+)*(?=\s|$)|"[^"]+"|“.+?”)', s):
    key, val = re.split(r':\s*', m)
    d[key] = val.replace('“', '"').replace('”', '"').strip('"')
print(d)
```

Output:

```text
{'key1': 'this is value1 ', 'key2': 'this is value2', 'key3': 'this is value3', 'title': 'incorrect title', 'title2': 'incorrect title2', 'key4': '10.20.30.40'}
```
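Another option is to normalize curly quotes before matching, so the pattern only needs the straight-quote alternative and no replace chain afterwards. A sketch, assuming only the double curly quotes U+201C and U+201D need mapping (CURLY_TO_STRAIGHT, PAIR and parse_pairs are illustrative names):

```python
import re

# Assumption: only the curly double quotes need to be mapped
CURLY_TO_STRAIGHT = str.maketrans({'\u201c': '"', '\u201d': '"'})
# Bare values may contain dots (e.g. 10.20.30.40), as in the answer
PAIR = re.compile(r'(\w+):\s*(?:([\w.]+(?:\s+[\w.]+)*)(?=\s|$)|"([^"]+)")')

def parse_pairs(s):
    s = s.translate(CURLY_TO_STRAIGHT)  # curly quotes become straight quotes
    d = {}
    for m in PAIR.finditer(s):
        key, bare, quoted = m.groups()
        d[key] = quoted if quoted is not None else bare
    return d

s = ('key1: "this is value1 "   key2:this is value2 key3: this is value3 '
     'title: \u201cincorrect title\u201d title2: \u201cincorrect title2\u201d key4:10.20.30.40')
print(parse_pairs(s))
```

One side effect of normalizing first: mixed quotes such as “value" are also treated as a quoted value.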

Edit2

The following code now also allows colons inside quoted values:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import re

s = "key1: \"this is value1 \"   key2:this is value2 key3: this is value3 title: “incorrect title” title2: “incorrect title2” key4:10.20.30.40 key5:\"value having:colon\""
d = {}
for m in re.findall(r'\w+:\s*(?:[\w.]+(?:\s+[\w.]+)*(?=\s|$)|"[^"]+"|“.+?”)', s):
    # Split only at the first colon so colons inside the value are kept
    key, val = re.split(r':\s*', m, maxsplit=1)
    d[key] = val.replace('“', '"').replace('”', '"').strip('"')
print(d)
```

Output:

```text
{'key1': 'this is value1 ', 'key2': 'this is value2', 'key3': 'this is value3', 'title': 'incorrect title', 'title2': 'incorrect title2', 'key4': '10.20.30.40', 'key5': 'value having:colon'}
```

The change is in this line:

```python
key, val = re.split(r':\s*', m, maxsplit=1)
```

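Setting maxsplit to 1 limits re.split to at most one split, so only the first colon (the one right after the key) separates the key from the value, and any later colons stay in the value.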
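A quick way to see what maxsplit changes, using the key5 fragment from the sample string (str.partition is shown as a plain-string alternative that also splits only at the first colon):

```python
import re

m = 'key5:"value having:colon"'

# Without maxsplit every colon splits, so the quoted value is cut in two
print(re.split(r':\s*', m))              # ['key5', '"value having', 'colon"']

# maxsplit=1 stops after the first split: the key, then the untouched rest
print(re.split(r':\s*', m, maxsplit=1))  # ['key5', '"value having:colon"']

# str.partition splits at the first colon only; lstrip() mimics the \s* part
key, _, val = m.partition(':')
val = val.lstrip()
print(key, val)                          # key5 "value having:colon"
```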

Votes: 2
Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/70178788
