I'm looking for a simple way to split a parenthesized list from an IMAP response into a Python list or tuple. I want to go from
'(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") NIL NIL "quoted-printable" 1207 50 NIL NIL NIL NIL))'
to
(BODYSTRUCTURE, ("text", "plain", ("charset", "ISO-8859-1"), None, None, "quoted-printable", 1207, 50, None, None, None, None))
Posted on 2010-10-20 13:43:13
pyparsing's nestedExpr helper parses nested parentheses by default:
from pyparsing import nestedExpr
text = '(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") NIL NIL "quoted-printable" 1207 50 NIL NIL NIL NIL))'
print(nestedExpr().parseString(text))
This prints:
[['BODYSTRUCTURE', ['"text"', '"plain"', ['"charset"', '"ISO-8859-1"'], 'NIL', 'NIL', '"quoted-printable"', '1207', '50', 'NIL', 'NIL', 'NIL', 'NIL']]]
Here is a slightly modified parser that, at parse time, converts integer strings to integers, converts "NIL" to None, and strips the quotes from quoted strings:
from pyparsing import (nestedExpr, Literal, Word, alphanums,
                       quotedString, replaceWith, nums, removeQuotes)
# Replace the NIL atom with None
NIL = Literal("NIL").setParseAction(replaceWith(None))
# Convert runs of digits to int
integer = Word(nums).setParseAction(lambda t: int(t[0]))
# Strip the surrounding quotes from quoted strings
quotedString.setParseAction(removeQuotes)
content = (NIL | integer | Word(alphanums))
print(nestedExpr(content=content, ignoreExpr=quotedString).parseString(text))
This prints:
[['BODYSTRUCTURE', ['text', 'plain', ['charset', 'ISO-8859-1'], None, None, 'quoted-printable', 1207, 50, None, None, None, None]]]
Posted on 2010-10-20 08:51:07
In fact, the nested tuples make this impossible to do with a regular expression alone. You have to write a parser that keeps track of whether you are inside parentheses.
You could try
tuple('(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") NIL NIL "quoted-printable" 1207 50 NIL NIL NIL NIL))'.replace("NIL", "None").split(' '))
Edit: Well, I got something that works for your example, but I'm not sure it is what you want.
BODYSTRUCTURE needs to be defined somewhere.
eval(",".join('(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") NIL NIL "quoted-printable" 1207 50 NIL NIL NIL NIL))'.replace("NIL", "None").split(' ')))
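A parser that tracks parenthesis depth, as described above, could be sketched with only the standard library. This is a minimal illustration, not a complete IMAP parser: the names `TOKEN_RE` and `parse_imap_list` are made up for this example, and it assumes well-formed input made only of parentheses, double-quoted strings, numbers, NIL and bare atoms (IMAP literals such as {123} are not handled).

```python
import re

# Tokens: "(", ")", a double-quoted string (backslash escapes allowed),
# or a run of characters that are neither whitespace nor parentheses.
TOKEN_RE = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()]+')


def parse_imap_list(text):
    """Return the top-level items of `text` as a tuple of nested tuples."""
    stack = [[]]  # stack[-1] is the list currently being filled
    for token in TOKEN_RE.findall(text):
        if token == '(':
            stack.append([])                 # entering a nested list
        elif token == ')':
            if len(stack) == 1:
                raise ValueError('unbalanced ")"')
            finished = tuple(stack.pop())    # leaving a nested list
            stack[-1].append(finished)
        elif token.startswith('"'):
            # Drop the surrounding quotes and undo backslash escapes
            stack[-1].append(re.sub(r'\\(.)', r'\1', token[1:-1]))
        elif token == 'NIL':
            stack[-1].append(None)
        elif token.isdigit():
            stack[-1].append(int(token))
        else:
            stack[-1].append(token)          # bare atom, e.g. BODYSTRUCTURE
    if len(stack) != 1:
        raise ValueError('unbalanced "("')
    return tuple(stack[0])


text = ('(BODYSTRUCTURE ("text" "plain" ("charset" "ISO-8859-1") NIL NIL '
        '"quoted-printable" 1207 50 NIL NIL NIL NIL))')
(parsed,) = parse_imap_list(text)  # exactly one top-level list expected
print(parsed)
```

Unlike the eval approach, this never executes the server's response as Python code, and quoted strings that contain spaces stay intact.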
Posted on 2011-09-29 19:34:24
Take only the inner part of the server response, the part that actually contains the body structure:
struct = ('(((("TEXT" "PLAIN" ("CHARSET" "ISO-8859-1") NIL NIL "7BIT" 16 2)'
          '("TEXT" "HTML" ("CHARSET" "ISO-8859-1") NIL NIL "QUOTED-PRINTABLE"'
          ' 392 6) "ALTERNATIVE")("IMAGE" "GIF" ("NAME" "538.gif") '
          '"<538@goomoji.gmail>" NIL "BASE64" 172)("IMAGE" "PNG" ("NAME" '
          '"4F4.png") "<gtalk.4F4@goomoji.gmail>" NIL "BASE64" 754) "RELATED")'
          '("IMAGE" "JPEG" ("NAME" "avatar_airbender.jpg") NIL NIL "BASE64"'
          ' 157924) "MIXED")')
The next step is to replace some tokens so that the string can be converted to Python types:
struct = struct.replace(' ', ',').replace(')(', '),(')
Use the built-in compiler module (Python 2 only; it was removed in Python 3) to parse our structure:
import compiler
# struct already has the token replacements from the previous step applied
expr = compiler.parse(struct, 'eval')
Write a simple recursive function to transform the expression:
def transform(expression):
    if isinstance(expression, compiler.transformer.Expression):
        return transform(expression.node)
    elif isinstance(expression, compiler.transformer.Tuple):
        return tuple(transform(item) for item in expression.nodes)
    elif isinstance(expression, compiler.transformer.Const):
        return expression.value
    elif isinstance(expression, compiler.transformer.Name):
        return None if expression.name == 'NIL' else expression.name
Finally, we get the expected result, a nested Python tuple:
result = transform(expr)
print(result)
(((('TEXT', 'PLAIN', ('CHARSET', 'ISO-8859-1'), None, None, '7BIT', 16, 2), ('TEXT', 'HTML', ('CHARSET', 'ISO-8859-1'), None, None, 'QUOTED-PRINTABLE', 392, 6), 'ALTERNATIVE'), ('IMAGE', 'GIF', ('NAME', '538.gif'), '<538@goomoji.gmail>', None, 'BASE64', 172), ('IMAGE', 'PNG', ('NAME', '4F4.png'), '<gtalk.4F4@goomoji.gmail>', None, 'BASE64', 754), 'RELATED'), ('IMAGE', 'JPEG', ('NAME', 'avatar_airbender.jpg'), None, None, 'BASE64', 157924), 'MIXED')
From there we can identify the different headers of the body structure:
text, attachments = (result[0], result[1:])
https://stackoverflow.com/questions/3973963
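The compiler module used in the last answer exists only in Python 2. On Python 3.8 or later, roughly the same idea could be sketched with the standard ast module. The names `imap_to_tuple` and `_transform` are hypothetical, and the sketch keeps the limitations of the token replacement it reuses: it assumes no quoted string contains a space or ")(", and a parenthesized group with a single element would not become a tuple.

```python
import ast


def imap_to_tuple(struct):
    """Convert a parenthesized IMAP structure string into nested tuples."""
    # Same token replacement as in the answer: turn the IMAP syntax into
    # the source text of a Python tuple expression.
    source = struct.replace(' ', ',').replace(')(', '),(')
    return _transform(ast.parse(source, mode='eval'))


def _transform(node):
    """Walk the syntax tree without ever evaluating it."""
    if isinstance(node, ast.Expression):
        return _transform(node.body)
    if isinstance(node, ast.Tuple):
        return tuple(_transform(item) for item in node.elts)
    if isinstance(node, ast.Constant):   # quoted strings and numbers
        return node.value
    if isinstance(node, ast.Name):       # bare atoms, including NIL
        return None if node.id == 'NIL' else node.id
    raise ValueError('unexpected syntax: %s' % ast.dump(node))


# Shortened sample in the same shape as the structure above
sample = ('(("TEXT" "PLAIN" ("CHARSET" "ISO-8859-1") NIL NIL "7BIT" 16 2)'
          '("IMAGE" "JPEG" ("NAME" "a.jpg") NIL NIL "BASE64" 157924) "MIXED")')
print(imap_to_tuple(sample))
```

Because only the syntax tree is inspected, anything other than tuples, constants and bare names raises ValueError instead of being executed.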