
Question: Python - parsing a string with a known structure

Stack Overflow user

Asked 2015-07-05 01:32:19

3 answers · 1.3K views · 0 followers · 6 votes

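I have to parse a list of simple strings with a known structure, but my approach feels unnecessarily clunky. I suspect I'm missing a trick, perhaps a simple regex that would make this trivial?

Each string refers to a number of years/months in the future, and I want to turn it into a decimal number of years.

Generic format: "aYbM"

Here a is the number of years and b is the number of months. Both are ints, and both are optional (along with their identifier letters).

Test cases: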

5Y3M == 5.25
5Y == 5.0
6M == 0.5
10Y11M == 10.91666..
3Y14M -> raise ValueError("string '%s' cannot be parsed" % input_string)

My attempts so far rely on string splitting and are quite cumbersome, although they do produce the correct results:

def parse_aYbM(maturity_code):
    maturity = 0
    if "Y" in maturity_code:
        maturity += float(maturity_code.split("Y")[0])
        if "M" in maturity_code:
            maturity += float(maturity_code.split("Y")[1].split("M")[0]) / 12
        return maturity
    elif "M" in maturity_code:
        return float(maturity_code[:-1]) / 12
    else:
        return 0

Tags: python, regex, string

3 Answers

Stack Overflow user

Accepted answer

Posted 2015-07-05 01:52:50

You can use the regex pattern

(?:(\d+)Y)?(?:(\d+)M)?

which means

(?:        start a non-capturing group
  (\d+)    match 1-or-more digits, captured
  Y        followed by a literal Y
)?         end the non-capturing group; matched 0-or-1 times
(?:        start another non-capturing group
  (\d+)    match 1-or-more digits, captured
  M        followed by a literal M
)?         end the non-capturing group; matched 0-or-1 times

When used in

re.match(r'(?:(\d+)Y)?(?:(\d+)M)?', text).groups()

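the groups() method returns the parts matched inside the capturing parentheses. If a group did not take part in the match, its entry is None. For example,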

In [220]: re.match(r'(?:(\d+)Y)?(?:(\d+)M)?', '5Y3M').groups()
Out[220]: ('5', '3')

In [221]: re.match(r'(?:(\d+)Y)?(?:(\d+)M)?', '3M').groups()
Out[221]: (None, '3')

import re
def parse_aYbM(text):
    a, b = re.match(r'(?:(\d+)Y)?(?:(\d+)M)?', text).groups()
    if a is None and b is None:
        raise ValueError('input does not match aYbM')
    a, b = [int(item) if item is not None else 0 for item in (a, b)]
    return a + b/12.0

tests = [
('5Y3M', 5.25),
('5Y', 5.0),
('6M', 0.5),
('10Y11M', 10.917),
('3Y14M', 4.167),
]

for test, expected in tests:
    result = parse_aYbM(test)
    status = 'Failed'
    if abs(result - expected) < 0.001:
        status = 'Passed'
    print('{}: {} --> {}'.format(status, test, result))

yields

Passed: 5Y3M --> 5.25
Passed: 5Y --> 5.0
Passed: 6M --> 0.5
Passed: 10Y11M --> 10.9166666667
Passed: 3Y14M --> 4.16666666667

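Note that it is not clear what should happen when the input to parse_aYbM does not match the pattern. With the code above, an input that matches neither the year part nor the month part raises ValueError: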

In [227]: parse_aYbM('foo')
ValueError: input does not match aYbM

However, a partial match can still return a value, because re.match only anchors at the start of the string:

In [229]: parse_aYbM('0Yfoo')
Out[229]: 0.0
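
If the whole string has to match, and months above 11 should be rejected as the question's 3Y14M test case asks, a stricter variant is possible. The following is only a sketch, not the answer's original code: the name parse_aYbM_strict and the 0..11 month limit are assumptions taken from the question's test cases, and re.fullmatch requires Python 3.4 or later.

```python
import re

# Hypothetical stricter variant; the name and the 0..11 month limit are
# assumptions based on the question's test cases, not part of the answer.
_PATTERN = re.compile(r'(?:(\d+)Y)?(?:(\d+)M)?')

def parse_aYbM_strict(text):
    # fullmatch requires the entire string to fit the pattern,
    # so trailing junk such as '0Yfoo' is rejected
    match = _PATTERN.fullmatch(text)
    if match is None or (match.group(1) is None and match.group(2) is None):
        raise ValueError("string '%s' cannot be parsed" % text)
    years = int(match.group(1) or 0)
    months = int(match.group(2) or 0)
    # assumed rule: 12 or more months should have been written as years
    if months > 11:
        raise ValueError("string '%s' cannot be parsed" % text)
    return years + months / 12.0
```

With this version, '5Y3M', '5Y', '6M' and '10Y11M' parse as before, while '3Y14M', '0Yfoo', 'foo' and the empty string all raise ValueError.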

Votes: 5

Stack Overflow user

Posted 2015-07-05 01:43:01

You can use re.findall

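>>> import re
>>> def parse(m):
...     s = 0
...     # collect every "<digits>Y" or "<digits>M" token in the string
...     j = re.findall(r'\d+Y|\d+M', m)
...     for i in j:
...         if 'Y' in i:
...             s += float(i[:-1])
...         if 'M' in i:
...             s += float(i[:-1])/12
...     print(s)
...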


>>> parse('5Y')
5.0
>>> parse('6M')
0.5
>>> parse('10Y11M')
10.916666666666666
>>> parse('3Y14M')
4.166666666666667
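
One caveat with re.findall is that it silently skips any characters that do not belong to a token, so an input such as '5Yfoo' would still print 5.0. A possible way around this, sketched below under the assumption that unparsed characters should be an error, is to capture (number, unit) pairs and check that they reassemble into the original string. The name parse_findall is hypothetical, and this sketch still accepts repeated or reordered units such as '3M5Y'.

```python
import re

def parse_findall(text):
    # Hypothetical helper: capture (number, unit) pairs instead of whole tokens
    tokens = re.findall(r'(\d+)([YM])', text)
    # Rebuild the string from the tokens; a mismatch means some characters
    # were skipped by findall and the input is not purely "<digits>Y/M" tokens
    if not tokens or ''.join(n + u for n, u in tokens) != text:
        raise ValueError("string '%s' cannot be parsed" % text)
    divisor = {'Y': 1.0, 'M': 12.0}
    # Return the value instead of printing it, so callers can use the result
    return sum(int(n) / divisor[u] for n, u in tokens)
```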

Votes: 0

Stack Overflow user

Posted 2015-07-05 01:45:40

I'm not familiar with Python, but trying something like (?<year>[^Y])\D(?<month>[^M]*)\D should work.
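
For readers who want to try the named-group idea in Python: the re module spells named groups as (?P<name>...). The sketch below is an adaptation rather than this answer's exact pattern. It reuses the digit-based structure from the accepted answer, and the names NAMED and named_parts are hypothetical.

```python
import re

# Python's re module writes named groups as (?P<name>...).
# The group names 'year' and 'month' follow the suggestion above.
NAMED = re.compile(r'(?:(?P<year>\d+)Y)?(?:(?P<month>\d+)M)?')

def named_parts(text):
    # Return a dict of the named groups, or None if the whole string
    # does not fit the aYbM shape
    match = NAMED.fullmatch(text)
    if match is None:
        return None
    return match.groupdict()

print(named_parts('5Y3M'))  # {'year': '5', 'month': '3'}
print(named_parts('6M'))    # {'year': None, 'month': '6'}
print(named_parts('foo'))   # None
```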

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/31226487
