
Extracting multiple strings using Python's regular expressions

Stack Overflow user
Asked 2015-01-19 02:25:53
3 answers, 92 views, 0 followers, 0 votes

I have a log file with the following output, shortened here from several thousand lines:

Time = 1

smoothSolver:  Solving for Ux, Initial residual = 0.230812, Final residual = 0.0134171, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.283614, Final residual = 0.0158797, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.190444, Final residual = 0.016567, No Iterations 2
GAMG:  Solving for p, Initial residual = 0.0850116, Final residual = 0.00375608, No Iterations 3
time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109
smoothSolver:  Solving for omega, Initial residual = 0.00267604, Final residual = 0.000166675, No Iterations 3
bounding omega, min: -26.6597 max: 18468.7 average: 219.43
smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0862096, No Iterations 2
ExecutionTime = 4.84 s  ClockTime = 5 s

Time = 2

smoothSolver:  Solving for Ux, Initial residual = 0.0299872, Final residual = 0.00230507, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.145767, Final residual = 0.00882969, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.0863129, Final residual = 0.00858536, No Iterations 2
GAMG:  Solving for p, Initial residual = 0.394189, Final residual = 0.0175138, No Iterations 3
time step continuity errors : sum local = 0.00862823, global = 0.00212477, cumulative = 0.00354587
smoothSolver:  Solving for omega, Initial residual = 0.00258475, Final residual = 0.000222705, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.112805, Final residual = 0.0054572, No Iterations 3
ExecutionTime = 5.9 s  ClockTime = 6 s

Time = 3

smoothSolver:  Solving for Ux, Initial residual = 0.128298, Final residual = 0.0070293, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.138825, Final residual = 0.0116437, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.0798979, Final residual = 0.00491246, No Iterations 3
GAMG:  Solving for p, Initial residual = 0.108748, Final residual = 0.00429273, No Iterations 2
time step continuity errors : sum local = 0.0073211, global = -0.00187909, cumulative = 0.00166678
smoothSolver:  Solving for omega, Initial residual = 0.00238456, Final residual = 0.000224435, No Iterations 3
smoothSolver:  Solving for k, Initial residual = 0.0529661, Final residual = 0.00280851, No Iterations 3
ExecutionTime = 6.92 s  ClockTime = 7 s

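I need to use Python regular expressions to extract Time = 1, 2, 3 and the corresponding cumulative values. More precisely, I only need the values 1, 2, 3 and the cumulative values that belong to them, 0.00142109, 0.00354587 and 0.00166678, and then write them to another file.

Currently, this is what I have: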

contCumulative_0_out = open('contCumulative_0', 'w+')

with open(logFile, 'r') as logfile_read:
    for line in logfile_read:
        line = line.rstrip()
        iteration_time = re.findall(r'^Time = ([0-9]+)', line)
        print(iteration_time)
        contCumulative_0 = re.search(r'cumulative = ((\d|.)+)', line)
        if contCumulative_0:
            cumvalue = contCumulative_0.groups(1)
            contCumulative_0_out.write('\n'.join(cumvalue))

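The variable iteration_time captures the time value, but it is no longer available in the if block that follows, so I cannot combine the time with the cumulative value to produce a line such as 1 0.00142109 in the output file.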

3 Answers

Stack Overflow user

Accepted answer

Answered 2015-01-19 02:39:40

When a line contains neither "Time" nor "cumulative", there is no need to overwrite the variable. You can do this:

...
with open(logFile, 'r') as logfile_read:
    for line in logfile_read:
        line = line.rstrip()
        if 'Time' in line:
            iteration_time = re.findall(r'^Time = ([0-9]+)', line)
            print(iteration_time)
        if 'cumulative' in line:
            contCumulative_0 = re.search(r'cumulative = ((\d|.)+)', line)
            if contCumulative_0:
                cumvalue = contCumulative_0.groups(1)
                contCumulative_0_out.write('\n'.join(cumvalue))
...
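
As a rough illustration of the same line-by-line idea, here is a minimal Python 3 sketch that remembers the most recent time step and pairs it with the next cumulative value. The function name `iter_time_cumulative` and the number pattern (which also accepts a sign and an exponent) are assumptions made for this example, not part of the answer above.

```python
import re

# Assumed patterns: an integer time step, and a float that may carry a
# sign or an exponent (an illustrative choice, not taken from the answer).
TIME_RE = re.compile(r'^Time = (\d+)')
CUMULATIVE_RE = re.compile(r'cumulative = ([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)')


def iter_time_cumulative(lines):
    """Yield (time, cumulative) string pairs from an iterable of log lines."""
    current_time = None  # remembered until the next "Time = N" line
    for line in lines:
        time_match = TIME_RE.match(line)
        if time_match:
            current_time = time_match.group(1)
            continue
        cumulative_match = CUMULATIVE_RE.search(line)
        if cumulative_match and current_time is not None:
            yield current_time, cumulative_match.group(1)


sample = [
    "Time = 1",
    "time step continuity errors : sum local = 0.00999678, "
    "global = 0.00142109, cumulative = 0.00142109",
    "ExecutionTime = 4.84 s  ClockTime = 5 s",
    "Time = 2",
    "time step continuity errors : sum local = 0.00862823, "
    "global = 0.00212477, cumulative = 0.00354587",
]
pairs = list(iter_time_cumulative(sample))
for time_value, cumulative in pairs:
    print(time_value, cumulative)
```

Because `current_time` is only assigned when a "Time = N" line is seen, it survives across the intervening solver lines, which is the point the answer makes.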
1 vote

Stack Overflow user

Answered 2015-01-19 02:38:24

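Your code overwrites iteration_time on every iteration of the for loop. That is the problem. Once the variable has been filled by a successful match, you need to stop it from being overwritten.

To do that, set iteration_time to None before the for loop and assign it only when the regex search for the time actually matches. You can do something like this: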

contCumulative_0_out = open('contCumulative_0', 'w+')

with open(logFile, 'r') as logfile_read:
    iteration_time = None
    for line in logfile_read:
        line = line.rstrip()
        time_match = re.findall(r'^Time = ([0-9]+)', line)
        if time_match:
            iteration_time = time_match
            print(iteration_time)
        else:  # Because if there is time_match, there is no 'cumulative = ...'
            contCumulative_0 = re.search(r'cumulative = ((\d|.)+)', line)
            if contCumulative_0:        
                cumvalue = contCumulative_0.groups(1)
                # You can check and use iteration_time here
                contCumulative_0_out.write('\n'.join(cumvalue))
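
One detail of the cumulative pattern is worth checking before relying on the written output: in `(\d|.)+` the dot is unescaped and there are two capture groups, so `groups(1)` returns both of them. The short Python 3 check below illustrates this on one line from the log. The stricter pattern `([-+]?\d+\.\d+)` is an assumed replacement for illustration, not something the answers propose.

```python
import re

line = ("time step continuity errors : sum local = 0.00999678, "
        "global = 0.00142109, cumulative = 0.00142109")

# Pattern from the question: two groups, and "." matches any character.
loose = re.search(r'cumulative = ((\d|.)+)', line)
loose_groups = loose.groups(1)       # tuple holding BOTH groups
joined = '\n'.join(loose_groups)     # what write() would receive
print(loose_groups)
print(repr(joined))

# Assumed stricter pattern: one group, literal dot, optional sign.
strict = re.search(r'cumulative = ([-+]?\d+\.\d+)', line)
print(strict.group(1))
```

The inner group keeps only its last repetition, so the joined string contains the number followed by a stray final digit on its own line. Using a single group and `group(1)` avoids that.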

Hope this helps.

1 vote

Stack Overflow user

Answered 2015-01-19 02:53:29

You can do this with a single regex, assuming the log format is the same for every entry. What happens is explained below:

import re

s = """Time = 1

smoothSolver:  Solving for Ux, Initial residual = 0.230812, Final residual = 0.0134171, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.283614, Final residual = 0.0158797, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.190444, Final residual = 0.016567, No Iterations 2
GAMG:  Solving for p, Initial residual = 0.0850116, Final residual = 0.00375608, No Iterations 3
time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109
smoothSolver:  Solving for omega, Initial residual = 0.00267604, Final residual = 0.000166675, No Iterations 3
bounding omega, min: -26.6597 max: 18468.7 average: 219.43
smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0862096, No Iterations 2
ExecutionTime = 4.84 s  ClockTime = 5 s

Time = 2

smoothSolver:  Solving for Ux, Initial residual = 0.230812, Final residual = 0.0134171, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.283614, Final residual = 0.0158797, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.190444, Final residual = 0.016567, No Iterations 2
GAMG:  Solving for p, Initial residual = 0.0850116, Final residual = 0.00375608, No Iterations 3
time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00123456
smoothSolver:  Solving for omega, Initial residual = 0.00267604, Final residual = 0.000166675, No Iterations 3
bounding omega, min: -26.6597 max: 18468.7 average: 219.43
smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0862096, No Iterations 2
ExecutionTime = 4.84 s  ClockTime = 5 s
"""

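# Raw string; the greedy (\d+) captures a multi-digit time step in full
# (a lazy \d+? would stop after one digit), and -? allows a negative value.
regex = re.compile(r"^Time = (\d+).*?cumulative = (-?\d{0,10}\.\d{0,10})", re.DOTALL | re.MULTILINE)

for x in re.findall(regex, s):
    print("{} => {}".format(x[0], x[1]))

This outputs two results (because I used two log entries rather than the single one you provided):

1 => 0.00142109
2 => 0.00123456

What is going on here?

The regex being used is:

^Time = (\d+).*?cumulative = (-?\d{0,10}\.\d{0,10})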

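This regex looks for your Time = string at the start of a line and matches the number that follows. It then matches non-greedily up to the string cumulative = and captures the number after it. The non-greedy match matters: otherwise you would get only one result for the entire log, because the pattern would match from the first instance of Time = to the last instance of cumulative =.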
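
To see the difference concretely, the following Python 3 sketch runs a lazy and a greedy variant of a simplified pattern over a two-entry log. The shortened log lines and the simplified number pattern `\d+\.\d+` are stand-ins assumed for this example.

```python
import re

# Shortened stand-in for two log entries (assumed for illustration).
log = """Time = 1
errors : cumulative = 0.00142109
Time = 2
errors : cumulative = 0.00354587
"""

flags = re.DOTALL | re.MULTILINE
lazy = re.findall(r'^Time = (\d+).*?cumulative = (\d+\.\d+)', log, flags)
greedy = re.findall(r'^Time = (\d+).*cumulative = (\d+\.\d+)', log, flags)

print(lazy)    # one pair per entry
print(greedy)  # first time paired with the last cumulative value
```

With `.*?` each time step is paired with its own cumulative value. With `.*` the single match stretches from the first Time = to the last cumulative =, so only one mismatched pair comes back.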

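Each result is then printed. Each captured result contains the time value and the cumulative value. If needed, you can modify this part of the code to write to a file instead.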
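
One possible way to write the pairs to a file is sketched below in Python 3. The helper name `write_pairs`, the output format of one "time value" pair per line, and the temporary output location are assumptions for this example. The pattern is close to the one in this answer but is not claimed to be identical to it.

```python
import os
import re
import tempfile

# Assumed pattern: greedy \d+ for the time step, optional minus sign
# for the cumulative value.
PAIR_RE = re.compile(r'^Time = (\d+).*?cumulative = (-?\d*\.\d+)',
                     re.DOTALL | re.MULTILINE)


def write_pairs(log_text, out_path):
    """Write one "time cumulative" line per log entry; return the count."""
    count = 0
    with open(out_path, 'w') as out:
        for match in PAIR_RE.finditer(log_text):
            out.write('{} {}\n'.format(match.group(1), match.group(2)))
            count += 1
    return count


log_text = """Time = 1

time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109
ExecutionTime = 4.84 s  ClockTime = 5 s

Time = 2

time step continuity errors : sum local = 0.00862823, global = 0.00212477, cumulative = 0.00354587
ExecutionTime = 5.9 s  ClockTime = 6 s
"""

# Write into a fresh temporary directory so the sketch has no side effects.
out_path = os.path.join(tempfile.mkdtemp(), 'contCumulative_0')
written = write_pairs(log_text, out_path)
with open(out_path) as result:
    contents = result.read()
print(contents)
```

`finditer` is used here instead of `findall` so that each match object is handled as it is found. For a log of a few thousand lines, reading the whole file into one string first should be fine.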

This regex works across multiple lines because it uses two flags: DOTALL and MULTILINE.
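
The role of each flag can be checked with a small Python 3 experiment. The sample text below is an assumption for this example. It starts with an unrelated line so that the effect of MULTILINE on `^` is visible.

```python
import re

# Assumed sample: an unrelated first line, then one shortened log entry.
text = "skip\nTime = 1\nfoo cumulative = 0.5\n"
pattern = r'^Time = (\d+).*?cumulative = (\d+\.\d+)'

no_flags = re.findall(pattern, text)
only_multiline = re.findall(pattern, text, re.MULTILINE)
only_dotall = re.findall(pattern, text, re.DOTALL)
both = re.findall(pattern, text, re.DOTALL | re.MULTILINE)

print(no_flags, only_multiline, only_dotall, both)
```

Without MULTILINE, `^` matches only at the very start of the string. Without DOTALL, `.` stops at a newline, so the match cannot reach the cumulative = line. Only the combination finds the pair.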

1 vote

Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/28017121
