Parsing a file using regex

Stack Overflow user
Asked 2022-06-10 15:22:43
Answers: 2, Views: 58, Followers: 0, Score: 0

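I have a large text file. It is basically a CSV file, but it has many different sections, so to me it does not look like a proper CSV. Part of the file looks like this: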

7.27.27.2. Frame Counts: 2

Timestamp,Transmitted,Received Seconds,Frames,
1.818,"47,702","24,026"
2.847,"121,038","66,424"
3.818,"192,749","105,993"
4.851,"270,454","147,068"
5.817,"343,582","184,994"
6.818,"422,937","227,679"
7.847,"494,787","268,220"
8.847,"568,388","307,350"
9.818,"636,640","344,092"
10.824,"712,211","383,849"
11.846,"786,823","423,941"
12.818,"863,526","465,542"
13.847,"936,019","504,298"
14.847,"1,007,358","543,600"
15.847,"1,072,079","578,770"
16.847,"1,135,907","613,742"
17.847,"1,204,749","649,329"
18.817,"1,269,150","684,052"
19.817,"1,340,923","720,234"
20.860,"1,409,920","758,060"
21.847,"1,480,912","798,166"
22.101,"1,491,235","803,900"
23.108,"1,491,235","803,900"
7.27.28. Frame Rate

Rates can vary due to round-off errors in calculations. Timestamp,Transmit rate,Receive rate Seconds,Frames/s,
1.818,"39,450","39,390"
2.847,"112,400","112,500"
3.818,"114,600","114,600"
4.851,"115,000","115,000"
5.817,"115,000","114,900"
6.818,"121,900","121,600"
7.847,"109,200","109,500"
8.847,"112,700","112,600"
9.818,"108,100","108,200"
10.824,"114,700","114,600"
11.846,"112,200","112,200"
12.818,"121,700","121,700"
13.847,"108,100","108,100"
14.847,"110,600","110,600"
15.847,"99,900","99,770"
16.847,"98,790","98,910"
17.847,"104,400","104,400"
18.817,"102,200","102,300"
19.817,"108,000","108,000"
20.860,"102,400","102,400"
21.847,"112,500","112,600"
22.101,"63,410","63,470"
23.108,0.00,0.00
7.27.28.1. Frame Rate: 1








Test Model: IPSEC-JENKINS Version: 53 Result: canceled Date: June 10, 2022 5:10:46 AM PDT Test Duration: 00:00:25.436
7. Test Results for IPSEC
7.1. Component Description Component: Application Simulator


Component,Resource Used IPSEC,np3-0
7.2. Test Component Criteria Number,Description 1,The total number of sessions opened must reach the specified target within the allotted time.: (maxConcurrentAppFlows>=sessions.target) 2,The total number of failed application transactions must be no more than 5 percent of the attempted application transactions.: ((appUnsuccessful*100)<=(appAttempted*5)) 3,The session rate must reach the specified target within the allotted time.: (maxAppFlowRate>=sessions.targetPerSecond)
7.3. Settings Parameter,Value Resource Percentage,50 Application Profile,MixCISCO MIX 4451 Delay Start,00:00:00 Data Rate/Data Rate Unlimited,false Data Rate/Data Rate Scope,Limit Aggregate Throughput Data Rate/Data Rate Unit,Megabits / Second Data Rate/Data Rate Type,Constant Data Rate/Minimum Data Rate,10000 Data Rate/Maximum Data Rate,10000 Session/Super Flow Configuration/Maximum Simultaneous Super Flows,1030 Session/Super Flow Configuration/Maximum Simultaneous Active Flows,0 Session/Super Flow Configuration/Maximum Super Flows Per Second,1030 Session/Super Flow Configuration/Unlimited Super Flow Open Rate,false Session/Super Flow Configuration/Unlimited Super Flow Close Rate,false Session/Super Flow Configuration/Target Minimum Simultaneous Flows,1 Session/Super Flow Configuration/Target Minimum Super Flows Per Second,1 Session/Super Flow Configuration/Target Number of Successful Matches,0 Session/Super Flow Configuration/Engine Selection,Advanced (Max Features) Session/Super Flow Configuration/Performance Emphasis,Balanced Session/Super Flow Configuration/Resource Allocation Override,Automatic Session/Super Flow Configuration/Statistic Detail,Maximum App Configuration/Remove all DNS actions,false App Configuration/Streams Per Super Flow,1 App Configuration/Content Fidelity,Normal App Configuration/Replace Streams at Runtime,true Source Port/Port Distribution Type,Random Source Port/Minimum Port Number,1024 Source Port/Maximum Port Number,65535 TCP Configuration/Maximum Segment Size (MSS),1260 TCP Configuration/Aging Time Data Type,Seconds TCP Configuration/Aging Time,0 TCP Configuration/Reset at End,false TCP Configuration/Retry Quantum,500 TCP Configuration/Retry Count,3 TCP Configuration/Delay ACKs,true TCP Configuration/Disable Piggy-back data on ACK (experimental),false TCP Configuration/Delayed ACKs ms,0 TCP Configuration/ACK every N (experimental),0 TCP Configuration/Initial Receive Window,5792 TCP Configuration/TCP Window Scale,0 TCP 
Configuration/Dynamic Receive Window Size,true TCP Configuration/Add Segment Timestamps,true TCP Configuration/Piggy-back Data on 3-way Handshake ACK,false TCP Configuration/Piggy-back Data on Shutdown FIN,false TCP Configuration/Initial Congestion Window,4 TCP Configuration/Explicit Congestion Notification,Support ECN TCP Configuration/Raw Flags,-1 TCP Configuration/Connect Delay,0 TCP Configuration/TCP Keepalive Timer,0 TCP Configuration/4-way Close,false TCP Configuration/Send PSH with all data segments,false IPv4 Configuration/TTL,32 IPv4 Configuration/TOS/DSCP,0x0 IPv6 Configuration/Hop Limit,64 IPv6 Configuration/Traffic Class,0x0 IPv6 Configuration/Flow Label,0x0 SSL Configuration/Session Reuse Capacity,Low SSL Configuration/Server Record Length,0 SSL Configuration/Client Record Length,0 Ramp Up Profile/Ramp Up Profile Type,Calculated Ramp Up Profile/Min Connection Rate,1 Ramp Up Profile/Max Connection Rate,1 Ramp Up Profile/Increment n Connections per Interval,1 Ramp Up Profile/Fixed Time Interval,00:00:01 Session Ramp Distribution/Ramp Up Behavior,Full Open Session Ramp Distribution/SYN Only Retry Mode,Obey Retry Count Session Ramp Distribution/Ramp Up Duration,00:00:00 Session Ramp Distribution/Steady-State Behavior,Open and Close Sessions Session Ramp Distribution/Steady-State Time Interval,00:02:15 Session Ramp Distribution/Ramp Down Behavior,Full Close Session Ramp Distribution/Ramp Down Time Interval,00:00:05 Experimental Advanced Settings/TCP Segments Credit,32 Experimental Advanced Settings/Send maximum size segments when possible,false Load Profile/,None Preset the component was created from,Appsim Default
7.4. App Profile Summary Weighted by flows Name,Weight,% Bandwidth,% Flows,Bytes,Flows,Seed CISCO MARCH G729 - DIA,"15,392",,,,,1 CISCO MARCH HTTP APPLICATION - DIA,"6,453",,,,,1 CISCO MARCH HTTP 32K GET - DIA,"14,969",,,,,1 CISCO MARCH HTTPS 16K - DIA,"31,729",,,,,1 CISCO MARCH CITRIX - DIA,282,,,,,1 CISCO MARCH HTTPS 64K - DIA,"9,130",,,,,1 CISCO MARCH MS-EXCHANGE - DIA,"13,212",,,,,1 CISCO MARCH HTTPS Live Streaming - DIA,584,,,,,1 CISCO MARCH HTTPS 1024K - DIA,617,,,,,1 CISCO MARCH H264 Video New - DIA,"6,576",,,,,1 CISCO MARCH POP3BANDWIDTH,95,,,,,1 CISCO MARCH SMTP,956,,,,,1
7.5. Traffic Appearance Traffic was addressed as defined in the "IPSEC-CURIE" network neighborhood. Interface,Traffic Direction,Network Domain,VLAN,Address Range 1,Client,CLIENT,,2.0.0.10
- 2.0.0.109 2,Server,SERVER,,5.0.0.10 - 5.0.0.109
7.6. Component Results Component,Result IPSEC,canceled
7.7. Application Aggregate Flows

There may be slices in this graph that are too small to be displayed. Protocol,Aggregate Flows (Flows),Aggregate Flows (%) SMTP,242,1.101% RTP,295,1.342% DNS,185,0.842% POP3-Advanced,25,0.114% HTTP,"17,440",79.345% Citrix,69,0.314% Microsoft Exchange,"3,724",16.943%

I want to extract the data rows of section 7.27.28, namely:

1.818,"39,450","39,390"
2.847,"112,400","112,500"
3.818,"114,600","114,600"
4.851,"115,000","115,000"
5.817,"115,000","114,900"
6.818,"121,900","121,600"
7.847,"109,200","109,500"
8.847,"112,700","112,600"
9.818,"108,100","108,200"
10.824,"114,700","114,600"
11.846,"112,200","112,200"
12.818,"121,700","121,700"
13.847,"108,100","108,100"
14.847,"110,600","110,600"
15.847,"99,900","99,770"
16.847,"98,790","98,910"
17.847,"104,400","104,400"
18.817,"102,200","102,300"
19.817,"108,000","108,000"
20.860,"102,400","102,400"
21.847,"112,500","112,600"
22.101,"63,410","63,470"
23.108,0.00,0.00

To read this data, I was thinking of using a regex to isolate the section and then parsing it with csv, but the code below does not work:

pattern = r"""7.27.28. Frame Rate

Rates can vary due to round-off errors in calculations.
Timestamp,Transmit rate,Receive rate
Seconds,Frames/s,
(.*)
7.27.28.1. Frame Rate: 1"""
match = re.search(pattern, all_of_it)
print(match.group(1))

What is the correct pattern, or is there another way to extract the data?
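The pattern in the question most likely fails for two reasons. Without `re.DOTALL`, `(.*)` stops at the first newline, so it can never capture several rows. And the literal header lines in the pattern must match the file exactly, line breaks included; in the excerpt above, the note and both header fragments appear on a single line, which a multi-line literal would not match. The unescaped `.` characters also match any character, which is harmless here but imprecise. Below is a minimal sketch of a corrected pattern, assuming the section always starts at the `7.27.28. Frame Rate` heading and ends at the `7.27.28.1.` heading; `extract_frame_rate_rows` and the trimmed `sample` string are illustrative and not from the original post.

```python
import re

# Hypothetical pattern for section 7.27.28. \s+ between the fixed phrases
# tolerates either spaces or line breaks, so the header may be on one line
# or split across several.
PATTERN = re.compile(
    r"7\.27\.28\. Frame Rate\s+"
    r"Rates can vary due to round-off errors in calculations\.\s+"
    r"Timestamp,Transmit rate,Receive rate\s+Seconds,Frames/s,\s*\n"
    r"(.*?)\n"          # lazy, so it stops at the first closing heading
    r"7\.27\.28\.1\.",  # escaped dots match literal periods only
    re.DOTALL,          # lets (.*?) span many lines
)


def extract_frame_rate_rows(text: str) -> str:
    """Return the raw CSV rows of section 7.27.28 as one string."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError("section 7.27.28 not found")
    return match.group(1)


sample = """7.27.28. Frame Rate

Rates can vary due to round-off errors in calculations. Timestamp,Transmit rate,Receive rate Seconds,Frames/s,
1.818,"39,450","39,390"
2.847,"112,400","112,500"
23.108,0.00,0.00
7.27.28.1. Frame Rate: 1
"""

print(extract_frame_rate_rows(sample))
```

The captured string can then be passed to `csv.reader` through `io.StringIO`, which is what the question planned to do.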

2 Answers

Stack Overflow user

Answered 2022-06-10 15:31:12

This is not a regex answer, but it may still be useful.

The key trick here is to split the text on blank lines with text.split("\n\n") and then use startswith to pick out the block of interest.

text = """
7.27.27.2. Frame Counts: 2

Timestamp,Transmitted,Received Seconds,Frames,
1.818,"47,702","24,026"
2.847,"121,038","66,424"
3.818,"192,749","105,993"
4.851,"270,454","147,068"
5.817,"343,582","184,994"
6.818,"422,937","227,679"
7.847,"494,787","268,220"
8.847,"568,388","307,350"
9.818,"636,640","344,092"
10.824,"712,211","383,849"
11.846,"786,823","423,941"
12.818,"863,526","465,542"
13.847,"936,019","504,298"
14.847,"1,007,358","543,600"
15.847,"1,072,079","578,770"
16.847,"1,135,907","613,742"
17.847,"1,204,749","649,329"
18.817,"1,269,150","684,052"
19.817,"1,340,923","720,234"
20.860,"1,409,920","758,060"
21.847,"1,480,912","798,166"
22.101,"1,491,235","803,900"
23.108,"1,491,235","803,900"
7.27.28. Frame Rate

Rates can vary due to round-off errors in calculations. Timestamp,Transmit rate,Receive rate Seconds,Frames/s,
1.818,"39,450","39,390"
2.847,"112,400","112,500"
3.818,"114,600","114,600"
4.851,"115,000","115,000"
5.817,"115,000","114,900"
6.818,"121,900","121,600"
7.847,"109,200","109,500"
8.847,"112,700","112,600"
9.818,"108,100","108,200"
10.824,"114,700","114,600"
11.846,"112,200","112,200"
12.818,"121,700","121,700"
13.847,"108,100","108,100"
14.847,"110,600","110,600"
15.847,"99,900","99,770"
16.847,"98,790","98,910"
17.847,"104,400","104,400"
18.817,"102,200","102,300"
19.817,"108,000","108,000"
20.860,"102,400","102,400"
21.847,"112,500","112,600"
22.101,"63,410","63,470"
23.108,0.00,0.00
7.27.28.1. Frame Rate: 1








Test Model: IPSEC-JENKINS Version: 53 Result: canceled Date: June 10, 2022 5:10:46 AM PDT Test Duration: 00:00:25.436
7. Test Results for IPSEC
7.1. Component Description Component: Application Simulator


Component,Resource Used IPSEC,np3-0
7.2. Test Component Criteria Number,Description 1,The total number of sessions opened must reach the specified target within the allotted time.: (maxConcurrentAppFlows>=sessions.target) 2,The total number of failed application transactions must be no more than 5 percent of the attempted application transactions.: ((appUnsuccessful*100)<=(appAttempted*5)) 3,The session rate must reach the specified target within the allotted time.: (maxAppFlowRate>=sessions.targetPerSecond)
7.3. Settings Parameter,Value Resource Percentage,50 Application Profile,MixCISCO MIX 4451 Delay Start,00:00:00 Data Rate/Data Rate Unlimited,false Data Rate/Data Rate Scope,Limit Aggregate Throughput Data Rate/Data Rate Unit,Megabits / Second Data Rate/Data Rate Type,Constant Data Rate/Minimum Data Rate,10000 Data Rate/Maximum Data Rate,10000 Session/Super Flow Configuration/Maximum Simultaneous Super Flows,1030 Session/Super Flow Configuration/Maximum Simultaneous Active Flows,0 Session/Super Flow Configuration/Maximum Super Flows Per Second,1030 Session/Super Flow Configuration/Unlimited Super Flow Open Rate,false Session/Super Flow Configuration/Unlimited Super Flow Close Rate,false Session/Super Flow Configuration/Target Minimum Simultaneous Flows,1 Session/Super Flow Configuration/Target Minimum Super Flows Per Second,1 Session/Super Flow Configuration/Target Number of Successful Matches,0 Session/Super Flow Configuration/Engine Selection,Advanced (Max Features) Session/Super Flow Configuration/Performance Emphasis,Balanced Session/Super Flow Configuration/Resource Allocation Override,Automatic Session/Super Flow Configuration/Statistic Detail,Maximum App Configuration/Remove all DNS actions,false App Configuration/Streams Per Super Flow,1 App Configuration/Content Fidelity,Normal App Configuration/Replace Streams at Runtime,true Source Port/Port Distribution Type,Random Source Port/Minimum Port Number,1024 Source Port/Maximum Port Number,65535 TCP Configuration/Maximum Segment Size (MSS),1260 TCP Configuration/Aging Time Data Type,Seconds TCP Configuration/Aging Time,0 TCP Configuration/Reset at End,false TCP Configuration/Retry Quantum,500 TCP Configuration/Retry Count,3 TCP Configuration/Delay ACKs,true TCP Configuration/Disable Piggy-back data on ACK (experimental),false TCP Configuration/Delayed ACKs ms,0 TCP Configuration/ACK every N (experimental),0 TCP Configuration/Initial Receive Window,5792 TCP Configuration/TCP Window Scale,0 TCP 
Configuration/Dynamic Receive Window Size,true TCP Configuration/Add Segment Timestamps,true TCP Configuration/Piggy-back Data on 3-way Handshake ACK,false TCP Configuration/Piggy-back Data on Shutdown FIN,false TCP Configuration/Initial Congestion Window,4 TCP Configuration/Explicit Congestion Notification,Support ECN TCP Configuration/Raw Flags,-1 TCP Configuration/Connect Delay,0 TCP Configuration/TCP Keepalive Timer,0 TCP Configuration/4-way Close,false TCP Configuration/Send PSH with all data segments,false IPv4 Configuration/TTL,32 IPv4 Configuration/TOS/DSCP,0x0 IPv6 Configuration/Hop Limit,64 IPv6 Configuration/Traffic Class,0x0 IPv6 Configuration/Flow Label,0x0 SSL Configuration/Session Reuse Capacity,Low SSL Configuration/Server Record Length,0 SSL Configuration/Client Record Length,0 Ramp Up Profile/Ramp Up Profile Type,Calculated Ramp Up Profile/Min Connection Rate,1 Ramp Up Profile/Max Connection Rate,1 Ramp Up Profile/Increment n Connections per Interval,1 Ramp Up Profile/Fixed Time Interval,00:00:01 Session Ramp Distribution/Ramp Up Behavior,Full Open Session Ramp Distribution/SYN Only Retry Mode,Obey Retry Count Session Ramp Distribution/Ramp Up Duration,00:00:00 Session Ramp Distribution/Steady-State Behavior,Open and Close Sessions Session Ramp Distribution/Steady-State Time Interval,00:02:15 Session Ramp Distribution/Ramp Down Behavior,Full Close Session Ramp Distribution/Ramp Down Time Interval,00:00:05 Experimental Advanced Settings/TCP Segments Credit,32 Experimental Advanced Settings/Send maximum size segments when possible,false Load Profile/,None Preset the component was created from,Appsim Default
7.4. App Profile Summary Weighted by flows Name,Weight,% Bandwidth,% Flows,Bytes,Flows,Seed CISCO MARCH G729 - DIA,"15,392",,,,,1 CISCO MARCH HTTP APPLICATION - DIA,"6,453",,,,,1 CISCO MARCH HTTP 32K GET - DIA,"14,969",,,,,1 CISCO MARCH HTTPS 16K - DIA,"31,729",,,,,1 CISCO MARCH CITRIX - DIA,282,,,,,1 CISCO MARCH HTTPS 64K - DIA,"9,130",,,,,1 CISCO MARCH MS-EXCHANGE - DIA,"13,212",,,,,1 CISCO MARCH HTTPS Live Streaming - DIA,584,,,,,1 CISCO MARCH HTTPS 1024K - DIA,617,,,,,1 CISCO MARCH H264 Video New - DIA,"6,576",,,,,1 CISCO MARCH POP3BANDWIDTH,95,,,,,1 CISCO MARCH SMTP,956,,,,,1
7.5. Traffic Appearance Traffic was addressed as defined in the "IPSEC-CURIE" network neighborhood. Interface,Traffic Direction,Network Domain,VLAN,Address Range 1,Client,CLIENT,,2.0.0.10
- 2.0.0.109 2,Server,SERVER,,5.0.0.10 - 5.0.0.109
7.6. Component Results Component,Result IPSEC,canceled
7.7. Application Aggregate Flows

There may be slices in this graph that are too small to be displayed. Protocol,Aggregate Flows (Flows),Aggregate Flows (%) SMTP,242,1.101% RTP,295,1.342% DNS,185,0.842% POP3-Advanced,25,0.114% HTTP,"17,440",79.345% Citrix,69,0.314% Microsoft Exchange,"3,724",16.943%
"""

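import re
from io import StringIO

from pandas import read_csv

# Split the report on blank lines and keep the block that starts with the "Rates" note.
# next() raises StopIteration if no such block exists, instead of silently using the last block.
block = next(chunk for chunk in text.split("\n\n") if chunk.startswith("Rates"))

# Keep only the numeric rows; this drops the header line and the trailing
# "7.27.28.1. Frame Rate: 1" heading that end up in the same block.
rows = [ln for ln in block.splitlines() if re.match(r"\d+\.\d+,", ln)]
df = read_csv(
    StringIO("\n".join(rows)),
    names=["Timestamp", "Transmit rate", "Receive rate"],
    thousands=",",  # parse quoted values such as "39,450" as numbers
)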
Score: 0
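A standard-library variant of the blank-line split from the first answer, for cases where pandas is not available. This is a sketch assuming the same report layout; `parse_rate_block` and `ROW` are hypothetical names. `csv.reader` takes care of quoted fields such as `"39,450"`, and the thousands separators are stripped before calling `float()`.

```python
import csv
import io
import re

# Data rows look like 1.818,"39,450","39,390"; headings such as
# "7.27.28.1. Frame Rate: 1" do not match because a second dot follows "7.27".
ROW = re.compile(r"\d+\.\d+,")


def parse_rate_block(text: str) -> list[tuple[float, float, float]]:
    # Pick the blank-line-separated block that starts with the "Rates" note.
    block = next(chunk for chunk in text.split("\n\n") if chunk.startswith("Rates"))
    data_lines = [line for line in block.splitlines() if ROW.match(line)]
    rows = []
    for ts, tx, rx in csv.reader(io.StringIO("\n".join(data_lines))):
        # Strip thousands separators before converting, e.g. "39,450" -> 39450.0
        rows.append((float(ts), float(tx.replace(",", "")), float(rx.replace(",", ""))))
    return rows


sample = """7.27.28. Frame Rate

Rates can vary due to round-off errors in calculations. Timestamp,Transmit rate,Receive rate Seconds,Frames/s,
1.818,"39,450","39,390"
23.108,0.00,0.00
7.27.28.1. Frame Rate: 1

Test Model: IPSEC-JENKINS
"""

print(parse_rate_block(sample))  # [(1.818, 39450.0, 39390.0), (23.108, 0.0, 0.0)]
```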

Stack Overflow user

Answered 2022-06-10 16:43:39

Here is a better, regex-based solution. There may be a way to do it with fewer regular expressions, but I'm no expert! Sorry for the poor variable names!

import re

text = "all your text"  # placeholder: the full file contents go here

LONG_LINE = "Rates can vary due to round-off errors in calculations. Timestamp,Transmit rate,Receive rate Seconds,Frames/s,"
LAST_ROW = "7.27.28.1. Frame Rate: 1"
# re.escape() makes the "." characters in the markers match literally;
# re.DOTALL lets (.*?) span several lines.
regex = re.compile(f"{re.escape(LONG_LINE)}(.*?){re.escape(LAST_ROW)}", re.DOTALL)
m = regex.search(text)
your_section = m.group(1)

# Drop anything before the first line that starts with a digit.
regex2 = re.compile(r"^\d.*", re.MULTILINE | re.DOTALL)
m2 = regex2.search(your_section)
print(m2.group(0).strip())
Output:
1.818,"39,450","39,390"
2.847,"112,400","112,500"
3.818,"114,600","114,600"
4.851,"115,000","115,000"
5.817,"115,000","114,900"
6.818,"121,900","121,600"
7.847,"109,200","109,500"
8.847,"112,700","112,600"
9.818,"108,100","108,200"
10.824,"114,700","114,600"
11.846,"112,200","112,200"
12.818,"121,700","121,700"
13.847,"108,100","108,100"
14.847,"110,600","110,600"
15.847,"99,900","99,770"
16.847,"98,790","98,910"
17.847,"104,400","104,400"
18.817,"102,200","102,300"
19.817,"108,000","108,000"
20.860,"102,400","102,400"
21.847,"112,500","112,600"
22.101,"63,410","63,470"
23.108,0.00,0.00
Score: 0
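Both answers hard-code the text that surrounds section 7.27.28. A more general sketch splits the whole report into sections keyed by their dotted number, so any section can be looked up directly. It assumes every heading line starts with a dotted number followed by `. ` (as in `7.27.28. Frame Rate`) and that no data row does; `split_sections` and `HEADING` are hypothetical names. Some headings in this report (for example `7.3. Settings ...`) carry their data on the heading line itself, which this sketch keeps as the section title.

```python
import re

# A heading is a line that starts with a dotted number and ". ",
# e.g. "7.27.28. Frame Rate" or "7.27.28.1. Frame Rate: 1".
# Data rows such as 1.818,"39,450","39,390" do not match.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\. (.*)$")


def split_sections(text: str) -> dict[str, tuple[str, list[str]]]:
    """Map section number -> (rest of heading line, non-blank lines below it)."""
    sections: dict[str, tuple[str, list[str]]] = {}
    body: list[str] = []  # lines before the first heading are discarded
    for line in text.splitlines():
        m = HEADING.match(line)
        if m:
            body = []  # start a fresh body list for the new section
            sections[m.group(1)] = (m.group(2), body)
        elif line.strip():
            body.append(line)
    return sections


sample = """7.27.27.2. Frame Counts: 2

Timestamp,Transmitted,Received Seconds,Frames,
23.108,"1,491,235","803,900"
7.27.28. Frame Rate

Rates can vary due to round-off errors in calculations. Timestamp,Transmit rate,Receive rate Seconds,Frames/s,
1.818,"39,450","39,390"
23.108,0.00,0.00
7.27.28.1. Frame Rate: 1
"""

title, lines = split_sections(sample)["7.27.28"]
print(title)  # Frame Rate
print(lines[1:])
```

For section 7.27.28 the first body line is the `Rates can vary...` header, so the data rows start at index 1 and can be fed to `csv.reader` as in the question.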
Original page content provided by Stack Overflow; translation by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/72576634
