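I have the text data below, which I need to parse and split into columns according to the following conditions:
Any line that starts with ===== holds the enclosure name, which should go under the ENC_NAME column.
For any line containing BladeSystem, the number at the end of the line should go under the OA_VERSION column.
For any line containing 1 HP, the number at the end of the line should go under the VC_ACTIVE column.
For any line containing 2 HP, the number at the end of the line should go under the VC_STDN column.
Text data: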
========= enc1001 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1002 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1003 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1004 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1005 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1006 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1007 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1008 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.40
2 HP VC Flex-10/10D Module 4.40
========= enc1009 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2001 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2002 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2003 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2004 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2005 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2006 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2007 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2008 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2009 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2011 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2013 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc3020 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.41
2 HP VC Flex-10/10D Module 4.41
========= enc3021 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.41
2 HP VC Flex-10/10D Module 4.41
========= enc3022 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.41
2 HP VC Flex-10/10D Module 4.41
========= enc3026 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.45
2 HP VC Flex-10/10D Module 4.45
========= enc3027 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc3028 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc3029 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc3030 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc3031 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc4021 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.41
2 HP VC Flex-10/10D Module 4.41
========= enc4023 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.41
2 HP VC Flex-10/10D Module 4.41
========= enc4024 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.41
2 HP VC Flex-10/10D Module 4.41
========= enc4025 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.41
2 HP VC Flex-10/10D Module 4.41
========= enc4026 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc4027 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc4028 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc4029 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc4030 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc4031 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc4032 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc4033 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc4034 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc6002 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6011 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6012 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6013 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6014 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6015 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6016 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6017 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc7002 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
========= enc7003 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
========= enc7004 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
========= enc7009 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1010 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1011 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1012 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1013 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1014 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1015 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1016 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1017 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1018 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc1025 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.62
2 HP VC Flex-10/10D Module 4.62
========= enc1026 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2010 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2012 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2014 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2015 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2016 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2018 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2019 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2020 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2021 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2022 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc2023 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc3033 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc3034 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc3036 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc4020 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.41
2 HP VC Flex-10/10D Module 4.41
========= enc4022 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.41
2 HP VC Flex-10/10D Module 4.41
========= enc4035 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc7005 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc7006 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC FlexFabric 10Gb/24-Port Module 4.50
2 HP VC FlexFabric 10Gb/24-Port Module 4.50
========= enc7007 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.62
2 HP VC Flex-10/10D Module 4.62
========= enc7008 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.62
2 HP VC Flex-10/10D Module 4.62
========= enc8001 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc8017 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc8018 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc8019 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc8021 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc8022 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.62
2 HP VC Flex-10/10D Module 4.62
========= enc8023 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.62
2 HP VC Flex-10/10D Module 4.62
========= enc8024 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.62
2 HP VC Flex-10/10D Module 4.62
========= enc8025 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.62
2 HP VC Flex-10/10D Module 4.62
========= enc8026 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.62
2 HP VC Flex-10/10D Module 4.62
========= enc8027 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.62
2 HP VC Flex-10/10D Module 4.62
========= enc8028 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.62
2 HP VC Flex-10/10D Module 4.62
========= enc8033 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.40
2 HP VC Flex-10/10D Module 4.40
Expected output (for example):
ENC_NAME OA_VERSION VC_ACTIVE VC_STDN
enc4031 4.85 4.50 4.50
enc4032 4.85 4.50 4.50
enc4033 4.85 4.50 4.50
enc4034 4.85 4.50 4.50
enc6002 4.60 NaN NaN
enc6011 4.60 NaN NaN
enc6012 4.60 NaN NaN
enc6013 4.60 NaN NaN
Edit (what I tried):
df = pd.read_csv("enc_list_sorted", names=["col1"])
df = df.col1.str.split(' ', expand = True)
df = df.drop(df.columns[[0, 2, 3, 4, 5, 6, 7, 8, 11]], axis=1)
df = df.rename(columns={ 1: 'ENC_NAME', 9: 'VC_VERSION', 10: 'OA_VERSION'})
print(df)
ENC_NAME VC_VERSION OA_VERSION
0 enc1001 None None
1 KVM 4.85
2 4.50 None
3 4.50 None
4 enc1002 None None
5 KVM 4.85
6 4.50 None
7 4.50 None
8 enc1003 None None
9 KVM 4.85
10 4.50 None
11 4.50 None
12 enc1004 None None
13 KVM 4.85
14 4.50 None
15 4.50 None
Any help or ideas would be much appreciated.
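Posted on 2020-06-26 16:22:07
As suggested in the comments, opening and parsing this file with pandas directly is not ideal.
Assuming your data is saved in a text file named file.txt: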
import pandas as pd

with open("file.txt") as file:
    lines = [l.rstrip("\n") for l in file]

row_temp = [None] * 4
row = None
out = []
for line in lines:
    if line.startswith("="):
        # A header line starts a new block: store the previous row first.
        if row is not None:
            out.append(row)
        row = row_temp.copy()
        row[0] = line.replace("=", "").strip()
    if 'BladeSystem' in line:
        row[1] = line.split()[-1]
    if '1 HP' in line:
        row[2] = line.split()[-1]
    if '2 HP' in line:
        row[3] = line.split()[-1]
# The last block has no following header, so append it after the loop.
if row is not None:
    out.append(row)

col_names = ["ENC_NAME", "OA_VERSION", "VC_ACTIVE", "VC_STDN"]
df = pd.DataFrame(out,
                  columns=col_names)
This returns the output you are looking for.
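A minimal sketch of the same line-by-line idea wrapped in a function, for cases where the text is already in memory. The name `parse_enclosures` and the inline `sample` are illustrative assumptions, not part of the answer above. Enclosures without VC lines keep `None` in those fields, and the final block is stored after the loop because no header follows it.

```python
def parse_enclosures(text):
    """Return one dict per enclosure block (hypothetical helper)."""
    rows = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("="):
            # A header line starts a new block; store the previous one first.
            if current is not None:
                rows.append(current)
            current = {"ENC_NAME": line.strip("= "), "OA_VERSION": None,
                       "VC_ACTIVE": None, "VC_STDN": None}
        elif current is None or not line:
            continue  # skip blank lines and anything before the first header
        elif "BladeSystem" in line:
            current["OA_VERSION"] = line.split()[-1]
        elif line.startswith("1 HP"):
            current["VC_ACTIVE"] = line.split()[-1]
        elif line.startswith("2 HP"):
            current["VC_STDN"] = line.split()[-1]
    if current is not None:
        rows.append(current)  # the last block has no following header
    return rows


sample = """\
========= enc1001 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
1 HP VC Flex-10/10D Module 4.50
2 HP VC Flex-10/10D Module 4.50
========= enc6002 =========
1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
"""

rows = parse_enclosures(sample)
```

Passing `rows` to `pd.DataFrame` should give the four columns in the order of the dict keys.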
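Posted on 2020-06-28 05:15:19
In my opinion, it is better to use a self-written parser instead. What you have can be seen as a form of so-called DSL, a domain-specific language. The grammar used here is fairly forgiving: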
import re
import pandas as pd
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor


class ENCVisitor(NodeVisitor):
    grammar = Grammar(r"""
        content     = (ws / block)*
        block       = header oa_line vc_active? vc_stdn?
        header      = delim ws word ws delim nl
        oa_line     = ~"^(?=.*BladeSystem).+"m nl?
        vc_active   = ~"^(?=.*1 HP).+"m nl?
        vc_stdn     = ~"^(?=.*2 HP).+"m nl?
        word        = ~"\w+"
        delim       = ~"=+"
        ws          = ~"\s+"
        nl          = ~"[\n\r]+"
    """)

    # The version is the number at the very end of a line.
    version_pattern = re.compile(r"\d+\.\d+$")

    def get_version(self, key, line):
        match = self.version_pattern.search(line)
        value = match.group(0) if match else None
        return {key: value}

    def generic_visit(self, node, visited_children):
        return visited_children or node

    def visit_header(self, node, visited_children):
        header = visited_children[2]
        return {"ENC_NAME": header.text}

    def visit_oa_line(self, node, visited_children):
        line, _ = visited_children
        return self.get_version("OA_VERSION", line.text)

    def visit_vc_active(self, node, visited_children):
        line, _ = visited_children
        return self.get_version("VC_ACTIVE", line.text)

    def visit_vc_stdn(self, node, visited_children):
        line, _ = visited_children
        return self.get_version("VC_STDN", line.text)

    def visit_block(self, node, visited_children):
        # Merge the per-line dicts of one block into a single dict.
        dct = {}
        for child in visited_children:
            if isinstance(child, dict):
                dct.update(child)
            elif isinstance(child, list):
                dct.update(child[0])
        return dct

    def visit_content(self, node, visited_children):
        return [child[0] for child in visited_children if isinstance(child[0], dict)]


# `data` must hold the raw text; reading it from file.txt is an assumption here.
with open("file.txt") as f:
    data = f.read()

enc = ENCVisitor()
result = enc.parse(data)
df = pd.DataFrame(result)
print(df)
For your data, this results in
ENC_NAME OA_VERSION VC_ACTIVE VC_STDN
0 enc1001 4.85 4.50 4.50
1 enc1002 4.85 4.50 4.50
2 enc1003 4.85 4.50 4.50
3 enc1004 4.85 4.50 4.50
4 enc1005 4.85 4.50 4.50
.. ... ... ... ...
94 enc8025 4.85 4.62 4.62
95 enc8026 4.85 4.62 4.62
96 enc8027 4.85 4.62 4.62
97 enc8028 4.85 4.62 4.62
98 enc8033 4.85 4.40 4.40
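[99 rows x 4 columns]
Explanation: your input can be seen as a mini-language of its own, a so-called domain-specific language. Each block of information in the file consists of a header line, an OA_VERSION line, and two lines that may or may not be present (VC_ACTIVE and VC_STDN). The header line always starts and ends with ===.
All these blocks together form a grammar: the content of the file/string is whitespace or any number of blocks. Internally, an abstract syntax tree (AST) is built, and to retrieve the information we need to "visit" each node. In the parser library I chose (the excellent parsimonious), this is done with a NodeVisitor class, and each node of the AST is visited through a correspondingly named method. This means that if we call a part "header", the method has to be named "visit_header".
The result is assembled in "visit_block" as a dictionary of all the information retrieved for that block. Finally, everything is fed into pandas.
Of course, this can only be a brief introduction; if you want to learn more about parsimonious, its documentation is worth a look.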
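The block structure described above (a header line, an OA line, then two optional VC lines) can also be sketched as a single regular expression using only the standard library. This is not the parsimonious grammar from the answer, just the same shape written differently, under the assumption that every data line ends in a `major.minor` version number and that blocks contain no blank lines. `BLOCK` and `parse_blocks` are illustrative names.

```python
import re

# One block = header, OA line, optional "1 HP" line, optional "2 HP" line.
BLOCK = re.compile(
    r"""
    ^=+\s*(?P<ENC_NAME>\w+)\s*=+[ \t]*\n                        # header line
    [^\n]*BladeSystem[^\n]*?(?P<OA_VERSION>\d+\.\d+)[ \t]*$\n?  # OA line
    (?:1\ HP[^\n]*?(?P<VC_ACTIVE>\d+\.\d+)[ \t]*$\n?)?          # optional line
    (?:2\ HP[^\n]*?(?P<VC_STDN>\d+\.\d+)[ \t]*$\n?)?            # optional line
    """,
    re.MULTILINE | re.VERBOSE,
)


def parse_blocks(text):
    """Return one dict per matched block; absent VC lines give None."""
    return [m.groupdict() for m in BLOCK.finditer(text)]


sample = (
    "========= enc4034 =========\n"
    "1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85\n"
    "1 HP VC Flex-10/10D Module 4.50\n"
    "2 HP VC Flex-10/10D Module 4.50\n"
    "========= enc6002 =========\n"
    "1 BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60\n"
)
blocks = parse_blocks(sample)
```

A regular expression like this is harder to extend than a grammar with named rules, which is one reason to prefer the visitor approach once the format grows.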
Posted on 2020-06-26 16:29:17
You can try this:
import pandas as pd
import re
import numpy as np

with open(r'test1.txt', 'r') as file:
    txto = file.read()

data = []
# Group 1 captures each header line (a line starting with '=').
pattern1 = re.compile(r'(^=.+)\s.+$\n?', re.MULTILINE)
lstlines = txto.split('\n')
headers = re.findall(pattern1, txto)
for ele1, ele2 in zip(headers, headers[1:]):
    # All lines from this header up to (not including) the next header.
    row = lstlines[lstlines.index(ele1):lstlines.index(ele2)]
    OA_VERSION = [i for i in row if 'BladeSystem' in i]
    OA_VERSION = OA_VERSION[0].split()[-1] if len(OA_VERSION) > 0 else np.nan
    VC_ACTIVE = [i for i in row if '1 HP' in i]
    VC_ACTIVE = VC_ACTIVE[0].split()[-1] if len(VC_ACTIVE) > 0 else np.nan
    VC_STDN = [i for i in row if '2 HP' in i]
    VC_STDN = VC_STDN[0].split()[-1] if len(VC_STDN) > 0 else np.nan
    data.append([ele1.replace('=', '').strip(), OA_VERSION, VC_ACTIVE, VC_STDN])

# last row
row = lstlines[lstlines.index(headers[-1]):]
OA_VERSION = [i for i in row if 'BladeSystem' in i]
OA_VERSION = OA_VERSION[0].split()[-1] if len(OA_VERSION) > 0 else np.nan
VC_ACTIVE = [i for i in row if '1 HP' in i]
VC_ACTIVE = VC_ACTIVE[0].split()[-1] if len(VC_ACTIVE) > 0 else np.nan
VC_STDN = [i for i in row if '2 HP' in i]
VC_STDN = VC_STDN[0].split()[-1] if len(VC_STDN) > 0 else np.nan
data.append([headers[-1].replace('=', '').strip(), OA_VERSION, VC_ACTIVE, VC_STDN])

# Create dataframe
df = pd.DataFrame(data, columns=['ENC_NAME', 'OA_VERSION', 'VC_ACTIVE', 'VC_STDN'])
print(df)
Output:
df
ENC_NAME OA_VERSION VC_ACTIVE VC_STDN
0 enc1001 4.85 4.50 4.50
1 enc1002 4.85 4.50 4.50
2 enc1003 4.85 4.50 4.50
3 enc1004 4.85 4.50 4.50
4 enc1005 4.85 4.50 4.50
.. ... ... ... ...
94 enc8025 4.85 4.62 4.62
95 enc8026 4.85 4.62 4.62
96 enc8027 4.85 4.62 4.62
97 enc8028 4.85 4.62 4.62
98 enc8033 4.85 4.40 4.40
[99 rows x 4 columns]
https://stackoverflow.com/questions/62597638