Parse text data and align it into columns based on conditions

Stack Overflow user
Asked on 2020-06-26 15:07:36
Answers: 3 | Views: 150 | Followers: 0 | Votes: 0

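I have the text data below, which I need to parse and split into columns according to the following conditions.

Any line that starts with = holds the enclosure name, which belongs in the ENC_NAME column.

For any line that contains BladeSystem, the number at the end of the line should go in the OA_VERSION column.

For any line that contains 1 HP, the number at the end of the line should go in the VC_ACTIVE column.

For any line that contains 2 HP, the number at the end of the line should go in the VC_STDN column.

Text data: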

========= enc1001 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1003 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1004 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1005 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1006 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1007 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1008 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.40
  2 HP VC Flex-10/10D Module   4.40
========= enc1009 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2001 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2003 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2004 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2005 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2006 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2007 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2008 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2009 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2011 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2013 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3020 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc3021 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc3022 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc3026 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.45
  2 HP VC Flex-10/10D Module   4.45
========= enc3027 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3028 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3029 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3030 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3031 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4021 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4023 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4024 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4025 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4026 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4027 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4028 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4029 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4030 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4031 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4032 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4033 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4034 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc6002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6011 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6012 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6013 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6014 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6015 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6016 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc6017 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
========= enc7002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
========= enc7003 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
========= enc7004 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
========= enc7009 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1010 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1011 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1012 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1013 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1014 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1015 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1016 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1017 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1018 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc1025 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc1026 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2010 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2012 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2014 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2015 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2016 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2018 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2019 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2020 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2021 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2022 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc2023 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3033 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3034 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc3036 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc4020 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4022 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.41
  2 HP VC Flex-10/10D Module   4.41
========= enc4035 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc7005 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc7006 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC FlexFabric 10Gb/24-Port Module  4.50
  2 HP VC FlexFabric 10Gb/24-Port Module  4.50
========= enc7007 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc7008 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8001 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8017 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8018 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8019 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8021 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc8022 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8023 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8024 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8025 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8026 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8027 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8028 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.62
  2 HP VC Flex-10/10D Module   4.62
========= enc8033 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.40
  2 HP VC Flex-10/10D Module   4.40

Expected output (for example):

ENC_NAME    OA_VERSION      VC_ACTIVE   VC_STDN
enc4031     4.85            4.50        4.50
enc4032     4.85            4.50        4.50
enc4033     4.85            4.50        4.50
enc4034     4.85            4.50        4.50
enc6002     4.60            NaN         NaN
enc6011     4.60            NaN         NaN
enc6012     4.60            NaN         NaN
enc6013     4.60            NaN         NaN

Edit (what I tried):

df  = pd.read_csv("enc_list_sorted", names=["col1"])
df = df.col1.str.split(' ', expand = True)
df = df.drop(df.columns[[0, 2, 3, 4, 5, 6, 7, 8, 11]], axis=1)


df = df.rename(columns={ 1: 'ENC_NAME', 9: 'VC_VERSION', 10: 'OA_VERSION'})

print(df)

        ENC_NAME VC_VERSION OA_VERSION
    0    enc1001       None       None
    1                   KVM       4.85
    2                  4.50       None
    3                  4.50       None
    4    enc1002       None       None
    5                   KVM       4.85
    6                  4.50       None
    7                  4.50       None
    8    enc1003       None       None
    9                   KVM       4.85
    10                 4.50       None
    11                 4.50       None
    12   enc1004       None       None
    13                  KVM       4.85
    14                 4.50       None
    15                 4.50       None

Any help or ideas would be appreciated.

3 Answers

Stack Overflow user

Accepted answer

Answered on 2020-06-26 16:22:07

As suggested in the comments, opening and parsing the file with pandas directly is not ideal.

Assuming your data is saved in a text file named file.txt:

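import pandas as pd

# Read the file, dropping the trailing newline of each line.
with open("file.txt") as file:
    lines = [l.rstrip("\n") for l in file]

row_temp = [None] * 4  # [ENC_NAME, OA_VERSION, VC_ACTIVE, VC_STDN]
row = None
out = []
for line in lines:
    if line.startswith("="):
        # A new header closes the previous block, if there is one.
        if row is not None:
            out.append(row)
        row = row_temp.copy()
        row[0] = line.replace("=", "").strip()
    elif row is None:
        continue  # ignore anything before the first header
    elif 'BladeSystem' in line:
        row[1] = line.split()[-1]
    elif '1 HP' in line:
        row[2] = line.split()[-1]
    elif '2 HP' in line:
        row[3] = line.split()[-1]

# The last block has no header after it, so append it explicitly.
if row is not None:
    out.append(row)

col_names = ["ENC_NAME", "OA_VERSION", "VC_ACTIVE", "VC_STDN"]
df = pd.DataFrame(out,
                  columns=col_names)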

This returns the output you are looking for.
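The same line-by-line idea can be tried without a file on disk. The following is only a minimal sketch under a few assumptions: `SAMPLE` is a two-block excerpt of the question's data, `parse_enclosures` and `COLUMNS` are hypothetical names, and missing values are filled with `math.nan` so that they show up as NaN as in the expected output.

```python
import math

# Two blocks copied from the question's data; the second has no VC lines.
SAMPLE = """\
========= enc4034 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc6002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
"""

COLUMNS = ["ENC_NAME", "OA_VERSION", "VC_ACTIVE", "VC_STDN"]


def parse_enclosures(lines):
    """Yield one dict per enclosure block; missing values stay NaN."""
    row = None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("="):
            # A header line starts a new block and closes the previous one.
            if row is not None:
                yield row
            row = dict.fromkeys(COLUMNS, math.nan)
            row["ENC_NAME"] = stripped.strip("= ")
        elif row is None:
            continue  # ignore anything before the first header
        elif "BladeSystem" in stripped:
            row["OA_VERSION"] = stripped.split()[-1]
        elif stripped.startswith("1 HP"):
            row["VC_ACTIVE"] = stripped.split()[-1]
        elif stripped.startswith("2 HP"):
            row["VC_STDN"] = stripped.split()[-1]
    # The final block is not followed by a header, so emit it here.
    if row is not None:
        yield row


rows = list(parse_enclosures(SAMPLE.splitlines()))
for r in rows:
    print(r)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows, columns=COLUMNS)` if a DataFrame is needed.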

Votes: 2

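Stack Overflow user

Answered on 2020-06-28 05:15:19

In my opinion, you should use a self-written parser instead. What you have can be seen as a form of a so-called DSL, a domain-specific language. The grammar used here is quite forgiving:
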
import re, pandas as pd
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

class ENCVisitor(NodeVisitor):
    grammar = Grammar(r"""
            content     = (ws / block)*

            block       = header oa_line vc_active? vc_stdn?
            header      = delim ws word ws delim nl

            oa_line     = ~"^(?=.*BladeSystem).+"m nl?
            vc_active   = ~"^(?=.*1 HP).+"m nl?
            vc_stdn     = ~"^(?=.*2 HP).+"m nl?

            word        = ~"\w+"
            delim       = ~"=+"
            ws          = ~"\s+"
            nl          = ~"[\n\r]+"
    """)

    version_pattern = re.compile(r"\d+\.\d+$")

    def get_version(self, key, line):
        match = self.version_pattern.search(line)
        value = match.group(0) if match else None
        return {key: value}

    def generic_visit(self, node, visited_children):
        return visited_children or node

    def visit_header(self, node, visited_children):
        header = visited_children[2]
        return {"ENC_NAME": header.text}

    def visit_oa_line(self, node, visited_children):
        line, _ = visited_children
        return self.get_version("OA_VERSION", line.text)

    def visit_vc_active(self, node, visited_children):
        line, _ = visited_children
        return self.get_version("VC_ACTIVE", line.text)

    def visit_vc_stdn(self, node, visited_children):
        line, _ = visited_children
        return self.get_version("VC_STDN", line.text)

    def visit_block(self, node, visited_children):
        dct = {}
        for child in visited_children:
            if isinstance(child, dict):
                dct.update(child)
            elif isinstance(child, list):
                dct.update(child[0])
        return dct

    def visit_content(self, node, visited_children):
        return [child[0] for child in visited_children if isinstance(child[0], dict)]

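# Assumption: the input text is stored in file.txt, as in the accepted answer.
with open("file.txt") as file:
    data = file.read()

enc = ENCVisitor()
result = enc.parse(data)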

df = pd.DataFrame(result)
print(df)

For your data, this yields:

   ENC_NAME OA_VERSION VC_ACTIVE VC_STDN
0   enc1001       4.85      4.50    4.50
1   enc1002       4.85      4.50    4.50
2   enc1003       4.85      4.50    4.50
3   enc1004       4.85      4.50    4.50
4   enc1005       4.85      4.50    4.50
..      ...        ...       ...     ...
94  enc8025       4.85      4.62    4.62
95  enc8026       4.85      4.62    4.62
96  enc8027       4.85      4.62    4.62
97  enc8028       4.85      4.62    4.62
98  enc8033       4.85      4.40    4.40

[99 rows x 4 columns]

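Explanation: Your input can be seen as a mini-language of its own, a so-called domain-specific language. Each block of information in the file consists of a header line, an OA_VERSION line, and two lines that may or may not be present (VC_ACTIVE and VC_STDN). The header line always starts and ends with ===.

All these blocks together form a grammar: the content of the file/string is either whitespace or any number of blocks. Internally, an abstract syntax tree (AST) is built, and to retrieve the information we need to "visit" each node. In the parser library I chose (the excellent parsimonious), this is done through a NodeVisitor class, and each node of the AST is visited through a correspondingly named method. That means that if we call a rule "header", the method has to be named "visit_header".

The result is collected in "visit_block", as a dictionary of all the information retrieved for that block. Finally, everything is fed into pandas.

Of course, this can only be a brief introduction; if you want to learn more, have a look at parsimonious itself.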
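For readers who want to experiment with the block structure described above without installing a parser library, here is a rough sketch that encodes the same shape (header, OA line, two optional VC lines) as one verbose regular expression from the standard library. This is a swapped-in technique, not the parsimonious grammar from the answer; `BLOCK`, `parse_blocks` and `SAMPLE` are hypothetical names, and the pattern assumes each version number sits at the end of its line.

```python
import re

# Two blocks copied from the question's data; the second has no VC lines.
SAMPLE = """\
========= enc4034 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc6002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
"""

# One block = header, OA line, then optionally the "1 HP" and "2 HP" lines.
BLOCK = re.compile(
    r"""
    ^=+\s*(?P<ENC_NAME>\w+)\s*=+[ \t]*\n
    [^\n]*BladeSystem[^\n]*?(?P<OA_VERSION>\d+\.\d+)[ \t]*(?:\n|\Z)
    (?:[ \t]*1\ HP[^\n]*?(?P<VC_ACTIVE>\d+\.\d+)[ \t]*(?:\n|\Z))?
    (?:[ \t]*2\ HP[^\n]*?(?P<VC_STDN>\d+\.\d+)[ \t]*(?:\n|\Z))?
    """,
    re.MULTILINE | re.VERBOSE,
)


def parse_blocks(text):
    """Return one dict per block; optional lines that are absent map to None."""
    return [match.groupdict() for match in BLOCK.finditer(text)]


blocks = parse_blocks(SAMPLE)
for block in blocks:
    print(block)
```

A single regular expression is harder to extend than a grammar with named rules, which is one reason to prefer a real parser once the format grows.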

Votes: 3

Stack Overflow user

Answered on 2020-06-26 16:29:17

You can try this:

import pandas as pd
import re
import numpy as np

with open(r'test1.txt','r') as file:
    txto=file.read()

data=[]
# Raw string avoids invalid-escape warnings; group 1 captures the header line.
pattern1 = re.compile(r'(^=.+)\s.+$\n?', re.MULTILINE)
lstlines=txto.split('\n')

for ele1, ele2 in zip(re.findall(pattern1,txto),re.findall(pattern1,txto)[1:]):
    row=lstlines[lstlines.index(ele1):lstlines.index(ele2)]

    OA_VERSION=[i for i in row if 'BladeSystem' in i]
    OA_VERSION=OA_VERSION[0].split()[-1] if len(OA_VERSION)>0 else np.nan
    
    VC_ACTIVE=[i for i in row if '1 HP' in i]
    VC_ACTIVE=VC_ACTIVE[0].split()[-1] if len(VC_ACTIVE)>0 else np.nan
    
    VC_STDN=[i for i in row if '2 HP' in i]
    VC_STDN=VC_STDN[0].split()[-1] if len(VC_STDN)>0 else np.nan
    
    data.append([ele1.replace('=','').strip(),OA_VERSION, VC_ACTIVE,VC_STDN])
    
#last row 
row=lstlines[lstlines.index(re.findall(pattern1,txto)[-1]):]
OA_VERSION=[i for i in row if 'BladeSystem' in i]
OA_VERSION=OA_VERSION[0].split()[-1] if len(OA_VERSION)>0 else np.nan
VC_ACTIVE=[i for i in row if '1 HP' in i]
VC_ACTIVE=VC_ACTIVE[0].split()[-1] if len(VC_ACTIVE)>0 else np.nan
VC_STDN=[i for i in row if '2 HP' in i]
VC_STDN=VC_STDN[0].split()[-1] if len(VC_STDN)>0 else np.nan
data.append([re.findall(pattern1,txto)[-1].replace('=','').strip(),OA_VERSION, VC_ACTIVE,VC_STDN]) 

#Create dataframe (no trailing space in the 'ENC_NAME' column name)
df=pd.DataFrame(data, columns=['ENC_NAME','OA_VERSION','VC_ACTIVE','VC_STDN'])
print(df)

Output:

df
   ENC_NAME  OA_VERSION VC_ACTIVE VC_STDN
0    enc1001       4.85      4.50    4.50
1    enc1002       4.85      4.50    4.50
2    enc1003       4.85      4.50    4.50
3    enc1004       4.85      4.50    4.50
4    enc1005       4.85      4.50    4.50
..       ...        ...       ...     ...
94   enc8025       4.85      4.62    4.62
95   enc8026       4.85      4.62    4.62
96   enc8027       4.85      4.62    4.62
97   enc8028       4.85      4.62    4.62
98   enc8033       4.85      4.40    4.40

[99 rows x 4 columns]
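
The slicing between consecutive headers, including the separate handling of the last block, can also be expressed with `re.split`. What follows is a sketch under assumptions rather than a drop-in replacement: `HEADER`, `VERSION`, `last_version`, `split_blocks` and `SAMPLE` are hypothetical names, and absent values are returned as `None` instead of `np.nan`.

```python
import re

# Two blocks copied from the question's data; the second has no VC lines.
SAMPLE = """\
========= enc4034 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.85
  1 HP VC Flex-10/10D Module   4.50
  2 HP VC Flex-10/10D Module   4.50
========= enc6002 =========
1   BladeSystem c7000 DDR2 Onboard Administrator with KVM 4.60
"""

HEADER = re.compile(r"^=+\s*(\w+)\s*=+[ \t]*$", re.MULTILINE)
VERSION = re.compile(r"(\d+\.\d+)\s*$")


def last_version(body, marker):
    """Return the trailing version of the first line containing marker."""
    for line in body.splitlines():
        if marker in line:
            match = VERSION.search(line)
            return match.group(1) if match else None
    return None


def split_blocks(text):
    # With one capturing group, re.split returns
    # [preamble, name1, body1, name2, body2, ...].
    parts = HEADER.split(text)
    names, bodies = parts[1::2], parts[2::2]
    return [
        [
            name,
            last_version(body, "BladeSystem"),
            last_version(body, "1 HP"),
            last_version(body, "2 HP"),
        ]
        for name, body in zip(names, bodies)
    ]


table = split_blocks(SAMPLE)
for record in table:
    print(record)
```

Because every body, including the last one, comes out of the same split, no extra code is needed after the loop.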
Votes: 2

Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/62597638
