首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >无需硬编码值即可将XML文件解析为CSV

无需硬编码值即可将XML文件解析为CSV
EN

Stack Overflow用户
提问于 2020-10-13 02:27:33
回答 1查看 350关注 0票数 0

我想知道是否有一种方法可以解析XML,并基本上获得所有标记(或尽可能多),并将它们放入列中,而无需进行硬编码。

例如,我的xml中的eventType标记。我希望它最初创建一个名为"eventType“的列,并将值放在该列的下面。它解析的每个"eventType“标记都会被放在同一列中。

下面是我试图让它看起来是这样的:

下面是XML示例:

代码语言:javascript
复制
<?xml version="1.0" encoding="UTF-8"?>

<faults version="1" xmlns="urn:nortel:namespaces:mcp:faults" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:nortel:namespaces:mcp:faults NortelFaultSchema.xsd ">
    <family longName="1OffMsgr" shortName="OOM"/>
    <family longName="ACTAGENT" shortName="ACAT">
        <logs>
           <log>
                <eventType>RES</eventType>
                <number>1</number>
                <severity>INFO</severity>
                <descTemplate>
                     <msg>Accounting is enabled upon this NE.</msg>
               </descTemplate>
               <note>This log is generated when setting a Session Manager's AM from &lt;none&gt; to a valid AM.</note>
               <om>On all instances of this Session Manager, the &lt;NE_Inst&gt;:&lt;AM&gt;:STD:acct OM row in the  StdRecordStream group will appear and start counting the recording units sent to the configured AM.
                   On the configured AM, the &lt;NE_inst&gt;:acct OM rows in RECSTRMCOLL group will appear and start counting the recording units received from this Session Manager's instances.
               </om>
            </log>
           <log>
                <eventType>RES</eventType>
                <number>2</number>
                <severity>ALERT</severity>
                <descTemplate>
                     <msg>Accounting is disabled upon this NE.</msg>
               </descTemplate>
               <note>This log is generated when setting a Session Manager's AM from a valid AM to &lt;none&gt;.</note>
               <action>If you do not intend for the Session Manager to produce accounting records, then no action is required.  If you do intend for the Session Manager to produce accounting records, then you should set the Session Manager's AM to a valid AM.</action>
               <om>On all instances of this Session Manager, the &lt;NE_Inst&gt;:&lt;AM&gt;:STD:acct OM row in the StdRecordStream group that matched the previous datafilled AM will disappear.
                   On the previously configured AM, the  &lt;NE_inst&gt;:acct OM rows in RECSTRMCOLL group will disappear.
               </om>
            </log>
        </logs>
    </family>
    <family longName="ACODE" shortName="AC">
        <alarms>
            <alarm>
                <eventType>ADMIN</eventType>
                <number>1</number>
                <probableCause>INFORMATION_MODIFICATION_DETECTED</probableCause>
                <descTemplate>
                    <msg>Configured data for audiocode server updated: $1</msg>
                     <param>
                         <num>1</num>
                         <description>AudioCode configuration data got updated</description>
                         <exampleValue>acgwy1</exampleValue>
                     </param>
               </descTemplate>
               <manualClearable></manualClearable>
               <correctiveAction>None. Acknowledge/Clear alarm and deploy the audiocode server if appropriate.</correctiveAction>
               <alarmName>Audiocode Server Updated</alarmName>
               <severities>
                     <severity>MINOR</severity>
               </severities>               
            </alarm>
            <alarm>
                <eventType>ADMIN</eventType>
                <number>2</number>
                <probableCause>CONFIG_OR_CUSTOMIZATION_ERROR</probableCause>
                <descTemplate>
                    <msg>Deployment for audiocode server failed: $1. Reason: $2.</msg>
                     <param>
                         <num>1</num>
                         <description>AudioCode Name</description>
                         <exampleValue>audcod</exampleValue>
                     </param>
                     <param>
                         <num>2</num>
                         <description>AudioCode Deployment failed reason</description>
                         <exampleValue>Failed to parse audiocode configuration data</exampleValue>
                     </param>
               </descTemplate>
               <manualClearable></manualClearable>
               <correctiveAction>Check the configuration of audiocode server. Acknowledge/Clear alarm and deploy the audiocode server if appropriate.</correctiveAction>
               <alarmName>Audiocode Server Deploy Failed</alarmName>
               <severities>
                     <severity>MINOR</severity> 
                     <severity>MAJOR</severity>
               </severities>               
            </alarm>
            <alarm>
                <eventType>COMM</eventType>
                <number>2</number>
                <probableCause>LOSS_OF_FRAME</probableCause>
                <descTemplate>
                    <msg>Far end LOF (a.k.a., Yellow Alarm). Trunk (DS1 Number): $1.</msg>
                     <param>
                         <num>1</num>
                         <description>Trunk Number of Trunk with configuration problem</description>
                         <exampleValue>2</exampleValue>
                     </param>
               </descTemplate>
               <clearCondition>Far end is correctly configured for proper framing.</clearCondition>
               <correctiveAction>Check that the far end is configured for the proper framing.</correctiveAction>
               <alarmName>Far end LOF</alarmName>
               <severities>
                     <severity>CRITICAL</severity>
               </severities>
               <note>This alarm indicates the Trunk Framing settings on the connected PSTN switch do not match those provisioned on the Audiocodes Mediant 2k.</note>
            </alarm>
            <alarm>
                <eventType>COMM</eventType>
                <number>3</number>
                <probableCause>LOSS_OF_FRAME</probableCause>
                <descTemplate>
                    <msg>Near end sending LOF Indication. Trunk (DS1 Number): $1.</msg>
                     <param>
                         <num>1</num>
                         <description>Trunk Number of Trunk with configuration problem</description>
                         <exampleValue>2</exampleValue>
                     </param>
               </descTemplate>
               <clearCondition>Gateway is correctly configured for proper framing.</clearCondition>
               <correctiveAction>Check that the Audiocodes gateway is configured for the proper framing.</correctiveAction>
               <alarmName>Near end sending LOF Indication</alarmName>
               <severities>
                     <severity>CRITICAL</severity>
               </severities>               
            </alarm>
        </alarms>
    </family>
</faults>

这是代码,你可以看到我的标记名是硬编码的:

代码语言:javascript
复制
from xml.etree import ElementTree
import csv
import lxml.etree
import pandas as pd
from copy import copy
from pprint import pprint


tree = ElementTree.parse('FaultFamilies.xml')


sitescope_data = open('Out.csv', 'w', newline='', encoding='utf-8')
csvwriter = csv.writer(sitescope_data)

# Create all needed columns here in order and writes them to excel file
col_names = ['longName', 'shortName', 'eventType', 'ProbableCause', 'Severity', 'alarmName', 'clearCondition',
             'correctiveAction', 'note', 'action', 'om']
csvwriter.writerow(col_names)



def recurse(root, props):

    # Finds every single tag in the xml file
    for child in root:
        #print(child.text)
        if child.tag == '{urn:nortel:namespaces:mcp:faults}family':
            # copy of the dictionary
            p2 = copy(props)

            # adds to the dictionary the longNm name and shortName
            p2['longName'] = child.attrib.get('longName', '')
            p2['shortName'] = child.attrib.get('shortName', '')
            recurse(child, p2)
        else:
            recurse(child, props)

    # FIND ALL NEEDED ALARMS INFORMATION
    for event in root.findall('{urn:nortel:namespaces:mcp:faults}alarm'):

        event_data = [props.get('longName',''), props.get('shortName', '')]

        # Find eventType and appends it
        event_id = event.find('{urn:nortel:namespaces:mcp:faults}eventType')
        if event_id != None:
            event_id = event_id.text
        # appends to the to the list with comma
        event_data.append(event_id)

        # Find probableCause and appends it
        probableCause = event.find('{urn:nortel:namespaces:mcp:faults}probableCause')
        if probableCause != None:
            probableCause = probableCause.text
        event_data.append(probableCause)

        # Find severities and appends it
        severities = event.find('{urn:nortel:namespaces:mcp:faults}severities')
        if severities:
            severity_data = ','.join(
                [sv.text for sv in severities.findall('{urn:nortel:namespaces:mcp:faults}severity')])
            event_data.append(severity_data)
        else:
            event_data.append("")

        # Find alarmName and appends it
        alarmName = event.find('{urn:nortel:namespaces:mcp:faults}alarmName')
        if alarmName != None:
            alarmName = alarmName.text
        event_data.append(alarmName)

        clearCondition = event.find('{urn:nortel:namespaces:mcp:faults}clearCondition')
        if clearCondition != None:
            clearCondition = clearCondition.text
        event_data.append(clearCondition)

        correctiveAction = event.find('{urn:nortel:namespaces:mcp:faults}correctiveAction')
        if correctiveAction != None:
            correctiveAction = correctiveAction.text
        event_data.append(correctiveAction)

        note = event.find('{urn:nortel:namespaces:mcp:faults}note')
        if note != None:
            note = note.text
        event_data.append(note)

        action = event.find('{urn:nortel:namespaces:mcp:faults}action')
        if action != None:
            action = action.text
        event_data.append(action)

        csvwriter.writerow(event_data)

    # FIND ALL LOGS INFORMATION
    for event in root.findall('{urn:nortel:namespaces:mcp:faults}log'):
        event_data = [props.get('longName', ''), props.get('shortName', '')]

        event_id = event.find('{urn:nortel:namespaces:mcp:faults}eventType')
        if event_id != None:
            event_id = event_id.text
        event_data.append(event_id)

        probableCause = event.find('{urn:nortel:namespaces:mcp:faults}probableCause')
        if probableCause != None:
            probableCause = probableCause.text
        event_data.append(probableCause)

        severities = event.find('{urn:nortel:namespaces:mcp:faults}severity')
        if severities != None:
            severities = severities.text
        event_data.append(severities)

        alarmName = event.find('{urn:nortel:namespaces:mcp:faults}alarmName')
        if alarmName != None:
            alarmName = alarmName.text
        event_data.append(alarmName)

        # Find alarmName and appends it
        clearCondition = event.find('{urn:nortel:namespaces:mcp:faults}clearCondition')
        if clearCondition != None:
            clearCondition = clearCondition.text
        event_data.append(clearCondition)

        correctiveAction = event.find('{urn:nortel:namespaces:mcp:faults}correctiveAction')
        if correctiveAction != None:
            correctiveAction = correctiveAction.text
        event_data.append(correctiveAction)

        note = event.find('{urn:nortel:namespaces:mcp:faults}note')
        if note != None:
            note = note.text
        event_data.append(note)

        action = event.find('{urn:nortel:namespaces:mcp:faults}action')
        if action != None:
            action = action.text
        event_data.append(action)
        csvwriter.writerow(event_data)


root = tree.getroot()
recurse(root, {})  # root + empty dictionary
print("File successfuly converted to CSV")
sitescope_data.close()

运行@tdelaney解决方案时:

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2020-10-13 04:51:16

您可以构建一个列表列表来表示表中的行。每当到了新行的时候,就构建一个新列表,所有已知列都默认为"",并将其附加到外部列表的底部。当需要插入新列时,只需遍历现有的内部列表并附加一个默认的""单元格。保留一个已知列名的映射,以便在行中建立索引。现在,当您遍历事件时,使用标记名查找行索引,并将其值添加到表中的最新行。

看起来您需要"log“和"alarm”标记,但是我编写了元素选择器来接受任何有"eventType“子元素的元素。由于"longName“和"shortName”对于给定条件下的所有事件都是通用的,因此有一个外部循环来获取它们并应用于表的每个新行。我切换到了xpath,这样我就可以更简洁地设置名称空间和编写选择器。这是我个人的偏好,但我认为这会使xpath更具可读性。

代码语言:javascript
复制
import csv
import lxml.etree
from lxml.etree import QName
import operator

class ExpandingTable:
    """A 2 dimensional table where columns are exapanded as new column
    types are discovered"""

    def __init__(self):
        """Create table that can expand rows and columns"""
        self.name_to_col = {}
        self.table = []
    
    def add_column(self, name):
        """Add column named `name` unless already included"""
        if name not in self.name_to_col:
            self.name_to_col[name] = len(self.name_to_col)
            for row in self.table:
                row.append('')
    
    def add_cell(self, name, value):
        """Add value to named column in the current row"""
        if value:
            self.add_column(name)
            self.table[-1][self.name_to_col[name]] = value.strip().replace("\r\n", " ")
            
    def new_row(self):
        """Create a new row and make it current"""
        self.table.append([''] * len(self.name_to_col))

    def header(self):
        """Gather discovered column names into a header list"""
        idx_1 = operator.itemgetter(1)
        return [name for name, _ in sorted(self.name_to_col.items(), key=idx_1)]

    def prepend_header(self):
        """Gather discovered column names into a header and
        prepend it to the list"""
        self.table.insert(0, self.header())

def events_to_table(elem):
    """ Builds table from <family> child elements and their contained alarms and
    logs."""
    ns = {"f":"urn:nortel:namespaces:mcp:faults"}
    table = ExpandingTable()
    for family in elem.xpath("f:family", namespaces=ns):
        longName = family.get("longName")
        shortName = family.get("shortName")
        for event in family.xpath("*/*[f:eventType]", namespaces=ns):
            table.new_row()
            table.add_cell("longName", longName)
            table.add_cell("shortName", shortName)
            for cell in event:
                tag = QName(cell.tag).localname
                if tag == "severities":
                    tag = "severity"
                    text = ",".join(severity.text for severity in cell.xpath("*"))
                    print("severities", repr(text))
                else:
                    text = cell.text
                table.add_cell(tag, text)
    table.prepend_header()
    return table.table
    
def main(filename):
    doc = lxml.etree.parse(filename)
    table = events_to_table(doc.getroot())
    with open('test.csv', 'w', newline='', encoding='utf-8') as fileobj:
        csv.writer(fileobj).writerows(table)

main('test.xml')
票数 2
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/64323393

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档