首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >用大熊猫数据替换XML中的元素

用大熊猫数据替换XML中的元素
EN

Stack Overflow用户
提问于 2020-04-24 15:28:21
回答 2查看 234关注 0票数 0

我使用lxml来读取我的xml文件:

代码语言:javascript
复制
tree = etree.parse(r'C:\Users\xxx\Desktop\misc work\xmledit\SalesTransactionCustom.xml')

并获得以下xml文件:

代码语言:javascript
复制
<?xml version="1.0" encoding="UTF-8"?>
<ProcessSalesTransactionCustom xmlns="http://schema.xxxx.com/xxxxx/2" releaseID="9.2">
  <ApplicationArea>
    <Sender>
      <LogicalID>xxxxxx.file.syncxxxxx5salesinvoice</LogicalID>
      <ComponentID>External</ComponentID>
      <ConfirmationCode>OnError</ConfirmationCode>
    </Sender>
    <CreationDateTime>2020-04-16T14:50:26.976Z</CreationDateTime>
    <BODID>xxxx-nid:xxxxx:1001::Default_1001#320000:?SalesTransactionCustom&amp;verb=Process</BODID>
  </ApplicationArea>
  <DataArea>
    <Process>
      <TenantID>xxx</TenantID>
      <AccountingEntityID>4710</AccountingEntityID>
      <LocationID>S_4710</LocationID>
      <ActionCriteria>
        <ActionExpression actionCode="Add"/>
      </ActionCriteria>
    </Process>
    <SalesTransactionCustom>
      <FinancialBatch>
        <TransactionDate>2019-09-27T00:00:00</TransactionDate>
        <BatchReference>KUKS_20190928052427</BatchReference>
      </FinancialBatch>
      <TransactionHeader>
        <TransactionType>HEI</TransactionType>
        <SalesInvoice>
          <Invoice>19001160</Invoice>
          <BusinessPartner>417B00</BusinessPartner>
          <DocumentDate>2019-09-27T00:00:00</DocumentDate>
          <DueDate>2019-11-20T00:00:00</DueDate>
          <Amount>152248.80</Amount>
          <Currency>EUR</Currency>
          <TaxCountry>DK</TaxCountry>
          <TaxCode>BESIT</TaxCode>
          <NonFinalizedTransaction>
            <TransactionReference>417B00 PC210LCI-11</TransactionReference>
            <LedgerAccount>50000400</LedgerAccount>
            <Dimension1>100</Dimension1>
            <Dimension2>KUK</Dimension2>
            <Dimension3/>
            <Dimension4/>
            <Dimension5/>
            <Dimension6/>
            <Dimension7/>
            <Dimension8/>
            <TaxAmount>0.00</TaxAmount>
            <DebitCreditFlag>credit</DebitCreditFlag>
            <Amount>152248.80</Amount>
          </NonFinalizedTransaction>
        </SalesInvoice>
      </TransactionHeader>
      <TransactionHeader>
        <TransactionType>HEI</TransactionType>
        <SalesInvoice>
          <Invoice>19001161</Invoice>
          <BusinessPartner>412600</BusinessPartner>
          <DocumentDate>2019-09-27T00:00:00</DocumentDate>
          <DueDate>2019-11-20T00:00:00</DueDate>
          <Amount>113848.17</Amount>
          <Currency>EUR</Currency>
          <TaxCountry>AT</TaxCountry>
          <TaxCode>GBSI</TaxCode>
          <NonFinalizedTransaction>
            <TransactionReference>412600 PC210NLC-11</TransactionReference>
            <LedgerAccount>50000400</LedgerAccount>
            <Dimension1>100</Dimension1>
            <Dimension2>KUK</Dimension2>
            <Dimension3/>
            <Dimension4/>
            <Dimension5/>
            <Dimension6/>
            <Dimension7/>
            <Dimension8/>
            <TaxAmount>0.00</TaxAmount>
            <DebitCreditFlag>credit</DebitCreditFlag>
            <Amount>113848.17</Amount>
          </NonFinalizedTransaction>
        </SalesInvoice>
      </TransactionHeader>
    </SalesTransactionCustom>
  </DataArea>
</ProcessSalesTransactionCustom>

我有一只熊猫数据,就像(这里的第一行是列名):

代码语言:javascript
复制
Tag             Old Value   New Value
BusinessPartner 417B00      BPE000104
BusinessPartner 412600      BPE000153
LedgerAccount   50000400    108092200

我想替换xml中元素的属性,并参考这个熊猫数据。我希望能够找到标记和旧值的组合,并用新值替换属性。我还需要能够以XML的形式将编辑后的文本写回磁盘。

我怎么能用lxml和熊猫来做这件事呢?

提前谢谢你

编辑:这是感谢@Partha Mandal的代码

代码语言:javascript
复制
import pandas as pd
from lxml import etree

df=pd.read_excel("Sample.xlsx")
df.columns=['Tag','Old','New']
df['Old'] = df['Old'].astype(str)
df['New'] = df['New'].astype(str)

parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(r'C:\Users\xxx\Desktop\misc work\xmledit\testxml2.xml',parser)
string = etree.tostring(tree)
string = bytes.decode(string)

tag = df.Tag; old = df.Old; new = df.New

for i in range(len(tag)):
   string = string.replace("<"+tag[i]+">"+old[i]+"</"+tag[i]+">","<"+tag[i]+">"+new[i]+"</"+tag[i]+">")

string=str.encode(string)

root = etree.fromstring(string)
my_tree = etree.ElementTree(root)
with open('testxml2.xml', 'wb') as f:
    f.write(etree.tostring(my_tree))
EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2020-04-24 15:57:48

为什么不将XML读入字符串并执行str.replace

代码语言:javascript
复制
tag = df.Tag; old = df.Old; new = df.New

for i in range(len(tag)):
   _str = _str.replace("<"+tag[i]+">"+old[i]+"</"+tag[i]+">","<"+tag[i]+">"+new[i]+"</"+tag[i]+">")
票数 1
EN

Stack Overflow用户

发布于 2020-04-24 21:06:15

因为您使用lxml,所以考虑XSLT,这是一种特殊的用途语言,用于将XML文件转换为不同的XML,并支持从顶层(如Python )传递参数。因此,在跨数据帧记录的循环中集成参数化:

唯一的挑战是将所有唯一的标记值硬编码到XSLT的第二个模板匹配中(管道正常后换行):

代码语言:javascript
复制
doc:BusinessPartner|doc:LedgerAccount

你可以用它

代码语言:javascript
复制
"|".join(['doc:'+ val for val in df['Tag'].unique()])

"|\n".join(['doc:'+ val for val in df['Tag'].unique()])

XSLT (另存为.xsl,一个特殊的.xml文件)

代码语言:javascript
复制
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                              xmlns:doc="http://schema.xxxx.com/xxxxx/2">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- PARAMETERS -->
  <xsl:param name="tag" />
  <xsl:param name="old_value" />
  <xsl:param name="new_value" />

  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- CONDITIONALL ASSIGN PARAMS --> 
  <xsl:template match="doc:BusinessPartner|doc:LedgerAccount">
    <xsl:choose>
        <xsl:when test = "text() = $old_value">
            <xsl:copy>
                <xsl:value-of select="$new_value"/>
            </xsl:copy>
        </xsl:when>
        <xsl:otherwise>
            <xsl:copy>
                <xsl:value-of select="text()"/>
            </xsl:copy>
        </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

Python

代码语言:javascript
复制
import pandas as pd
import lxml.etree as et

df = pd.read_csv(...)

# LOAD XML AND XSL SCRIPT
xml = et.parse('Input.xml')
xsl = et.parse('Script.xsl')
transform = et.XSLT(xsl)

# PASS PARAMETER TO XSLT
df_list = df.to_dict('records')

for v in df_list:   
    result = transform(xml, tag = et.XSLT.strparam(v['Tag']), 
                            old_value = et.XSLT.strparam(v['Old Value']), 
                            new_value = et.XSLT.strparam(v['New Value']))

    xml = result

# SAVE TO NEW XML
with open("Output.xml", 'wb') as f:
    f.write(result)

XML输出

代码语言:javascript
复制
<?xml version="1.0"?>
<ProcessSalesTransactionCustom xmlns="http://schema.xxxx.com/xxxxx/2" releaseID="9.2">
  <ApplicationArea>
    <Sender>
      <LogicalID>xxxxxx.file.syncxxxxx5salesinvoice</LogicalID>
      <ComponentID>External</ComponentID>
      <ConfirmationCode>OnError</ConfirmationCode>
    </Sender>
    <CreationDateTime>2020-04-16T14:50:26.976Z</CreationDateTime>
    <BODID>xxxx-nid:xxxxx:1001::Default_1001#320000:?SalesTransactionCustom&amp;verb=Process</BODID>
  </ApplicationArea>
  <DataArea>
    <Process>
      <TenantID>infor</TenantID>
      <AccountingEntityID>4710</AccountingEntityID>
      <LocationID>S_4710</LocationID>
      <ActionCriteria>
        <ActionExpression actionCode="Add"/>
      </ActionCriteria>
    </Process>
    <SalesTransactionCustom>
      <FinancialBatch>
        <TransactionDate>2019-09-27T00:00:00</TransactionDate>
        <BatchReference>KUKS_20190928052427</BatchReference>
      </FinancialBatch>
      <TransactionHeader>
        <TransactionType>HEI</TransactionType>
        <SalesInvoice>
          <Invoice>19001160</Invoice>
          <BusinessPartner>BPE000104</BusinessPartner>
          <DocumentDate>2019-09-27T00:00:00</DocumentDate>
          <DueDate>2019-11-20T00:00:00</DueDate>
          <Amount>152248.80</Amount>
          <Currency>EUR</Currency>
          <TaxCountry>DK</TaxCountry>
          <TaxCode>BESIT</TaxCode>
          <NonFinalizedTransaction>
            <TransactionReference>417B00 PC210LCI-11</TransactionReference>
            <LedgerAccount>108092200</LedgerAccount>
            <Dimension1>100</Dimension1>
            <Dimension2>KUK</Dimension2>
            <Dimension3/>
            <Dimension4/>
            <Dimension5/>
            <Dimension6/>
            <Dimension7/>
            <Dimension8/>
            <TaxAmount>0.00</TaxAmount>
            <DebitCreditFlag>credit</DebitCreditFlag>
            <Amount>152248.80</Amount>
          </NonFinalizedTransaction>
        </SalesInvoice>
      </TransactionHeader>
      <TransactionHeader>
        <TransactionType>HEI</TransactionType>
        <SalesInvoice>
          <Invoice>19001161</Invoice>
          <BusinessPartner>BPE000153</BusinessPartner>
          <DocumentDate>2019-09-27T00:00:00</DocumentDate>
          <DueDate>2019-11-20T00:00:00</DueDate>
          <Amount>113848.17</Amount>
          <Currency>EUR</Currency>
          <TaxCountry>AT</TaxCountry>
          <TaxCode>GBSI</TaxCode>
          <NonFinalizedTransaction>
            <TransactionReference>412600 PC210NLC-11</TransactionReference>
            <LedgerAccount>108092200</LedgerAccount>
            <Dimension1>100</Dimension1>
            <Dimension2>KUK</Dimension2>
            <Dimension3/>
            <Dimension4/>
            <Dimension5/>
            <Dimension6/>
            <Dimension7/>
            <Dimension8/>
            <TaxAmount>0.00</TaxAmount>
            <DebitCreditFlag>credit</DebitCreditFlag>
            <Amount>113848.17</Amount>
          </NonFinalizedTransaction>
        </SalesInvoice>
      </TransactionHeader>
    </SalesTransactionCustom>
  </DataArea>
</ProcessSalesTransactionCustom>
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/61411916

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档