
Get text based on XML tag name

Stack Overflow user
Asked 2019-12-03 13:15:19
Answers: 3 · Views: 587 · Followers: 0 · Votes: 0

I'm new to XML and UBL XML.

I'm trying to read the .xml invoice below into Python using ElementTree.

<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2&#xA;http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
  <cbc:UBLVersionID>2.1</cbc:UBLVersionID>
  <cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
  <cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
  <cbc:ID>201909638</cbc:ID>
  <cbc:IssueDate>2019-11-01</cbc:IssueDate>
  <cbc:InvoiceTypeCode listAgencyID="6" listID="UNCL1001">380</cbc:InvoiceTypeCode>
  <cbc:DocumentCurrencyCode listAgencyID="6" listID="ISO 4217 Alpha">EUR</cbc:DocumentCurrencyCode>
  <cac:OrderReference>
  # other stuff
</Invoice>

If I run root[4].text, I get the text of the IssueDate tag returned as a string, i.e. '2019-11-01'.

Problem

I would like to get the text based on the tag name. Both of these attempts:

  • root.find('IssueDate').text
  • root.find('cbc:IssueDate').text

fail with:

AttributeError: 'NoneType' object has no attribute 'text'

Question

How do I get the text based on the tag name IssueDate?

More generally, how do I get the text of any tag based on its name?

3 Answers

Stack Overflow user

Accepted answer

Answered 2019-12-03 19:21:17

import xml.etree.ElementTree as ET   

xml_string="""<?xml version="1.0" encoding="UTF-8"?>
    <Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2&#xA;http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
      <cbc:UBLVersionID>2.1</cbc:UBLVersionID>
      <cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
      <cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
      <cbc:ID>201909638</cbc:ID>
      <cbc:IssueDate>2019-11-01</cbc:IssueDate>
      <cbc:InvoiceTypeCode listAgencyID="6" listID="UNCL1001">380</cbc:InvoiceTypeCode>
      <cbc:DocumentCurrencyCode listAgencyID="6" listID="ISO 4217 Alpha">EUR</cbc:DocumentCurrencyCode>
      <cac:OrderReference>ABC</cac:OrderReference>
    </Invoice>"""

root = ET.fromstring(xml_string)

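Here I use a string as input; you can use an XML file as input instead. To get text based on the tag name, you first need to know what the tag names actually are.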

for child in root:
    print(child.tag, child.attrib)

Output:

{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}UBLVersionID {}
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}CustomizationID {}
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}ProfileID {}
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}ID {}
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate {}
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}InvoiceTypeCode {'listAgencyID': '6', 'listID': 'UNCL1001'}
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}DocumentCurrencyCode {'listAgencyID': '6', 'listID': 'ISO 4217 Alpha'}
{urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2}OrderReference {}
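
The tags printed above are in ElementTree's {namespace-uri}localname form. As a minimal sketch of the "any tag by its name" case, the namespace part can be split off and the children indexed by their bare names. The helper local_name and the shortened sample invoice are illustrative assumptions, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative invoice using the same namespaces as the question.
xml_string = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>201909638</cbc:ID>
  <cbc:IssueDate>2019-11-01</cbc:IssueDate>
</Invoice>"""


def local_name(tag):
    """Hypothetical helper: drop the '{namespace}' part of an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(xml_string)

# Map each direct child's bare tag name to its text.
texts = {local_name(child.tag): child.text for child in root}
print(texts["IssueDate"])
```

Note that if two children share a local name, the later one overwrites the earlier one in this dictionary, so this only suits tags that occur once.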

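As you can see, your logic for finding the text is correct, but you supplied the wrong tag name. Because of the namespace declarations (the xmlns attributes on the Invoice element), the text cannot be found directly with 'cbc:IssueDate' or 'IssueDate'.

If you use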

root.find("{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate").text

Output:

'2019-11-01'

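Here, "{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}" is prepended to IssueDate because the tag carries the cbc prefix, which is bound to that namespace URI. For a tag without a prefix, the default namespace "{urn:oasis:names:specification:ubl:schema:xsd:Invoice-2}" would be prepended instead.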
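
Writing out the full URI each time is tedious. ElementTree's find, findall and findtext accept an optional prefix-to-URI mapping, so the 'cbc:IssueDate' form from the question works once that mapping is supplied. A minimal sketch, assuming a shortened sample invoice and a mapping named ns (both illustrative):

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative invoice using the same namespaces as the question.
xml_string = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>201909638</cbc:ID>
  <cbc:IssueDate>2019-11-01</cbc:IssueDate>
</Invoice>"""

# Prefix -> namespace URI. The prefix is our own label for the URI;
# only the URI has to match what the file declares.
ns = {"cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"}

root = ET.fromstring(xml_string)

issue_date = root.find("cbc:IssueDate", ns).text
print(issue_date)

# findtext returns a default instead of raising AttributeError
# when the element is absent (cbc:DueDate is not in this sample).
due_date = root.findtext("cbc:DueDate", default=None, namespaces=ns)
print(due_date)
```

The AttributeError in the question came from calling .text on the None that find returns when nothing matches; findtext avoids that by returning the default.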

I hope this answers your question.

Votes: 0

Stack Overflow user

Answered 2019-12-16 21:10:05

You can use BeautifulSoup.

from bs4 import BeautifulSoup as BS4

xml_test = """<?xml version="1.0" encoding="UTF-8"?>
    <Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2&#xA;http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
      <cbc:UBLVersionID>2.1</cbc:UBLVersionID>
      <cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
      <cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
      <cbc:ID>201909638</cbc:ID>
      <cbc:IssueDate>2019-11-01</cbc:IssueDate>
      <cbc:InvoiceTypeCode listAgencyID="6" listID="UNCL1001">380</cbc:InvoiceTypeCode>
      <cbc:DocumentCurrencyCode listAgencyID="6" listID="ISO 4217 Alpha">EUR</cbc:DocumentCurrencyCode>
      <cac:OrderReference>ABC</cac:OrderReference>
    </Invoice>"""

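# Name the parser explicitly; an HTML parser lowercases tag names,
# which is why the lookup below uses "cbc:issuedate".
soup = BS4(xml_test, "html.parser")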

tag = soup.find("cbc:issuedate")

print(tag.text)

The result will be

2019-11-01

If you have many issue dates, you can use

tags = soup.find_all("cbc:issuedate")  # find_all is the current name for findAll
for tag in tags:
    print(tag.text)
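
For comparison, the same "all matches" lookup can be sketched with the standard library alone. This uses ElementTree rather than BeautifulSoup, with the './/' path prefix to search every descendant. The shortened invoice, the nested order ID value ORDER-1, and the mapping name ns are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative invoice: cbc:ID appears both at the top level and nested.
xml_string = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
    xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
    xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>201909638</cbc:ID>
  <cbc:IssueDate>2019-11-01</cbc:IssueDate>
  <cac:OrderReference>
    <cbc:ID>ORDER-1</cbc:ID>
  </cac:OrderReference>
</Invoice>"""

ns = {"cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"}

root = ET.fromstring(xml_string)

# ".//" matches descendants at any depth, not only direct children.
ids = [el.text for el in root.findall(".//cbc:ID", ns)]
print(ids)
```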

I hope it helps.

Votes: 0

Stack Overflow user

Answered 2019-12-17 02:03:33

You can also use SimplifiedDoc.

from simplified_scrapy.simplified_doc import SimplifiedDoc 
html = '''
<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2&#xA;http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
  <cbc:UBLVersionID>2.1</cbc:UBLVersionID>
  <cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
  <cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
  <cbc:ID>201909638</cbc:ID>
  <cbc:IssueDate>2019-11-01</cbc:IssueDate>
  <cbc:InvoiceTypeCode listAgencyID="6" listID="UNCL1001">380</cbc:InvoiceTypeCode>
  <cbc:DocumentCurrencyCode listAgencyID="6" listID="ISO 4217 Alpha">EUR</cbc:DocumentCurrencyCode>
  <cac:OrderReference>
  # other stuff
</Invoice>
'''
doc = SimplifiedDoc(html)
print(doc.getElementByTag('cbc:IssueDate').text)  # get one
lst = doc.getElementByTag('Invoice').getChildren()  # get all
for item in lst:
    print(item.tag, item.text)

Votes: 0

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/59157987
