Setup
I'm new to XML and UBL XML.
I'm trying to read the invoice below into Python with ElementTree.
<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2
http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
<cbc:UBLVersionID>2.1</cbc:UBLVersionID>
<cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
<cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
<cbc:ID>201909638</cbc:ID>
<cbc:IssueDate>2019-11-01</cbc:IssueDate>
<cbc:InvoiceTypeCode listAgencyID="6" listID="UNCL1001">380</cbc:InvoiceTypeCode>
<cbc:DocumentCurrencyCode listAgencyID="6" listID="ISO 4217 Alpha">EUR</cbc:DocumentCurrencyCode>
<cac:OrderReference>
# other stuff
</Invoice>
If I run root[4].text, I get the text of the IssueDate tag as a string, i.e. '2019-11-01'.
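Problem
I want to get the text based on the tag name.
Both root.find('IssueDate').text and root.find('cbc:IssueDate').text give AttributeError: 'NoneType' object has no attribute 'text'.
Question
How do I get the text based on the tag name IssueDate?
More generally, how do I get the text of any tag based on its name?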
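For reference, ElementTree's find(), findall() and findtext() accept an optional namespaces mapping from prefixes to namespace URIs, which lets a query use the prefixed name as it is written in the file. The sketch below is illustrative and uses a trimmed copy of the invoice; the `ns` dict and the `get_text` helper are names chosen for this example, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's invoice (only a few elements kept).
xml_string = """<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
         xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>201909638</cbc:ID>
  <cbc:IssueDate>2019-11-01</cbc:IssueDate>
  <cac:OrderReference>ABC</cac:OrderReference>
</Invoice>"""

root = ET.fromstring(xml_string)

# Map query prefixes to the namespace URIs declared in the document.
# The prefix names in this dict are your own choice; only the URIs must match.
ns = {
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
}

# A bare local name does not match a namespaced element, so find() returns None.
missing = root.find("IssueDate")

# With the map, "cbc:IssueDate" is expanded to "{...CommonBasicComponents-2}IssueDate".
issue_date = root.find("cbc:IssueDate", ns).text


def get_text(element, name, namespaces=ns, default=None):
    """Hypothetical helper: text of the first descendant matching a prefixed
    name such as 'cbc:IssueDate', or `default` if there is no such element."""
    return element.findtext(".//" + name, default=default, namespaces=namespaces)


print(missing)                                       # None
print(issue_date)                                    # 2019-11-01
print(get_text(root, "cac:OrderReference"))          # ABC
print(get_text(root, "cbc:DueDate", default="n/a"))  # n/a (no such element)
```

findtext() is convenient here because it returns a default instead of failing with the 'NoneType' error when a tag is absent.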
Posted on 2019-12-03 19:21:17
import xml.etree.ElementTree as ET
xml_string="""<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2
http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
<cbc:UBLVersionID>2.1</cbc:UBLVersionID>
<cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
<cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
<cbc:ID>201909638</cbc:ID>
<cbc:IssueDate>2019-11-01</cbc:IssueDate>
<cbc:InvoiceTypeCode listAgencyID="6" listID="UNCL1001">380</cbc:InvoiceTypeCode>
<cbc:DocumentCurrencyCode listAgencyID="6" listID="ISO 4217 Alpha">EUR</cbc:DocumentCurrencyCode>
<cac:OrderReference>ABC</cac:OrderReference>
</Invoice>"""
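root = ET.fromstring(xml_string)
Here I use a string as input; you can also read an XML file instead (root = ET.parse(path).getroot()). Now, to get text based on the tag name, you first need to know what the tag's full name actually is.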
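Since the answer mentions reading from a file, here is a sketch of the file-based route that also reads the prefix-to-URI map out of the document instead of hard-coding it, using iterparse() with the "start-ns" event. It writes a trimmed sample to a temporary file only so the example runs on its own; with a real invoice you would pass your own path. It assumes each prefix is declared only once in the file (a later declaration of the same prefix would overwrite the earlier one in the dict).

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Trimmed copy of the question's invoice.
xml_string = """<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
         xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>201909638</cbc:ID>
  <cbc:IssueDate>2019-11-01</cbc:IssueDate>
  <cac:OrderReference>ABC</cac:OrderReference>
</Invoice>"""

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False, encoding="utf-8") as f:
    f.write(xml_string)
    path = f.name

try:
    # ET.parse() reads a file and returns an ElementTree; getroot() gives
    # the same kind of Element that ET.fromstring() returns.
    root = ET.parse(path).getroot()

    # With events=("start-ns",), iterparse yields (event, (prefix, uri)) for
    # every namespace declaration; the default namespace has prefix "".
    ns = {prefix: uri for _, (prefix, uri) in ET.iterparse(path, events=("start-ns",))}
finally:
    os.remove(path)

issue_date = root.findtext("cbc:IssueDate", namespaces=ns)
print(sorted(ns))    # ['', 'cac', 'cbc']
print(issue_date)    # 2019-11-01
```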
for child in root:
    print(child.tag, child.attrib)
Output:
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}UBLVersionID {}
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}CustomizationID {}
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}ProfileID {}
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}ID {}
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate {}
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}InvoiceTypeCode {'listAgencyID': '6', 'listID': 'UNCL1001'}
{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}DocumentCurrencyCode {'listAgencyID': '6', 'listID': 'ISO 4217 Alpha'}
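{urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2}OrderReference {}
As you can see, your logic for finding the text is correct, but you passed the wrong name. Because of the namespace declarations (the xmlns attributes) on the Invoice element, neither 'cbc:IssueDate' nor 'IssueDate' can be used directly here to find the text.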
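The printed tags are in ElementTree's "{namespace-uri}localname" form. If you would rather look tags up by their bare name, one option (a sketch, not taken from the answer) is to strip the "{...}" part yourself; `local_name` and `find_text_by_local_name` are hypothetical helpers. Ignoring namespaces has a cost: in a full UBL invoice cbc:ID also appears inside many cac: elements, so this returns the first match in document order, which may not be the one you want.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's invoice.
xml_string = """<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
         xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>201909638</cbc:ID>
  <cbc:IssueDate>2019-11-01</cbc:IssueDate>
  <cac:OrderReference>ABC</cac:OrderReference>
</Invoice>"""

root = ET.fromstring(xml_string)


def local_name(tag):
    """Strip the '{namespace-uri}' part that ElementTree puts in front of a tag."""
    return tag.rpartition("}")[2]


def find_text_by_local_name(element, name):
    """Hypothetical helper: text of the first element (in document order,
    including `element` itself) whose local name equals `name`."""
    for el in element.iter():
        if local_name(el.tag) == name:
            return el.text
    return None


print(local_name(root[1].tag))                     # IssueDate
print(find_text_by_local_name(root, "IssueDate"))  # 2019-11-01
```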
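If you use
root.find("{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}IssueDate").text
Output:
'2019-11-01'
Here "{urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2}" is prepended to IssueDate because the tag is written with the cbc prefix, which the Invoice element binds to that URI (xmlns:cbc="..."). If the tag had no prefix, the default namespace "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" would be prepended instead.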
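To see that prefix rule in action, here is a small sketch with a hypothetical unprefixed <Extra> child (not a real UBL element) that inherits the default namespace. It also shows two query shortcuts available in Python 3.8 and later: "{*}name" matches a local name in any namespace, and an "" key in the namespaces map sets the default namespace for unprefixed names in the path.

```python
import xml.etree.ElementTree as ET

INV = "urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"

# <Extra> is a hypothetical unprefixed element, added only to show that it
# picks up the default namespace declared by xmlns="...Invoice-2".
xml_string = f"""<Invoice xmlns="{INV}" xmlns:cbc="{CBC}">
  <cbc:IssueDate>2019-11-01</cbc:IssueDate>
  <Extra>unprefixed</Extra>
</Invoice>"""

root = ET.fromstring(xml_string)

# Prefixed element: the URI bound to "cbc" is put in front of the local name.
print(root[0].tag == "{" + CBC + "}IssueDate")       # True
# Unprefixed element: the default namespace URI is put in front instead.
print(root[1].tag == "{" + INV + "}Extra")           # True

# Python 3.8+: "{*}" matches the local name in any namespace (or none).
print(root.findtext("{*}IssueDate"))                 # 2019-11-01
# Python 3.8+: an "" key makes bare names in the path use that namespace.
print(root.findtext("Extra", namespaces={"": INV}))  # unprefixed
```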
I hope this answers your question.
Posted on 2019-12-16 21:10:05
You can use BeautifulSoup:
from bs4 import BeautifulSoup as BS4
xml_test = """<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2
http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
<cbc:UBLVersionID>2.1</cbc:UBLVersionID>
<cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
<cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
<cbc:ID>201909638</cbc:ID>
<cbc:IssueDate>2019-11-01</cbc:IssueDate>
<cbc:InvoiceTypeCode listAgencyID="6" listID="UNCL1001">380</cbc:InvoiceTypeCode>
<cbc:DocumentCurrencyCode listAgencyID="6" listID="ISO 4217 Alpha">EUR</cbc:DocumentCurrencyCode>
<cac:OrderReference>ABC</cac:OrderReference>
</Invoice>"""
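soup = BS4(xml_test, "html.parser")  # the HTML parser lowercases tag names, hence "cbc:issuedate"
tag = soup.find("cbc:issuedate")
print(tag.text)
The result will be
2019-11-01
If you have many issue dates, you can use
tags = soup.find_all("cbc:issuedate")
for tag in tags:
    print(tag.text)
I hope it helps.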
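For comparison, a sketch of the "many matches" case with the standard library instead of BeautifulSoup. The sample is hypothetical: it adds two cac:InvoiceLine elements, each with its own cbc:ID, so the same tag occurs at different depths. Unlike BeautifulSoup's HTML parser, ElementTree keeps tag names in their original case, so the query spells them as in the document.

```python
import xml.etree.ElementTree as ET

ns = {
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
}

# Hypothetical trimmed invoice with two invoice lines.
xml_string = """<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
         xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>201909638</cbc:ID>
  <cac:InvoiceLine><cbc:ID>1</cbc:ID></cac:InvoiceLine>
  <cac:InvoiceLine><cbc:ID>2</cbc:ID></cac:InvoiceLine>
</Invoice>"""

root = ET.fromstring(xml_string)

# ".//" searches all descendants, in document order.
all_ids = [el.text for el in root.findall(".//cbc:ID", ns)]
# Without ".//", only direct children of <Invoice> are matched.
top_ids = [el.text for el in root.findall("cbc:ID", ns)]
# A path restricts the search: IDs that sit directly under an InvoiceLine.
line_ids = [el.text for el in root.findall("cac:InvoiceLine/cbc:ID", ns)]

for text in all_ids:
    print(text)
```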
Posted on 2019-12-17 02:03:33
You can also use SimplifiedDoc.
from simplified_scrapy.simplified_doc import SimplifiedDoc
html = '''
<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2
http://docs.oasis-open.org/ubl/os-UBL-2.1/xsd/maindoc/UBL-Invoice-2.1.xsd">
<cbc:UBLVersionID>2.1</cbc:UBLVersionID>
<cbc:CustomizationID>urn:www.cenbii.eu:transaction:biitrns010:ver2.0:extended:urn:www.peppol.eu:bis:peppol4a:ver2.0:extended:urn:www.simplerinvoicing.org:si:si-ubl:ver1.1.x</cbc:CustomizationID>
<cbc:ProfileID>urn:www.cenbii.eu:profile:bii04:ver2.0</cbc:ProfileID>
<cbc:ID>201909638</cbc:ID>
<cbc:IssueDate>2019-11-01</cbc:IssueDate>
<cbc:InvoiceTypeCode listAgencyID="6" listID="UNCL1001">380</cbc:InvoiceTypeCode>
<cbc:DocumentCurrencyCode listAgencyID="6" listID="ISO 4217 Alpha">EUR</cbc:DocumentCurrencyCode>
<cac:OrderReference>
# other stuff
</Invoice>
'''
doc = SimplifiedDoc(html)
print(doc.getElementByTag('cbc:IssueDate').text)  # get one
lst = doc.getElementByTag('Invoice').getChildren()  # get all
for item in lst:
    print(item.tag, item.text)
https://stackoverflow.com/questions/59157987
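SimplifiedDoc is a third-party package; if you would rather stay with the standard library, a comparable listing of the invoice's children can be done with ElementTree. This is a sketch: the `prefixes` reverse map is written by hand (assuming each namespace URI is bound to a single prefix), and `prefixed_name` is a hypothetical helper that turns "{uri}local" back into "prefix:local".

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's invoice.
xml_string = """<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
         xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2">
  <cbc:ID>201909638</cbc:ID>
  <cbc:IssueDate>2019-11-01</cbc:IssueDate>
  <cac:OrderReference>ABC</cac:OrderReference>
</Invoice>"""

root = ET.fromstring(xml_string)

# Reverse map: namespace URI -> prefix used in the document.
prefixes = {
    "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2": "cac",
    "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2": "cbc",
}


def prefixed_name(tag):
    """Turn '{uri}local' into 'prefix:local', or just 'local' if the URI is unknown."""
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        prefix = prefixes.get(uri)
        return f"{prefix}:{local}" if prefix else local
    return tag


# Iterating over an Element yields its direct child elements.
children = [(prefixed_name(child.tag), child.text) for child in root]
for name, text in children:
    print(name, text)
```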