Q: How to parse a vCard in an XML file

Stack Overflow user

Asked on 2019-07-24 12:07:58

Answers: 1, Views: 274, Followers: 0, Votes: 1

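I'm trying to parse an XML file that contains some vCards. I need the following information: FN and NOTE (SIREN and A), printed as FN, SIREN_A. I'd also like to add them to a list, but only if the string in the description is equal to "diviseur".

I have tried different things (vobject, finditer), but none of them work. For my parser I'm using the xml.etree.ElementTree library and pandas, which usually causes some incompatibilities.

Python code: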

import xml.etree.ElementTree as ET
import vobject
newlist=[]
data=[]
data.append(newlist)
diviseur=[]
tree=ET.parse('test_oc.xml')
root=tree.getroot()
newlist=[]
for lifeCycle in root.findall('{http://ltsc.ieee.org/xsd/LOM}lifeCycle'):
    for contribute in lifeCycle.findall('{http://ltsc.ieee.org/xsd/LOM}contribute'):
        for entity in  contribute.findall('{http://ltsc.ieee.org/xsd/LOM}entity'):
            vcard = vobject.readOne(entity)
            siren = vcard.contents['note'].value,":",vcard.contents['fn'].value
            print ('siren',siren.text)
    for date in contribute.findall('{http://ltsc.ieee.org/xsd/LOM}date'):
        for description in date.findall('{http://ltsc.ieee.org/xsd/LOM}description'):                       
            entite=description.find('{http://ltsc.ieee.org/xsd/LOM}string')
            print ('Type entité:', entite.text)
            newlist.append(entite)
            j=0
            for j in range(len(entite)-1):
                if entite[j]=="diviseur":
                    diviseur.append(siren[j])
                    print('diviseur:', diviseur)
                    newlist.append(diviseur)
data.append(newlist)                    
print(data)

The XML file to parse:

<?xml version="1.0" encoding="UTF-8"?>    
<lom:lom xmlns:lom="http://ltsc.ieee.org/xsd/LOM" xmlns:lomfr="http://www.lom-fr.fr/xsd/LOMFR"  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ltsc.ieee.org/xsd/LOM">
    <lom:version uniqueElementName="version">
        <lom:string language="http://id.loc.gov/vocabulary/iso639-2/fre">V4.1</lom:string>
    </lom:version>
    <lom:lifeCycle uniqueElementName="lifeCycle">
        <lom:contribute>
            <lom:entity><![CDATA[ 
            BEGIN:VCARD
            VERSION:4.0
            FN:Cailler
            N:;Valérie;;Mr;
            ORG:Veoli
            NOTE:SIREN=203025106
            NOTE :ISNI=0000000000000000
            END:VCARD
            ]]></lom:entity>
            <lom:date uniqueElementName="date">
                <lom:dateTime uniqueElementName="dateTime">2019-07-10</lom:dateTime>
                <lom:description uniqueElementName="description">
                    <lom:string>departure</lom:string>
                </lom:description>
            </lom:date>
        </lom:contribute>
        <lom:contribute>
            <lom:entity><![CDATA[ 
            BEGIN:VCARD
            VERSION:4.0
            FN:Besnard
            N:;Ugo;;Mr;
            ORG:MG
            NOTE:SIREN=501 025 205
            NOTE :A=0000 0000
            END:VCARD
            ]]></lom:entity>
            <lom:date uniqueElementName="date">
                <lom:dateTime uniqueElementName="dateTime">2019-07-10</lom:dateTime>
                <lom:description uniqueElementName="description">
                    <lom:string>diviseur</lom:string>
                </lom:description>
            </lom:date>
        </lom:contribute>
    </lom:lifeCycle>
</lom:lom>

Traceback (most recent call last):
  File "parser_export_csv_V2.py", line 73, in <module>
    vcard = vobject.readOne(entity)
  File "C:\Users\b\AppData\Local\Programs\Python\Python36-32\lib\site-packages\vobject\base.py", line 1156, in readOne
    allowQP))
  File "C:\Users\b\AppData\Local\Programs\Python\Python36-32\lib\site-packages\vobject\base.py", line 1089, in readComponents
    for line, n in getLogicalLines(stream, allowQP):
  File "C:\Users\b\AppData\Local\Programs\Python\Python36-32\lib\site-packages\vobject\base.py", line 869, in getLogicalLines
    val = fp.read(-1)
AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'read'

vcf-vcard

python

xml

parsing

1 Answer

Stack Overflow user

Accepted answer

Answered on 2019-07-25 14:02:55

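There are a few problems here.

entity is an Element instance, whereas vCard is a plain-text data format. vobject.readOne() expects text.
There is extraneous whitespace next to the vCard properties in the XML file.
NOTE :ISNI=0000000000000000 is invalid; it should be NOTE:ISNI=0000000000000000 (with the space removed).
vcard.contents['note'] is a list and has no value attribute.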
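The whitespace problems in the second and third points can be handled before the text ever reaches vobject. The following is a minimal sketch using only the standard library; the helper name normalize_vcard_text is hypothetical and not part of vobject. It assumes the vCard has no folded (continuation) lines, since it strips all leading indentation, and it removes whitespace only around the first colon of each line so that spaces inside values are preserved.

```python
def normalize_vcard_text(raw: str) -> str:
    """Strip indentation and whitespace around the first colon of each line.

    Hypothetical helper: assumes no folded vCard lines and no colons
    inside property parameters.
    """
    lines = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the CDATA block
        # "NOTE :A=0000 0000" becomes "NOTE:A=0000 0000"
        name, sep, value = line.partition(":")
        lines.append(name.strip() + sep + value.strip())
    # vCard content lines are terminated by CRLF
    return "\r\n".join(lines) + "\r\n"


raw = """
            BEGIN:VCARD
            VERSION:4.0
            FN:Besnard
            NOTE:SIREN=501 025 205
            NOTE :A=0000 0000
            END:VCARD
            """
cleaned = normalize_vcard_text(raw)
print(cleaned)
```

The cleaned string, rather than the Element itself, is what would then be passed to vobject.readOne().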

The code below may not produce exactly the result you want, but I hope it helps:

import xml.etree.ElementTree as ET
import vobject

NS = {"lom": "http://ltsc.ieee.org/xsd/LOM"}

tree = ET.parse('test_oc.xml')

for contribute in tree.findall('.//lom:contribute', NS):
    desc_string = contribute.find('.//lom:string', NS)
    print(desc_string.text)

    entity = contribute.find('lom:entity', NS)
    txt = entity.text.replace(" ", "")  # Text with spaces removed
    vcard = vobject.readOne(txt)

    for p in vcard.contents["note"]:
        print(p.name, p.value)
    for p in vcard.contents["fn"]:
        print(p.name, p.value)

    print()

Output:

departure
NOTE SIREN=203025106
NOTE ISNI=0000000000000000
FN Cailler

diviseur
NOTE SIREN=501025205
NOTE A=00000000
FN Besnard
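Going one step further toward what the question asks for, here is a rough sketch that builds the "FN, SIREN_A" strings and collects the ones whose description is "diviseur". Several things in it are assumptions rather than part of the answer: "FN, SIREN_A" is read as the FN value followed by the SIREN and A values joined with an underscore (a missing A is simply left out); spaces inside the values are removed, as in the output above; and, to keep the sketch dependency-free, vobject is replaced by a plain split on the first colon of each line. The names vcard_properties and summarize are hypothetical, and the XML is a trimmed copy of the file from the question.

```python
import xml.etree.ElementTree as ET

NS = {"lom": "http://ltsc.ieee.org/xsd/LOM"}

# Trimmed copy of the question's XML, embedded so the sketch runs on its own.
XML = """<lom:lom xmlns:lom="http://ltsc.ieee.org/xsd/LOM">
  <lom:lifeCycle>
    <lom:contribute>
      <lom:entity><![CDATA[
        BEGIN:VCARD
        VERSION:4.0
        FN:Cailler
        NOTE:SIREN=203025106
        NOTE :ISNI=0000000000000000
        END:VCARD
      ]]></lom:entity>
      <lom:date><lom:description>
        <lom:string>departure</lom:string>
      </lom:description></lom:date>
    </lom:contribute>
    <lom:contribute>
      <lom:entity><![CDATA[
        BEGIN:VCARD
        VERSION:4.0
        FN:Besnard
        NOTE:SIREN=501 025 205
        NOTE :A=0000 0000
        END:VCARD
      ]]></lom:entity>
      <lom:date><lom:description>
        <lom:string>diviseur</lom:string>
      </lom:description></lom:date>
    </lom:contribute>
  </lom:lifeCycle>
</lom:lom>"""


def vcard_properties(text):
    """Return (NAME, value) pairs from vCard text (simplified, no folding)."""
    props = []
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep:  # ignore blank lines and anything without a colon
            props.append((name.strip().upper(), value.strip()))
    return props


def summarize(root):
    """Return (all rows, rows whose description is 'diviseur')."""
    rows, diviseur = [], []
    for contribute in root.findall(".//lom:contribute", NS):
        text = contribute.findtext("lom:entity", default="", namespaces=NS)
        props = vcard_properties(text)
        fn = next((v for n, v in props if n == "FN"), "")
        # Each NOTE value is assumed to look like KEY=VALUE
        notes = dict(v.split("=", 1) for n, v in props if n == "NOTE" and "=" in v)
        siren_a = "_".join(
            notes[k].replace(" ", "") for k in ("SIREN", "A") if k in notes
        )
        row = f"{fn}, {siren_a}"
        rows.append(row)
        kind = contribute.findtext(
            "lom:date/lom:description/lom:string", default="", namespaces=NS
        )
        if kind.strip() == "diviseur":
            diviseur.append(row)
    return rows, diviseur


rows, diviseur = summarize(ET.fromstring(XML))
print(rows)      # ['Cailler, 203025106', 'Besnard, 501025205_00000000']
print(diviseur)  # ['Besnard, 501025205_00000000']
```

With a real file, ET.parse('test_oc.xml').getroot() would take the place of ET.fromstring(XML). If the vCards use folded lines or property parameters, a proper parser such as vobject is the safer choice.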

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/57182574
