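I am using AWS Alexa, but I am finding it hard to parse the response and get the value I want.
Alexa returns a tree object of type <type 'lxml.etree._ElementTree'>.
I print the tree with this code:
from lxml import etree
root = tree.getroot()
print(etree.tostring(root))
Here is the XML: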
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"><aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11"><aws:OperationRequest><aws:RequestId>ccf3f263-ab76-ab63-db99-244666044e85</aws:RequestId></aws:OperationRequest><aws:UrlInfoResult><aws:Alexa>
<aws:ContentData>
<aws:DataUrl type="canonical">google.com/</aws:DataUrl>
<aws:SiteData>
<aws:Title>Google</aws:Title>
<aws:Description>Enables users to search the world's information, including webpages, images, and videos. Offers unique features and search technology.</aws:Description>
<aws:OnlineSince>15-Sep-1997</aws:OnlineSince>
</aws:SiteData>
<aws:LinksInCount>3453627</aws:LinksInCount>
</aws:ContentData>
<aws:TrafficData>
<aws:DataUrl type="canonical">google.com/</aws:DataUrl>
<aws:Rank>1</aws:Rank>
</aws:TrafficData>
</aws:Alexa></aws:UrlInfoResult><aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"><aws:StatusCode>Success</aws:StatusCode></aws:ResponseStatus></aws:Response></aws:UrlInfoResponse>
I tried root.find('LinksInCount').text to get the element's value, but it does not work.
How can I get the text 3453627 from aws:LinksInCount?
Posted on 2014-06-24 10:06:07
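You are facing two challenges:
An XML document that reuses one prefix for two different namespaces
You can see the "aws:" prefix, but it is bound to two different namespaces:
xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/"
xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11"
Reusing the same namespace prefix in XML is perfectly legal. The rule is that the innermost declaration in scope wins, so aws:LinksInCount belongs to the second namespace.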
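To see which namespace each element actually ends up in, you can print the tags that lxml resolves. This is only a minimal sketch: the document, the "p" prefix and the urn:example:* URIs are made up for illustration and are not part of the Alexa response.

```python
from lxml import etree

# Hypothetical document: the inner element re-binds the "p" prefix.
xml = (
    '<p:outer xmlns:p="urn:example:first">'
    '<p:inner xmlns:p="urn:example:second">'
    '<p:leaf>value</p:leaf>'
    '</p:inner>'
    '</p:outer>'
)
root = etree.fromstring(xml)

# lxml stores every tag as "{namespace-uri}localname", so the
# namespace that is in effect for each element is directly visible.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)
```

The outer element reports the first URI, while the inner element and its child report the second one, even though all three are written with the same prefix.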
# The response from the question, re-indented for readability
xmlstr = """
<?xml version="1.0"?>
<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">
<aws:OperationRequest>
<aws:RequestId>ccf3f263-ab76-ab63-db99-244666044e85</aws:RequestId>
</aws:OperationRequest>
<aws:UrlInfoResult>
<aws:Alexa>
<aws:ContentData>
<aws:DataUrl type="canonical">google.com/</aws:DataUrl>
<aws:SiteData>
<aws:Title>Google</aws:Title>
<aws:Description>Enables users to search the world's information, including webpages, images, and videos. Offers unique features and search technology.</aws:Description>
<aws:OnlineSince>15-Sep-1997</aws:OnlineSince>
</aws:SiteData>
<aws:LinksInCount>3453627</aws:LinksInCount>
</aws:ContentData>
<aws:TrafficData>
<aws:DataUrl type="canonical">google.com/</aws:DataUrl>
<aws:Rank>1</aws:Rank>
</aws:TrafficData>
</aws:Alexa>
</aws:UrlInfoResult>
<aws:ResponseStatus xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">
<aws:StatusCode>Success</aws:StatusCode>
</aws:ResponseStatus>
</aws:Response>
</aws:UrlInfoResponse>
"""
The next challenge is how to search for namespaced elements.
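I prefer XPath. In an XPath expression you can use whatever namespace prefixes you like, but you must tell the xpath() call what those prefixes mean. That is done with the namespaces dictionary: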
from lxml import etree
doc = etree.fromstring(xmlstr.strip())
namespaces = {"aws": "http://awis.amazonaws.com/doc/2005-07-11"}
texts = doc.xpath("//aws:LinksInCount/text()", namespaces=namespaces)
print(texts[0])
https://stackoverflow.com/questions/24382718
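As an alternative to xpath(), the find() call from the question can probably be made to work too. It failed for two reasons: the path 'LinksInCount' carries no namespace, and a bare tag name only matches direct children of the root. The sketch below uses a trimmed-down stand-in for the response (an assumption, not the full document) and the AWIS_NS constant is a name introduced here for illustration.

```python
from lxml import etree

AWIS_NS = "http://awis.amazonaws.com/doc/2005-07-11"

# Trimmed-down stand-in for the response in the question.
xml = (
    '<aws:UrlInfoResponse xmlns:aws="http://alexa.amazonaws.com/doc/2005-10-05/">'
    '<aws:Response xmlns:aws="http://awis.amazonaws.com/doc/2005-07-11">'
    '<aws:LinksInCount>3453627</aws:LinksInCount>'
    '</aws:Response>'
    '</aws:UrlInfoResponse>'
)
root = etree.fromstring(xml)

# No namespace, and only direct children are searched: nothing matches.
plain = root.find("LinksInCount")

# ".//" searches all descendants; "{uri}name" is the fully qualified tag.
qualified = root.find(".//{%s}LinksInCount" % AWIS_NS)

# Same search written with a prefix map, as with xpath().
mapped = root.find(".//aws:LinksInCount", namespaces={"aws": AWIS_NS})

print(plain)
print(qualified.text)
print(mapped.text)
```

Note that find() returns an element (or None when nothing matches), so you read .text from it, whereas the xpath() expression ending in text() returns a list of strings.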