Q: Getting the title attribute of <img> with lxml in Python

Stack Overflow user

Asked 2011-07-09 01:38:53

Answers: 2 · Views: 2.9K · Followers: 0 · Votes: 0

I want to extract the oneliner text from this website using Python. In the HTML, a message looks like this:

<div class="olh_message"> 
    <p>foobarbaz <img src="/static/emoticons/support-our-fruits.gif" title=":necta:" /></p> 
</div>

So far, my code looks like this:

import lxml.html
url = "http://www.scenemusic.net/demovibes/oneliner/"
xpath = "//div[@class='olh_message']/p"
tree = lxml.html.parse(url)
texts = tree.xpath(xpath)
texts = [text.text_content() for text in texts]
print(texts)

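However, right now I only get the text (foobarbaz), and I also want the title attribute of the img, so in this example the result should be foobarbaz :necta:. It seems I need lxml's DOM parser for this, but I have no idea how to use it. Can anyone give me a hint?

Thanks in advance!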

lxml

python

dom

xpath

html-parsing

2 Answers

Stack Overflow user

Answered 2011-07-09 02:06:15

Try this:

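import lxml.etree

url = "http://www.scenemusic.net/demovibes/oneliner/"
# Parse the page with lxml's lenient HTML parser.
parser = lxml.etree.HTMLParser()
tree = lxml.etree.parse(url, parser)
# Select the title attribute of every <img> inside a message paragraph.
texts = tree.xpath("//div[@class='olh_message']/p/img/@title")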
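
The XPath above returns only the title values. To get the combined string the question asks for (foobarbaz :necta:), one possible approach is to walk each p element and substitute every img with its title attribute. The following is a minimal sketch, not part of the original answer; the helper name message_text and the inline SAMPLE markup (copied from the question so the example runs offline) are assumptions for illustration.

```python
import lxml.html

# Sample markup copied from the question, so the sketch runs without network access.
SAMPLE = """
<div class="olh_message">
    <p>foobarbaz <img src="/static/emoticons/support-our-fruits.gif" title=":necta:" /></p>
</div>
"""


def message_text(p):
    """Return the text of a <p>, with each <img> replaced by its title attribute.

    Hypothetical helper, not an lxml API.
    """
    parts = [p.text or ""]
    for child in p:
        if child.tag == "img":
            # Use the emoticon's title in place of the image.
            parts.append(child.get("title", ""))
        elif isinstance(child.tag, str):
            # Any other element: fall back to its plain text.
            parts.append(child.text_content())
        # Text that follows the child inside the same <p>.
        parts.append(child.tail or "")
    return "".join(parts).strip()


root = lxml.html.fromstring(SAMPLE)
messages = [message_text(p) for p in root.xpath("//div[@class='olh_message']/p")]
print(messages)
```

For the live page, lxml.html.parse(url) from the question could be used in place of lxml.html.fromstring(SAMPLE).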

Votes: 1

Stack Overflow user

Answered 2011-07-09 09:59:27

Use:

//div[@class='olh_message']/p/node()

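This selects all child nodes (elements, text nodes, processing instructions and comment nodes) of any p element that is a child of any div element whose class attribute is 'olh_message'.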
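
In lxml, an expression like this returns a mixed list: text nodes come back as string objects and elements as element objects. A rough sketch of how that could be used to build the string the question asks for follows; the SAMPLE markup is copied from the question, and the variable names are illustrative only.

```python
import lxml.html

# Sample markup copied from the question, so the sketch runs without network access.
SAMPLE = """
<div class="olh_message">
    <p>foobarbaz <img src="/static/emoticons/support-our-fruits.gif" title=":necta:" /></p>
</div>
"""

root = lxml.html.fromstring(SAMPLE)
nodes = root.xpath("//div[@class='olh_message']/p/node()")

parts = []
for node in nodes:
    if isinstance(node, str):
        # Text nodes are returned as str subclasses.
        parts.append(node)
    elif node.tag == "img":
        # Element nodes: take the title attribute of each <img>.
        parts.append(node.get("title", ""))

text = "".join(parts).strip()
print(text)
```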

Verification, using XSLT as the host of XPath:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
     <xsl:copy-of select="//div[@class='olh_message']/p/node()"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the following XML document:

<div class="olh_message">
    <p>foobarbaz 
        <img src="/static/emoticons/support-our-fruits.gif" title=":necta:" />
    </p>
</div>

the wanted, correct result is produced (showing that exactly the wanted nodes are selected by the XPath expression):

foobarbaz 
        <img src="/static/emoticons/support-our-fruits.gif" title=":necta:"/>
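
The same check can also be run from Python, since lxml exposes XSLT 1.0 through lxml.etree.XSLT. The sketch below simply feeds the stylesheet and document shown above to it; the variable names are illustrative, and the exact whitespace of the serialized result may differ from the listing above.

```python
from lxml import etree

# Stylesheet and input document as given in the answer.
XSLT_SOURCE = """<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/">
     <xsl:copy-of select="//div[@class='olh_message']/p/node()"/>
 </xsl:template>
</xsl:stylesheet>"""

XML_SOURCE = """<div class="olh_message">
    <p>foobarbaz
        <img src="/static/emoticons/support-our-fruits.gif" title=":necta:" />
    </p>
</div>"""

# Compile the stylesheet, then apply it to the parsed input document.
transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))
result = transform(etree.fromstring(XML_SOURCE))

# Serialize the result tree to a string.
output = str(result)
print(output)
```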

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/6628231
