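I want to output a list of the main elements of a page. The summary is shown below. I need a way, using Python, to get only the text between the text tags. If it works, I would like the output for the following to be:
Mathematics, Differential equation, Geometry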
<language>english</language>
<concepts>
<concept>
<text>Mathematics</text>
<relevance>0.988094</relevance>
<dbpedia>http://dbpedia.org/resource/Mathematics</dbpedia>
<freebase>http://rdf.freebase.com/ns/m.04rjg</freebase>
<opencyc>http://sw.opencyc.org/concept/Mx4rvVjHd5wpEbGdrcN5Y29ycA</opencyc>
</concept>
<concept>
<text>Differential equation</text>
<relevance>0.729187</relevance>
<dbpedia>http://dbpedia.org/resource/Differential_equation</dbpedia>
<freebase>http://rdf.freebase.com/ns/m.050fdl</freebase>
<opencyc>http://sw.opencyc.org/concept/Mx4rvXXRFJwpEbGdrcN5Y29ycA</opencyc>
</concept>
<concept>
<text>Geometry</text>
<relevance>0.677052</relevance>
<dbpedia>http://dbpedia.org/resource/Geometry</dbpedia>
<freebase>http://rdf.freebase.com/ns/m.025x7g_</freebase>
<opencyc>http://sw.opencyc.org/concept/Mx4rvgcAf5wpEbGdrcN5Y29ycA</opencyc>
</concept>
<concept>
Posted on 2015-10-28 21:54:39
You should look into an XML parser; one is readily available in Python's standard library. For example:
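from xml.etree import ElementTree
# xmlstring holds the XML document as a string; it must have a single root element
doc = ElementTree.fromstring(xmlstring)
# './/text' matches every <text> element at any depth below the root
for tag in doc.findall('.//text'):
    print(tag.text)
https://stackoverflow.com/questions/33401246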
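The loop in the answer prints one concept per line, while the question asks for a single comma-separated line. A minimal sketch of one way to get that, assuming the full document has a single root element: the `<results>` wrapper, the shortened sample data, and the helper name `concept_texts` are all hypothetical and do not come from the original post.

```python
from xml.etree import ElementTree

# Hypothetical sample: a shortened copy of the XML from the question, wrapped
# in an assumed <results> root so that the string is a well-formed document.
xmlstring = """<results>
<language>english</language>
<concepts>
<concept><text>Mathematics</text><relevance>0.988094</relevance></concept>
<concept><text>Differential equation</text><relevance>0.729187</relevance></concept>
<concept><text>Geometry</text><relevance>0.677052</relevance></concept>
</concepts>
</results>"""


def concept_texts(xml):
    """Return the text of every <text> element, in document order."""
    doc = ElementTree.fromstring(xml)
    return [tag.text for tag in doc.findall('.//text')]


# Join the collected strings into the single line the question asks for.
print(', '.join(concept_texts(xmlstring)))
# prints: Mathematics, Differential equation, Geometry
```

If the real response is only a fragment without a root element, as in the excerpt in the question, `ElementTree.fromstring` will raise a `ParseError`, so the fragment would need to be wrapped in a root element first.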