Q: How to traverse a GraphML file with lxml

Stack Overflow user

Asked on 2012-04-18 16:40:51

Answers: 1, Views: 1.9K, Followers: 0, Votes: 2

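I have the following GraphML file, 'mygraph.gml', which I want to parse with a simple Python script.

It is a simple graph with two nodes, n0 and n1 (named "node1" and "node2"), and one edge between them: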

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
         http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <key id="weight" for="edge" attr.name="weight" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <node id="n0">
      <data key="name">node1</data>
    </node>
    <node id="n1">
      <data key="name">node2</data>
    </node>
    <edge source="n1" target="n0">
      <data key="weight">1</data>
    </edge>
  </graph>
</graphml>

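This represents a graph with two nodes, n0 and n1, joined by an edge of weight 1. I want to parse this structure with Python.

I wrote a script with the help of lxml (I need to use it because the real dataset is much larger than this simple example, more than 10^5 nodes, and Python's minidom is too slow):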

import lxml.etree as et

tree = et.parse('mygraph.gml')

root = tree.getroot()

graphml = {
"graph": "{http://graphml.graphdrawing.org/xmlns}graph",
"node": "{http://graphml.graphdrawing.org/xmlns}node",
"edge": "{http://graphml.graphdrawing.org/xmlns}edge",
"data": "{http://graphml.graphdrawing.org/xmlns}data",
"label": "{http://graphml.graphdrawing.org/xmlns}data[@key='label']",
"x": "{http://graphml.graphdrawing.org/xmlns}data[@key='x']",
"y": "{http://graphml.graphdrawing.org/xmlns}data[@key='y']",
"size": "{http://graphml.graphdrawing.org/xmlns}data[@key='size']",
"r": "{http://graphml.graphdrawing.org/xmlns}data[@key='r']",
"g": "{http://graphml.graphdrawing.org/xmlns}data[@key='g']",
"b": "{http://graphml.graphdrawing.org/xmlns}data[@key='b']",
"weight": "{http://graphml.graphdrawing.org/xmlns}data[@key='weight']",
"edgeid": "{http://graphml.graphdrawing.org/xmlns}data[@key='edgeid']"
}

graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))

This script correctly picks up the nodes and edges, so I can simply iterate over them:

for n in nodes:
    print(n.attrib)

or, similarly, over the edges:

for e in edges:
    print (e.attrib['source'], e.attrib['target'])

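But I really can't work out how to get the "data" tags of an edge or a node, so that I can print the edge weight and the node's "name" value.

This does not work for me:

weights = graph.findall(graphml.get("weight"))

The resulting list is always empty. Why? I'm missing something, but I can't see what.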

Tags: python, lxml, loops, graphml

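1 Answer

Stack Overflow user

Accepted answer

Posted on 2012-04-18 17:49:02

You can't do it in a single pass, but for each node you find, you can build a dictionary from the keys and values of its data elements: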

graph = tree.find(graphml.get("graph"))
nodes = graph.findall(graphml.get("node"))
edges = graph.findall(graphml.get("edge"))

for node in nodes + edges:
    attribs = {}
    for data in node.findall(graphml.get('data')):
        attribs[data.get('key')] = data.text
    print('Node', node, 'have', attribs)

This gives the result:

Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5a0> have {'name': 'node1'}
Node <Element {http://graphml.graphdrawing.org/xmlns}node at 0x7ff053d3e5f0> have {'name': 'node2'}
Node <Element {http://graphml.graphdrawing.org/xmlns}edge at 0x7ff053d3e640> have {'weight': '1'}
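As for why the list in the question comes back empty: a bare tag path passed to findall only matches direct children of the element it is called on, and the data elements sit one level lower, inside node and edge. The sketch below illustrates this on an inline copy of the sample graph. The NS prefix map, the GRAPHML constant and the fallback import of xml.etree.ElementTree are assumptions added for illustration and are not part of the original post; the fallback relies on the standard library supporting the same path syntax for these particular calls.

```python
try:
    import lxml.etree as et  # the library used in the question
except ImportError:
    # Assumption: the stdlib ElementTree accepts the same paths used below.
    import xml.etree.ElementTree as et

# Hypothetical prefix map, so the namespace URI is written only once.
NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Inline copy of the sample graph from the question (key declarations omitted).
GRAPHML = b"""<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="n0"><data key="name">node1</data></node>
    <node id="n1"><data key="name">node2</data></node>
    <edge source="n1" target="n0"><data key="weight">1</data></edge>
  </graph>
</graphml>"""

root = et.fromstring(GRAPHML)
graph = root.find("g:graph", NS)

# A bare tag path only looks at direct children of <graph>: no match.
direct = graph.findall("g:data[@key='weight']", NS)

# ".//" searches all descendants, so the <data> inside <edge> is found.
nested = graph.findall(".//g:data[@key='weight']", NS)

# Looking up the weight per edge keeps it tied to its source and target.
edge_weights = [
    (e.get("source"), e.get("target"),
     float(e.findtext("g:data[@key='weight']", namespaces=NS)))
    for e in graph.findall("g:edge", NS)
]

# The same per-element lookup works for the node names.
node_names = {
    n.get("id"): n.findtext("g:data[@key='name']", namespaces=NS)
    for n in graph.findall("g:node", NS)
}

print(direct)
print([d.text for d in nested])
print(edge_weights)
print(node_names)
```

The descendant search returns the weights as a flat list, which loses the link to the edge they belong to, so the per-edge lookup (or the dictionary approach shown in the answer above) is probably the more useful form in practice.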

Votes: 3

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/10205811
