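I want to check an OpenStreetMap XML document (root) and count the children (way) that have two grandchildren (nd) with two different attribute values (ref). Here is what the OSM XML document looks like: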
<osm version="0.6" generator="osmium/1.13.2">
...
<way id="654822858">
<nd ref="3311110418"/>
<nd ref="6340618164"/>
<nd ref="6135961734"/>
<nd ref="8197878242"/>
<tag k="highway" v="residential"/>
<tag k="name" v="Avenida Décima Cerrada Las Torres"/>
</way>
<way id="654822862">
<nd ref="6135961736"/>
<nd ref="6135961745"/>
<nd ref="6340618150"/>
<nd ref="8197878242"/>
<tag k="highway" v="residential"/>
</way>
...
</osm>
I managed to do this check with ElementTree, using the and operator, in the code below:
startnode = "6135961736"
endnode = "6340618150"
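len(root.findall("./way/nd/[@ref ='"+ startnode +"'].." and "./way/nd/[@ref ='"+ endnode +"'].."))
The problem is that this takes a very long time. Extrapolating to the number of ways that need to be checked (~397000), it would take 9 days. I would like some help finding a faster way to do it.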
Thanks
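A note on the expression above: in Python, `and` between two non-empty strings evaluates to the right-hand string, so `findall` receives only the endnode path and the startnode path is never evaluated. The sketch below illustrates this on a tiny stand-in document (the ids and refs are invented for illustration, not taken from the real data) and shows one possible way to require both refs by intersecting the two result sets.

```python
import xml.etree.ElementTree as ET

# Small stand-in document modelled on the sample above (invented ids/refs).
root = ET.fromstring(
    '<osm>'
    '<way id="1"><nd ref="10"/><nd ref="20"/></way>'
    '<way id="2"><nd ref="20"/><nd ref="30"/></way>'
    '</osm>'
)

startnode, endnode = "10", "30"
path_start = "./way/nd[@ref='" + startnode + "']/.."
path_end = "./way/nd[@ref='" + endnode + "']/.."

# `and` between two non-empty strings yields the right-hand string,
# so only the endnode path would reach findall().
combined = path_start and path_end
same_as_end = combined == path_end

# Ways that contain both refs: intersect the two result sets by way id.
ways_start = {w.get("id") for w in root.findall(path_start)}
ways_end = {w.get("id") for w in root.findall(path_end)}
both = ways_start & ways_end

print(same_as_end, len(root.findall(combined)), len(both))
```

Here no way contains both refs, yet the combined expression still reports one match, because it only looks for the endnode.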
Posted on 2022-05-24 12:35:54
Could you test and time the other two methods in the code below against your full dataset?
from io import StringIO
from lxml import etree
import timeit
f = StringIO('''\
<osm version="0.6" generator="osmium/1.13.2">
...
<way id="654822858">
<nd ref="3311110418"/>
<nd ref="6340618164"/>
<nd ref="6135961734"/>
<nd ref="8197878242"/>
<tag k="highway" v="residential"/>
<tag k="name" v="Avenida Décima Cerrada Las Torres"/>
</way>
<way id="654822862">
<nd ref="6135961736"/>
<nd ref="6135961745"/>
<nd ref="6340618150"/>
<nd ref="8197878242"/>
<tag k="highway" v="residential"/>
</way>
...
</osm>
''')
tree = etree.parse(f)
startnode = "6135961736"
endnode = "6340618150"
print('Your method:')
%time len(tree.findall("./way/nd/[@ref ='"+ startnode +"'].." and "./way/nd/[@ref ='"+ endnode +"'].."))
print('\n')
print('XPATH method:')
%time len(tree.xpath('./way[ ./nd[contains(@ref, "'+ startnode +'")] and ./nd[contains(@ref, "'+ endnode +'")] ]'))
print('\n')
print('XPATH + f-string method:')
%time len(tree.xpath(f'./way[ ./nd[contains(@ref, "{startnode}")] and ./nd[contains(@ref, "{endnode}")] ]'))
Results:
Your method:
CPU times: user 103 µs, sys: 0 ns, total: 103 µs
Wall time: 111 µs
XPATH method:
CPU times: user 98 µs, sys: 0 ns, total: 98 µs
Wall time: 103 µs
XPATH + f-string method:
CPU times: user 69 µs, sys: 3 µs, total: 72 µs
Wall time: 76.1 µs
1
Another run gave:
Your method:
CPU times: user 142 µs, sys: 0 ns, total: 142 µs
Wall time: 148 µs
XPATH method:
CPU times: user 110 µs, sys: 0 ns, total: 110 µs
Wall time: 114 µs
XPATH + f-string method:
CPU times: user 62 µs, sys: 0 ns, total: 62 µs
Wall time: 65.8 µs
https://stackoverflow.com/questions/71575383
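If the same document has to be queried many times (the question mentions ~397000 checks), one further option is to walk the tree once and build a lookup table from each nd ref to the ways that contain it. This is not one of the methods timed above. The sketch below uses the standard-library ElementTree; the helper names `build_ref_index` and `count_ways_with_both` are hypothetical, and the sample is a trimmed copy of the document in the question. It compares refs by exact string equality, whereas `contains(@ref, ...)` in the XPath above is a substring match and would also accept a longer ref that merely contains the searched id.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_ref_index(root):
    """Map each nd ref to the set of way ids that contain it (one pass)."""
    index = defaultdict(set)
    for way in root.iter("way"):
        way_id = way.get("id")
        for nd in way.iter("nd"):
            index[nd.get("ref")].add(way_id)
    return index


def count_ways_with_both(index, startnode, endnode):
    """Count ways that contain both refs, using exact string matches."""
    # .get() avoids creating empty entries for refs that never occur.
    return len(index.get(startnode, set()) & index.get(endnode, set()))


# Trimmed copy of the sample from the question (tags omitted).
sample = """<osm version="0.6">
<way id="654822858">
<nd ref="3311110418"/><nd ref="6340618164"/>
<nd ref="6135961734"/><nd ref="8197878242"/>
</way>
<way id="654822862">
<nd ref="6135961736"/><nd ref="6135961745"/>
<nd ref="6340618150"/><nd ref="8197878242"/>
</way>
</osm>"""

root = ET.fromstring(sample)
index = build_ref_index(root)
print(count_ways_with_both(index, "6135961736", "6340618150"))
```

After the index is built, each pair costs two dictionary lookups and a set intersection rather than a scan of the whole tree. Whether this is actually faster on the full dataset would need to be measured.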