Q: Modifying specific XML tags using iterparse

Stack Overflow user

Asked 2020-02-02 02:16:13

1 answer, 291 views, 0 followers, 0 votes

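I am working with open map data and need to update specific tags based on their values. I can read the tags, and I can even print the specific tags that need updating to the console, but I cannot get them to update.

I am using ElementTree and lxml. Specifically: if the first word of the addr:street tag is a cardinal direction (i.e. North, South, East, West) and the last word of the addr:housenumber tag is not a cardinal direction, the first word should be taken out of the addr:street tag and appended to the addr:housenumber tag as its last word.

Edited based on the questions below.

Originally, I simply called the code with: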

clean_data(OUTPUT_FILE)

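I did not realize that iterparse cannot be used to print directly from inside the method (I believe that is what you were saying). My code came from a different part of a project I had worked on earlier, so I modified it with what you wrote. My previous code looked like this:

Earlier in the file: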

import xml.etree.cElementTree as ET
from collections import defaultdict
import pprint
import re
import codecs
import json

OSM_FILE = "Utah County Map.osm"
OUTPUT_FILE = "Utah County Extract.osm"
JSON_FILE = "JSON MAP DATA.json"

The code in this part of the project:

def clean_data(osm_file, tags = ('node', 'way')):
    context = iter(ET.iterparse(osm_file, events=('end',)))
    for event, elem in context:
        if elem.tag == 'node':
            streetTag, street = getVal(elem, 'addr:street')
            if street is None:  # No "street"
                continue
            first_word = getWord(street, True)
            houseTag, houseNo = getVal(elem, 'addr:housenumber')
            if houseNo is None:  # No "housenumber"
                continue
            last_word = getWord(houseNo, False)
            if first_word in direct_list and last_word not in direct_list:
                streetTag.attrib['v'] = street[len(first_word) + 1:]
                houseTag.attrib['v'] = houseNo + ' ' + first_word

for i, element in enumerate(clean_data(OUTPUT_FILE)):
    print(ET.tostring(context.root, encoding='unicode', pretty_print=True, with_tail=False))

When I run this now, I get an error:

TypeError: 'NoneType' object is not iterable

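I tried adding the output code I used earlier in another part of the project, but got the same error. The code is below for reference. (The output file in this code refers to the output of the first stage of data cleaning, in which I removed a number of invalid nodes.)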

with open(CLEAN_DATA, 'w') as output:
    output.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    output.write('<osm>\n  ')

    for i, element in enumerate(clean_data(OUTPUT_FILE)):
        output.write(ET.tostring(element, encoding='unicode'))

    output.write('</osm>')

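The initial edit was made in response to Valdi_bo's question below. Here is a sample from my XML file for reference. Yes, I am using both ElementTree and lxml, since lxml appears to be a subset of ElementTree. Some of the functions I call earlier in the program work with only one of the two, so I use both.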

<?xml version="1.0" encoding="UTF-8"?>
<osm>
  <node changeset="24687880" id="356682074" lat="40.2799548" lon="-111.6457549" timestamp="2014-08-11T20:33:35Z" uid="2253787" user="1000hikes" version="2">
    <tag k="addr:city" v="Provo" />
    <tag k="addr:housenumber" v="3570" />
    <tag k="addr:postcode" v="84604" />
    <tag k="addr:street" v="Timpview Drive" />
    <tag k="building" v="school" />
    <tag k="ele" v="1463" />
    <tag k="gnis:county_id" v="049" />
    <tag k="gnis:created" v="02/25/1989" />
    <tag k="gnis:feature_id" v="1449106" />
    <tag k="gnis:state_id" v="49" />
    <tag k="name" v="Timpview High School" />
    <tag k="operator" v="Provo School District" />
  </node>
  <node changeset="58421729" id="356685655" lat="40.2414325" lon="-111.6678877" timestamp="2018-04-25T20:23:33Z" uid="360392" user="maxerickson" version="4">
    <tag k="addr:city" v="Provo" />
    <tag k="addr:housenumber" v="585" />
    <tag k="addr:postcode" v="84601" />
    <tag k="addr:street" v="North 500 West" />
    <tag k="amenity" v="doctors" />
    <tag k="gnis:feature_id" v="2432255" />
    <tag k="healthcare" v="doctor" />
    <tag k="healthcare:speciality" v="gynecology;obstetrics" />
    <tag k="name" v="Valley Obstetrics &amp; Gynecology" />
    <tag k="old_name" v="Healthsouth Provo Surgical Center" />
    <tag k="phone" v="+1 801 374 1801" />
    <tag k="website" v="http://valleyobgynutah.com/location/provo-office-2/" />
  </node>
</osm>

In this example, the first node stays unchanged. In the second node, the addr:housenumber tag should change from 585 to 585 North, and the addr:street tag should change from North 500 West to 500 West.
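The rule described above can be checked on plain strings before any XML is involved. The following is only a sketch: move_direction and DIRECTIONS are hypothetical names that do not appear in the question, the direction set is deliberately minimal, and the guard that keeps at least one word in the street name is an added assumption.

```python
# Hypothetical helper: applies the rule to plain strings, no XML involved.
DIRECTIONS = {"North", "South", "East", "West"}  # assumed minimal set

def move_direction(housenumber, street):
    """Return (housenumber, street) after moving a leading direction word."""
    street_words = street.split()
    house_words = housenumber.split()
    if (len(street_words) > 1                      # assumption: never empty the street name
            and street_words[0] in DIRECTIONS
            and house_words
            and house_words[-1] not in DIRECTIONS):
        return f"{housenumber} {street_words[0]}", " ".join(street_words[1:])
    return housenumber, street

# The two nodes from the sample above
print(move_direction("3570", "Timpview Drive"))   # unchanged
print(move_direction("585", "North 500 West"))    # direction moved to the house number
```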

python-3.x

lxml

elementtree

large-files

1 Answer

Stack Overflow user

Answered 2020-02-02 05:51:06

Try the following code.

Functions and global variables:

def getVal(nd, kVal):
    '''
    Get data from "tag" child node with required "k" attribute
    Parameters:
      nd   - "starting" node,
      kVal - value of "k" attribute.
    Results:
      - the tag found,
      - its "v" attribute
    '''
    tg = nd.find(f'tag[@k="{kVal}"]')
    if tg is None:
        return (None, None)
    return (tg, tg.attrib.get('v'))

def getWord(txt, first):
    '''
    Get first / last word from "txt"
    '''
    pat = r'^\S+' if first else r'\S+$'
    mtch = re.search(pat, txt)
    return mtch.group() if mtch else ''

# Note the comma after "N.": without it Python joins "N." and "No" into "N.No"
direct_list = ["N", "N.", "No", "North", "S", "S.",
    "So", "South", "E", "E.", "East", "W", "W.", "West"]

And the main code:

for nd in tree.iter('node'):
    streetTag, street = getVal(nd, 'addr:street')
    if street is None:  # No "street"
        continue
    first_word = getWord(street, True)
    houseTag, houseNo = getVal(nd, 'addr:housenumber')
    if houseNo is None:  # No "housenumber"
        continue
    last_word = getWord(houseNo, False)
    if first_word in direct_list and last_word not in direct_list:
        streetTag.attrib['v'] = street[len(first_word) + 1:]
        houseTag.attrib['v'] = houseNo + ' ' + first_word

I assume that the tree variable holds the whole XML tree.
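As a rough illustration of that assumption, the sketch below loads a tree with the standard-library xml.etree.ElementTree, runs the same loop, and serializes the result. The SAMPLE string is a cut-down stand-in for the real .osm file, and the helpers are condensed copies of the ones above; for a real file, ET.parse would take the file path instead of the StringIO object.

```python
import io
import re
import xml.etree.ElementTree as ET

# Cut-down stand-in for the real file (assumption: same structure as the sample)
SAMPLE = """<osm>
  <node id="356685655">
    <tag k="addr:housenumber" v="585" />
    <tag k="addr:street" v="North 500 West" />
  </node>
</osm>"""

direct_list = ["N", "N.", "No", "North", "S", "S.", "So", "South",
               "E", "E.", "East", "W", "W.", "West"]

def getVal(nd, kVal):
    # Find the child "tag" element with the given "k" attribute
    tg = nd.find(f'tag[@k="{kVal}"]')
    return (None, None) if tg is None else (tg, tg.attrib.get('v'))

def getWord(txt, first):
    # First or last whitespace-delimited word of txt
    mtch = re.search(r'^\S+' if first else r'\S+$', txt)
    return mtch.group() if mtch else ''

tree = ET.parse(io.StringIO(SAMPLE))  # an ElementTree holding the whole document
for nd in tree.iter('node'):
    streetTag, street = getVal(nd, 'addr:street')
    houseTag, houseNo = getVal(nd, 'addr:housenumber')
    if street is None or houseNo is None:
        continue
    first_word = getWord(street, True)
    if first_word in direct_list and getWord(houseNo, False) not in direct_list:
        streetTag.set('v', street[len(first_word) + 1:])
        houseTag.set('v', houseNo + ' ' + first_word)

# Serialize the updated tree; a file path could be passed instead of the buffer
out = io.BytesIO()
tree.write(out, encoding='utf-8', xml_declaration=True)
print(out.getvalue().decode('utf-8'))
```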

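Edit following the comment posted at 22:36:33Z

My code also works inside an iterparse-based loop.

Prepare an input.xml file containing a root tag and a few node elements. Then try the following code (with the necessary imports, functions, and global variables given above):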

context = iter(etree.iterparse('input.xml', events=('end',)))
for event, elem in context:
    if elem.tag == 'node':
        streetTag, street = getVal(elem, 'addr:street')
        if street is None:  # No "street"
            continue
        first_word = getWord(street, True)
        houseTag, houseNo = getVal(elem, 'addr:housenumber')
        if houseNo is None:  # No "housenumber"
            continue
        last_word = getWord(houseNo, False)
        if first_word in direct_list and last_word not in direct_list:
            streetTag.attrib['v'] = street[len(first_word) + 1:]
            houseTag.attrib['v'] = houseNo + ' ' + first_word

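Because iterparse is set up to report only end events, you do not even need and event == 'end' in the first if.

You do not need the initial _, root = next(context) from your code, because context.root refers to the whole XML tree.

Now that the XML tree is built, you can print it to see the result: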

print(etree.tostring(context.root, encoding='unicode', pretty_print=True,
    with_tail=False))
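The pretty_print and with_tail arguments in the call above belong to lxml's tostring. If the tree was built with the standard-library xml.etree.ElementTree instead, its tostring does not accept them. A minimal standard-library sketch follows, assuming an in-memory SAMPLE document in place of input.xml. ET.indent is only available from Python 3.9, hence the hasattr check.

```python
import io
import xml.etree.ElementTree as ET

# In-memory stand-in for input.xml (assumption, for illustration only)
SAMPLE = b"<osm><node id='1'><tag k='name' v='x' /></node></osm>"

context = ET.iterparse(io.BytesIO(SAMPLE), events=('end',))
for event, elem in context:
    pass  # elements would be modified here, as in the loop above

root = context.root          # set once the source has been read completely
if hasattr(ET, 'indent'):    # Python 3.9+: add indentation in place
    ET.indent(root)
text = ET.tostring(root, encoding='unicode')
print(text)
```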

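Notes:

The code above is written without any yield, but it builds a complete XML tree, updated according to your needs.
Since the task is to build an XML tree, this code does not clear anything. Calling clear is needed only when:
- you retrieve some data from the processed elements and save it elsewhere,
- you no longer need those elements.

Now you can restructure the code above into a "yielding" variant and use it in your environment (you did not provide any details on how your code sample is called).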
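One possible shape of such a yielding variant is sketched below with the standard-library ElementTree. It is not the answer author's code: the clean_data signature is borrowed from the question, and the SAMPLE data and the condensed helpers are assumptions made for illustration. It also shows where the TypeError in the question comes from: a function that contains neither yield nor return gives back None, and enumerate(None) fails.

```python
import io
import re
import xml.etree.ElementTree as ET

direct_list = ["N", "N.", "No", "North", "S", "S.", "So", "South",
               "E", "E.", "East", "W", "W.", "West"]

def getVal(nd, kVal):
    tg = nd.find(f'tag[@k="{kVal}"]')
    return (None, None) if tg is None else (tg, tg.attrib.get('v'))

def getWord(txt, first):
    mtch = re.search(r'^\S+' if first else r'\S+$', txt)
    return mtch.group() if mtch else ''

def clean_data(source, tags=('node', 'way')):
    """Yield every element whose tag is in `tags`, with address tags fixed."""
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag not in tags:
            continue
        streetTag, street = getVal(elem, 'addr:street')
        houseTag, houseNo = getVal(elem, 'addr:housenumber')
        if street is not None and houseNo is not None:
            first_word = getWord(street, True)
            last_word = getWord(houseNo, False)
            if first_word in direct_list and last_word not in direct_list:
                streetTag.set('v', street[len(first_word) + 1:])
                houseTag.set('v', houseNo + ' ' + first_word)
        yield elem
        # Runs when the caller asks for the next element, so the caller
        # must have finished with (e.g. serialized) this one by then.
        elem.clear()

# Stand-in for the real file (assumption)
SAMPLE = b"""<osm>
  <node id="356682074">
    <tag k="addr:housenumber" v="3570" />
    <tag k="addr:street" v="Timpview Drive" />
  </node>
  <node id="356685655">
    <tag k="addr:housenumber" v="585" />
    <tag k="addr:street" v="North 500 West" />
  </node>
</osm>"""

# Each element is serialized before the generator moves on and clears it
parts = [ET.tostring(el, encoding='unicode') for el in clean_data(io.BytesIO(SAMPLE))]
print(''.join(parts))
```

Because each element is cleared after it has been handed to the caller, this variant suits the write loop shown in the question, where every element is serialized immediately. It would not suit code that collects the elements in a list for later use.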

0 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/60019810
