首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >在Python中编写ElementTree时,如何保留ASCII十六进制代码点?

在Python中编写ElementTree时,如何保留ASCII十六进制代码点?
EN

Stack Overflow用户
提问于 2017-10-22 01:43:25
回答 1查看 343关注 0票数 1

我已经通过ElementTree解析器将一个xml文件(Rhythmbox的数据库文件)加载到Python3中。在使用ascii编码修改树并将其写入磁盘(ElementTree.write())之后,十六进制码点中的所有ASCII十六进制字符都被转换为ASCII十进制码点。例如,下面是一个包含版权符号的diff:

代码语言:javascript
复制
<     <copyright>&#xA9; WNYC</copyright>
---
>     <copyright>&#169; WNYC</copyright>

有没有办法告诉Python/ElementTree不要这么做?我希望所有十六进制代码都留在十六进制码位中。

EN

回答 1

Stack Overflow用户

发布于 2017-10-22 12:31:51

我找到了一个解决方案。首先,我创建了一个新的编解码器错误处理程序,然后对ElementTree._get_writer()进行了修补,以使用新的错误处理程序。看起来像这样:

代码语言:javascript
复制
from xml.etree import ElementTree
import io
import contextlib
import codecs


def lower_first(s):
    return s[:1].lower() + s[1:] if s else ''


def html_replace(exc):
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        s = []
        for c in exc.object[exc.start:exc.end]:
            s.append('&#%s;' % lower_first(hex(ord(c))[1:].upper()))
        return ''.join(s), exc.end
    else:
        raise TypeError("can't handle %s" % exc.__name__)

codecs.register_error('html_replace', html_replace)


# monkey patch this python function to prevent it from using xmlcharrefreplace
@contextlib.contextmanager
def _get_writer(file_or_filename, encoding):
    # returns text write method and release all resources after using
    try:
        write = file_or_filename.write
    except AttributeError:
        # file_or_filename is a file name
        if encoding == "unicode":
            file = open(file_or_filename, "w")
        else:
            file = open(file_or_filename, "w", encoding=encoding,
                        errors="html_replace")
        with file:
            yield file.write
    else:
        # file_or_filename is a file-like object
        # encoding determines if it is a text or binary writer
        if encoding == "unicode":
            # use a text writer as is
            yield write
        else:
            # wrap a binary writer with TextIOWrapper
            with contextlib.ExitStack() as stack:
                if isinstance(file_or_filename, io.BufferedIOBase):
                    file = file_or_filename
                elif isinstance(file_or_filename, io.RawIOBase):
                    file = io.BufferedWriter(file_or_filename)
                    # Keep the original file open when the BufferedWriter is
                    # destroyed
                    stack.callback(file.detach)
                else:
                    # This is to handle passed objects that aren't in the
                    # IOBase hierarchy, but just have a write method
                    file = io.BufferedIOBase()
                    file.writable = lambda: True
                    file.write = write
                    try:
                        # TextIOWrapper uses this methods to determine
                        # if BOM (for UTF-16, etc) should be added
                        file.seekable = file_or_filename.seekable
                        file.tell = file_or_filename.tell
                    except AttributeError:
                        pass
                file = io.TextIOWrapper(file,
                                        encoding=encoding,
                                        errors='html_replace',
                                        newline="\n")
                # Keep the original file open when the TextIOWrapper is
                # destroyed
                stack.callback(file.detach)
                yield file.write

ElementTree._get_writer = _get_writer
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/46866183

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档