Q: How to preserve ASCII hex code points when writing an ElementTree in Python?

Stack Overflow user

Asked 2017-10-22 01:43:25

1 answer, 343 views, 0 followers, 1 vote

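I loaded an XML file (Rhythmbox's database file) into Python 3 with the ElementTree parser. After modifying the tree and writing it to disk with ElementTree.write() using ASCII encoding, every hexadecimal character reference has been converted to a decimal character reference. For example, here is a diff of a line containing the copyright symbol: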

<     <copyright>&#xA9; WNYC</copyright>
---
>     <copyright>&#169; WNYC</copyright>

Is there a way to tell Python/ElementTree not to do this? I would like every hexadecimal character reference to stay hexadecimal.
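The behaviour described above can be reproduced with a few lines. This is a minimal sketch, assuming a single `<copyright>` element stands in for the Rhythmbox database; it writes to an in-memory buffer rather than to disk.

```python
import io
from xml.etree import ElementTree

# Parsing resolves the hex reference into the literal character U+00A9,
# so the tree no longer remembers how the character was written.
root = ElementTree.fromstring('<copyright>&#xA9; WNYC</copyright>')

# Writing as ASCII cannot encode U+00A9 directly; ElementTree falls back
# to the 'xmlcharrefreplace' error handler, which emits decimal references.
buf = io.BytesIO()
ElementTree.ElementTree(root).write(buf, encoding='ascii')
output = buf.getvalue()
print(output)
```

Because the parsed tree holds only the decoded character, the hex form has to be reintroduced at encoding time; it cannot be "kept" from the input.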

python

xml

encoding

ascii

1 Answer

Stack Overflow user

Answered 2017-10-22 12:31:51

I found a solution. First I created a new codec error handler, then I monkey-patched ElementTree._get_writer() to use that handler. It looks like this:

from xml.etree import ElementTree
import io
import contextlib
import codecs


def html_replace(exc):
    # Codec error handler: emit hexadecimal character references
    # (e.g. &#xA9;) instead of the decimal ones xmlcharrefreplace produces.
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        s = []
        for c in exc.object[exc.start:exc.end]:
            s.append('&#x%X;' % ord(c))
        return ''.join(s), exc.end
    else:
        # exc is an instance, so the class name comes from its type
        raise TypeError("can't handle %s" % type(exc).__name__)

codecs.register_error('html_replace', html_replace)


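# Monkey-patch this private function so it no longer uses xmlcharrefreplace.
# _get_writer() is an undocumented internal, so check it against the
# ElementTree source of the Python version in use before relying on this.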
@contextlib.contextmanager
def _get_writer(file_or_filename, encoding):
    # returns text write method and release all resources after using
    try:
        write = file_or_filename.write
    except AttributeError:
        # file_or_filename is a file name
        if encoding == "unicode":
            file = open(file_or_filename, "w")
        else:
            file = open(file_or_filename, "w", encoding=encoding,
                        errors="html_replace")
        with file:
            yield file.write
    else:
        # file_or_filename is a file-like object
        # encoding determines if it is a text or binary writer
        if encoding == "unicode":
            # use a text writer as is
            yield write
        else:
            # wrap a binary writer with TextIOWrapper
            with contextlib.ExitStack() as stack:
                if isinstance(file_or_filename, io.BufferedIOBase):
                    file = file_or_filename
                elif isinstance(file_or_filename, io.RawIOBase):
                    file = io.BufferedWriter(file_or_filename)
                    # Keep the original file open when the BufferedWriter is
                    # destroyed
                    stack.callback(file.detach)
                else:
                    # This is to handle passed objects that aren't in the
                    # IOBase hierarchy, but just have a write method
                    file = io.BufferedIOBase()
                    file.writable = lambda: True
                    file.write = write
                    try:
                        # TextIOWrapper uses this methods to determine
                        # if BOM (for UTF-16, etc) should be added
                        file.seekable = file_or_filename.seekable
                        file.tell = file_or_filename.tell
                    except AttributeError:
                        pass
                file = io.TextIOWrapper(file,
                                        encoding=encoding,
                                        errors='html_replace',
                                        newline="\n")
                # Keep the original file open when the TextIOWrapper is
                # destroyed
                stack.callback(file.detach)
                yield file.write

ElementTree._get_writer = _get_writer
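As an alternative that avoids patching a private function, the same error-handler idea can be applied after serialising to a string. This is a sketch, not the approach used in the answer above: the handler name `hex_charref` is made up for illustration, and the `<copyright>` element is a stand-in for real data.

```python
import codecs
from xml.etree import ElementTree


def hex_charref(exc):
    """Replace unencodable characters with hexadecimal XML references."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return ''.join('&#x%X;' % ord(c) for c in bad), exc.end


# 'hex_charref' is a name chosen for this sketch, not a built-in handler.
codecs.register_error('hex_charref', hex_charref)

root = ElementTree.fromstring('<copyright>&#xA9; WNYC</copyright>')

# Serialise to str first (no encoding step happens here), then encode
# to ASCII with the custom handler doing the escaping.
text = ElementTree.tostring(root, encoding='unicode')
data = text.encode('ascii', errors='hex_charref')
print(data)
```

The resulting bytes can be written to a file opened in binary mode. Note that `tostring()` with `encoding='unicode'` does not emit an XML declaration by default, so one would need to be written separately if the output file requires it.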

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/46866183
