Q: UTF-16LE encoding and Xerces2 Java

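Stack Overflow user

Asked 2019-09-10 19:52:05

Answers: 1 · Views: 165 · Followers: 0 · Votes: 0

I have read several posts, such as "FileReader reads the file as a character stream" and "can be treated as whitespace if the document is handed as a stream of characters", where the answer was that the input source was actually a character stream rather than a byte stream.

However, the solution suggested in 1 does not seem to work for UTF-16LE, even though I use the following code: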

    try (final InputStream is = Files.newInputStream(filename.toPath(), StandardOpenOption.READ)) {
      DOMParser parser = new org.apache.xerces.parsers.DOMParser();
      parser.parse(new InputSource(is));
      return parser.getDocument();
    } catch (final SAXParseException saxEx) {
      LOG.debug("Unable to open [{}] as InputSource.", absolutePath, saxEx);
    }

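I still get org.xml.sax.SAXParseException: Content is not allowed in prolog.

I looked at Files.newInputStream, and it does use a ChannelInputStream, which delivers bytes rather than characters. I also tried setting the encoding on the InputSource object, without success. I also checked that there are no extra characters before the <?xml part (apart from the BOM).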
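The check for stray content before `<?xml` can be done on the raw bytes rather than in an editor, since editors often hide a BOM. The following is a minimal sketch of such a check; the class `BomSniffer` and its methods are hypothetical helpers, not part of Xerces or the JDK, and only the three BOMs relevant here are recognised.

```java
/** Hypothetical helper: reports which byte order mark, if any, starts a byte array. */
final class BomSniffer {

    /** Returns "UTF-8", "UTF-16LE", "UTF-16BE" or "none" (UTF-32 BOMs are not handled). */
    static String detectBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        return "none";
    }

    /** Formats up to {@code limit} leading bytes as space-separated hex for logging. */
    static String hex(byte[] data, int limit) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(limit, data.length); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned values, because Java bytes are signed.
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

Feeding it the result of `Files.readAllBytes(filename.toPath())` and logging `hex(bytes, 8)` shows exactly what the parser sees first. A genuine UTF-16LE file with a BOM would start with `FF FE 3C 00`.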

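I should also mention that this code works fine with UTF-8.

// Edit: I also tried DocumentBuilderFactory.newInstance().newDocumentBuilder().parse() and XmlInputStreamReader.next(), with the same result.

// Edit 2: I tried using a BufferedReader. Same result: Unexpected character '뿯' (code 49135 / 0xbfef) in prolog; expected '<'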
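The code point in that message gives a hint about the bytes the parser actually received. As a small sketch (the class name `Utf16LeDecodeDemo` is made up for illustration): the byte pair `EF BF`, read as one little-endian UTF-16 code unit, is 0xBFEF, and `EF BF BD` is the UTF-8 encoding of U+FFFD, the replacement character. This is only one possible reading. It would fit a file whose original BOM was replaced during an earlier UTF-8 decode and re-encode, but nothing in the post confirms that this happened.

```java
import java.nio.charset.StandardCharsets;

/** Illustration only: shows how the bytes EF BF come out as U+BFEF under UTF-16LE. */
final class Utf16LeDecodeDemo {

    /** Decodes the first two bytes of {@code data} as a single UTF-16LE code unit. */
    static int firstUtf16LeUnit(byte[] data) {
        return new String(data, 0, 2, StandardCharsets.UTF_16LE).charAt(0);
    }

    public static void main(String[] args) {
        // U+FFFD (replacement character) encoded as UTF-8 is the byte sequence EF BF BD.
        byte[] replacementCharUtf8 = "\uFFFD".getBytes(StandardCharsets.UTF_8);
        System.out.printf("0x%04x%n", firstUtf16LeUnit(replacementCharUtf8)); // prints 0xbfef
    }
}
```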

Thanks in advance.

java

xml

utf-16

xerces

byte-order-mark

1 Answer

Stack Overflow user

Answered 2019-09-10 21:21:56

To gather some more detailed information:

    byte[] bytes = Files.readAllBytes(filename.toPath());
    String xml = new String(bytes, StandardCharsets.UTF_16LE);
    if (xml.startsWith("\uFEFF")) {
        LOG.info("Has BOM and is evidently UTF_16LE");
        xml = xml.substring(1); // strip the BOM
    }
    if (!xml.contains("<?xml")) {
        LOG.info("Has no XML declaration");
    }
    // Reduce the whole document to the declared encoding; (?s) lets .* cross line breaks.
    String declaredEncoding = xml.replaceFirst(
            "(?s)^<\\?xml[^>]*encoding=[\"']([^\"']+)[\"'].*$", "$1");
    if (declaredEncoding.equals(xml)) {
        declaredEncoding = "UTF-8"; // nothing matched: fall back to the XML default
    }
    LOG.info("Declared as " + declaredEncoding);

    // Re-encode the BOM-free text in the encoding the document itself declares.
    try (final InputStream is = new ByteArrayInputStream(xml.getBytes(declaredEncoding))) {
      DOMParser parser = new org.apache.xerces.parsers.DOMParser();
      parser.parse(new InputSource(is));
      return parser.getDocument();
    } catch (final SAXParseException saxEx) {
      LOG.debug("Unable to open [{}] as InputSource.", absolutePath, saxEx);
    }
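A related variant, sketched here under assumptions rather than taken from the answer, is to decode the bytes with an explicit charset and hand the parser a character stream with the BOM already consumed. The example uses the JDK's built-in JAXP `DocumentBuilder` instead of `org.apache.xerces.parsers.DOMParser` so that it runs without extra dependencies; `ExplicitCharsetParse` is a hypothetical class name, and the caller is assumed to know the real encoding of the file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.PushbackReader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Hypothetical helper: parses XML bytes using a caller-supplied charset. */
final class ExplicitCharsetParse {

    static Document parse(byte[] bytes, Charset charset) throws Exception {
        // Decode the bytes ourselves so the parser receives characters, not bytes.
        PushbackReader reader = new PushbackReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), charset));
        int first = reader.read();
        // Java's UTF-16LE decoder keeps a leading BOM as U+FEFF; drop it, keep anything else.
        if (first != 0xFEFF && first != -1) {
            reader.unread(first);
        }
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
    }
}
```

Calling `ExplicitCharsetParse.parse(Files.readAllBytes(filename.toPath()), StandardCharsets.UTF_16LE)` would be the equivalent of the snippet above without the re-encoding step. It only helps if the file really is UTF-16LE; if the leading bytes are something else, the same prolog error will appear.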

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/57870212
