I'm having trouble with special characters and charset = iso-8859-1. The same code works fine with UTF-8, so I don't understand what I'm doing wrong.
Here is the code:
File input = new File("/users/marcioapf/example.html");
Document doc = Jsoup.parse(input, "iso-8859-1", "");
Elements elements = doc.select("span.DEPUTADO");
System.out.println(elements.toString());
This is the output:
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Jo&atilde;ozinho Pereira</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Isnaldo Bulh&otilde;es</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Antonio Albuquerque</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Jeferson Morais</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">In&aacute;cio Loiola</span>
It should look like this:
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Joãozinho Pereira</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Isnaldo Bulhões</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Antonio Albuquerque</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Jeferson Morais</span>
<span style="margin-left: 8px; width: auto !important;" class="DEPUTADO">Inácio Loiola</span>
How can I fix it?
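The charset argument tells jsoup how to decode the file's bytes into characters. A stdlib-only sketch of that decoding step (no jsoup involved; the class name and the sample bytes are hypothetical) shows that ISO-8859-1 turns the byte 0xE3 back into ã, while decoding the same bytes as UTF-8 does not:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Latin1DecodeDemo {
    // Hypothetical sample: "Joãozinho" as stored in an ISO-8859-1 file,
    // where 'ã' (U+00E3) is the single byte 0xE3.
    static final byte[] LATIN1_BYTES = "Jo\u00E3ozinho".getBytes(StandardCharsets.ISO_8859_1);

    // Decode raw bytes with the given charset, the step a parser performs before building the DOM.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Correct charset: 0xE3 maps back to 'ã'.
        System.out.println(decode(LATIN1_BYTES, StandardCharsets.ISO_8859_1));
        // Wrong charset: 0xE3 followed by 'o' is malformed UTF-8, so it is replaced with U+FFFD.
        System.out.println(decode(LATIN1_BYTES, StandardCharsets.UTF_8));
    }
}
```

If the names decode correctly at this stage but still print as &atilde;, the charset is probably fine and the entities are most likely introduced when the document is serialized back to HTML.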
Answered on 2014-02-24 09:22:20
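Using EscapeMode.xhtml will give you output without those entities. The xhtml escape mode only uses &lt;, &gt;, &amp; and &quot;, so accented characters such as ã are written out as-is. Try this code: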
import org.jsoup.nodes.Entities.EscapeMode;

File input = new File("/users/marcioapf/example.html");
Document doc = Jsoup.parse(input, "iso-8859-1", "");
// Restrict output escaping to &, <, > and " so letters like ã are not turned into named entities
doc.outputSettings().escapeMode(EscapeMode.xhtml);
Elements elements = doc.select("span.DEPUTADO");
System.out.println(elements.toString());
https://stackoverflow.com/questions/21974758
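To illustrate the difference the answer relies on, here is a stdlib-only sketch that contrasts a named-entity escaper with an xhtml-style escaper that only touches &, <, > and ". The class, method names and the tiny entity table are hypothetical; this is not jsoup's internal implementation, just the same idea in miniature.

```java
import java.util.Map;

public class EscapeModeDemo {
    // Hypothetical, tiny subset of HTML named entities standing in for a full entity table.
    static final Map<Character, String> NAMED = Map.of(
            '\u00E3', "atilde", '\u00F5', "otilde", '\u00E1', "aacute",
            '&', "amp", '<', "lt", '>', "gt", '"', "quot");

    // Replaces every character that has a named entity in the table with &name;
    static String escapeNamed(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            String name = NAMED.get(c);
            if (name != null) {
                out.append('&').append(name).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Escapes only the four characters XHTML-style output needs; everything else passes through.
    static String escapeXhtml(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "Jo\u00E3ozinho Pereira";
        System.out.println(escapeNamed(name)); // Jo&atilde;ozinho Pereira
        System.out.println(escapeXhtml(name)); // Joãozinho Pereira
    }
}
```

Both escapers still protect markup characters; they differ only in whether non-ASCII letters are spelled out as entities.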