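Hi everyone, I'm trying to use Java to read the content of a web page that contains German characters. Unfortunately, the German characters show up as strange characters. Any help is appreciated. Here is my code: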
String link = "some german link";
URL url = new URL(link);
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
System.out.println(inputLine);
}
Posted on 2011-05-31 22:22:32
You have to set the correct encoding. You can find the encoding in the HTTP header:
Content-Type: text/html; charset=ISO-8859-1
This may be overridden inside the (X)HTML document itself; see HTML Character encodings.
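As a rough illustration of reading the charset from the Content-Type header, here is a minimal sketch using only the JDK. The class name `CharsetFromHeader`, the helper `charsetFromContentType`, and the ISO-8859-1 fallback are assumptions made for this example; they are not part of the original answer, and the sketch does not look at any charset declared inside the HTML document itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetFromHeader {
    // Matches the charset parameter of a Content-Type value, quoted or unquoted.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in the header value, or the fallback if none is usable. */
    public static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Malformed or unsupported charset name: fall back instead of failing.
            return fallback;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java CharsetFromHeader <url>");
            return;
        }
        URLConnection conn = new URL(args[0]).openConnection();
        // Assumption: fall back to ISO-8859-1 when the server sends no charset.
        Charset cs = charsetFromContentType(conn.getContentType(), StandardCharsets.ISO_8859_1);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream(), cs))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

Even with the right charset on the reader, `System.out` encodes text using the console's own encoding, so the output can still look wrong if the terminal cannot display the characters.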
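I imagine there are many other issues you would have to deal with to parse a web page without errors. However, there are several HTTP client libraries available for Java, for example org.apache.httpcomponents. The code would look like this: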
DefaultHttpClient httpclient = new DefaultHttpClient();
HttpGet httpGet = new HttpGet("http://www.spiegel.de");
try
{
HttpResponse response = httpclient.execute(httpGet);
HttpEntity entity = response.getEntity();
if (entity != null)
{
System.out.println(EntityUtils.toString(entity));
}
}
catch (ClientProtocolException e) {e.printStackTrace();}
catch (IOException e) {e.printStackTrace();}
This is the Maven artifact:
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
<version>4.1.1</version>
<type>jar</type>
<scope>compile</scope>
</dependency>
Posted on 2011-05-31 22:16:43
You need to specify the character set for the InputStreamReader, like this:
new InputStreamReader(url.openStream(), "UTF-8")
Posted on 2011-05-31 22:17:03
Try setting the character set.
new BufferedReader(new InputStreamReader(url.openStream(), Charset.forName("UTF-8")));
https://stackoverflow.com/questions/6188901
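To show why the charset argument matters, here is a small self-contained sketch that decodes the same bytes twice, once with the matching charset and once with a mismatched one. The class `ReadWithCharset` and the helper `readAll` are hypothetical names chosen for this example, and an in-memory stream stands in for `url.openStream()` so that it runs without network access.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReadWithCharset {
    /** Reads the whole stream as text, decoding the bytes with the given charset. */
    public static String readAll(InputStream stream, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(stream, charset))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The German word "Gruesse" with u-umlaut and sharp s, written as Unicode
        // escapes so the encoding of this source file does not affect the demo.
        String original = "Gr\u00fc\u00dfe";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        String right = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);
        String wrong = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1);

        System.out.println(right.trim().equals(original)); // prints true
        System.out.println(wrong.trim().equals(original)); // prints false: each 2-byte character became 2 wrong characters
    }
}
```

For a real page, the same helper could be called as `readAll(url.openStream(), StandardCharsets.UTF_8)`, assuming the page is actually served as UTF-8; if the server declares a different charset, that one should be passed instead.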