
InputStreamReader buffering problem

Stack Overflow user

Asked on 2010-04-14 00:53:48

Answers: 6 · Views: 2.3K · Followers: 0 · Votes: 11

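Unfortunately, I am reading data from a file that uses two character encodings.

There is a header and a body. The header is always in ASCII and defines the charset in which the body is encoded.

The header is not fixed-length; it has to be run through a parser to determine its content and length.

The file may also be very large, so I need to avoid loading the whole thing into memory.

So I start out with just an InputStream. I first wrap it in an InputStreamReader with the ASCII charset, then decode the header and extract the body's charset. All good so far.

Then I create a new InputStreamReader with the correct charset on top of the same InputStream and start reading the body.

Unfortunately, as the javadoc confirms, InputStreamReader may choose to read ahead for efficiency. So reading the header eats part or all of the body.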
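A minimal sketch that makes the read-ahead visible. It builds a small, hypothetical header-plus-body byte array, reads a single character through an InputStreamReader, and reports how many bytes are still unread in the underlying ByteArrayInputStream. How much is consumed is an implementation detail (the javadoc only says the reader may read ahead); on OpenJDK the whole small array is typically pulled in by the first read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ReadAheadDemo
{
    // Reads one char through an InputStreamReader and returns the number of
    // bytes still unread in the underlying stream afterwards.
    public static int bytesLeftAfterOneChar( byte[] data ) throws IOException
    {
        ByteArrayInputStream in = new ByteArrayInputStream( data );
        try ( Reader reader = new InputStreamReader( in, StandardCharsets.US_ASCII ) )
        {
            reader.read();          // ask for a single character only
            return in.available();  // for ByteArrayInputStream: bytes not yet read
        }
    }

    public static void main( String[] args ) throws IOException
    {
        // Hypothetical file content: an ASCII header line followed by a body
        byte[] data = "charset=UTF-8\nbody text".getBytes( StandardCharsets.US_ASCII );
        System.out.println( data.length + " bytes total, "
            + bytesLeftAfterOneChar( data ) + " left after reading one char" );
    }
}
```

If the reader consumed only what it decoded, one byte would be gone; in practice far more disappears, and a second reader placed on the same stream starts in the wrong place.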

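Does anyone have suggestions for working around this? Would it be a good idea to create a CharsetDecoder manually and feed it one byte at a time (perhaps wrapped in a custom Reader implementation)?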
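The byte-at-a-time idea can be sketched with the three-argument CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean). Passing endOfInput = false tells the decoder that more input may follow, so an incomplete multi-byte sequence is left unconsumed in the input buffer instead of being reported as malformed. The class and method names are hypothetical, and UTF-8 is hard-coded only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class ByteAtATimeDecoding
{
    // Decodes 'bytes' by handing the decoder exactly one new byte per call,
    // so it never sees a byte before that byte is actually needed.
    public static String decodeOneByteAtATime( byte[] bytes ) throws CharacterCodingException
    {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.allocate( 8 );  // pending bytes of one character
        CharBuffer out = CharBuffer.allocate( 2 ); // room for a surrogate pair
        StringBuilder text = new StringBuilder();

        for ( byte b : bytes )
        {
            in.put( b );
            in.flip();
            // endOfInput = false: an incomplete sequence stays unconsumed in 'in'
            CoderResult result = decoder.decode( in, out, false );
            if ( result.isError() )
                result.throwException();
            in.compact();      // keep leftover bytes for the next call
            out.flip();
            text.append( out ); // zero, one or two chars were produced
            out.clear();
        }

        if ( in.position() > 0 )
            throw new CharacterCodingException(); // input ended mid-character
        return text.toString();
    }
}
```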

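Thanks in advance.

Edit: My final solution was to write an InputStreamReader with no buffering, so that I can parse the header without eating part of the body. Reading one byte at a time is not very efficient on its own, but I wrapped the original InputStream in a BufferedInputStream, so it isn't a problem.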

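import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

// An InputStreamReader that only consumes as many bytes as is necessary.
// It does not do any read-ahead.
public class InputStreamReaderUnbuffered extends Reader
{
    private final CharsetDecoder charsetDecoder;
    private final InputStream inputStream;
    // Bytes of the character currently being decoded; a multi-byte character
    // arrives one byte at a time, so incomplete sequences must be kept here
    private final ByteBuffer byteBuffer = ByteBuffer.allocate( 16 );
    // Two chars, so a supplementary character (surrogate pair) fits
    private final CharBuffer charBuffer = CharBuffer.allocate( 2 );
    // Low surrogate still to be returned, or -1 if none
    private int pendingChar = -1;

    public InputStreamReaderUnbuffered( InputStream inputStream, Charset charset )
    {
        this.inputStream = inputStream;
        charsetDecoder = charset.newDecoder();
    }

    @Override
    public int read() throws IOException
    {
        if ( pendingChar != -1 )
        {
            int c = pendingChar;
            pendingChar = -1;
            return c;
        }

        while ( true )
        {
            int b = inputStream.read();

            if ( b == -1 )
            {
                if ( byteBuffer.position() > 0 )
                    throw new IOException( "Unexpected end of stream, character truncated" );

                return -1;
            }

            byteBuffer.put( (byte)b );
            byteBuffer.flip();
            charBuffer.clear();

            // endOfInput = false keeps an incomplete multi-byte sequence in byteBuffer.
            // The one-argument decode(ByteBuffer) treats every call as complete input
            // and would report a lone lead byte as malformed.
            CoderResult result = charsetDecoder.decode( byteBuffer, charBuffer, false );
            if ( result.isError() )
                result.throwException();

            byteBuffer.compact(); // keep unconsumed bytes for the next iteration
            charBuffer.flip();

            if ( charBuffer.hasRemaining() )
            {
                char first = charBuffer.get();
                if ( charBuffer.hasRemaining() )
                    pendingChar = charBuffer.get(); // second half of a surrogate pair
                return first;
            }
        }
    }

    @Override
    public int read( char[] cbuf, int off, int len ) throws IOException
    {
        for ( int i = 0; i < len; i++ )
        {
            int ch = read();

            if ( ch == -1 )
                return i == 0 ? -1 : i;

            cbuf[ off + i ] = (char)ch; // honor the offset argument
        }

        return len;
    }

    @Override
    public void close() throws IOException
    {
        inputStream.close();
    }
}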

Tags: character-encoding, decode, inputstreamreader, java, buffer

Answers: 6

Stack Overflow user

Accepted answer

Posted on 2010-04-14 01:02:56

Why not use two InputStreams? One for reading the header and another for the body.

The second InputStream should skip the header bytes.
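The accepted answer's idea, sketched under assumptions: the header has already been parsed through the first stream, and its length in bytes is known. Because the header is ASCII, one character is one byte, so the number of header characters the parser consumed equals the byte offset where the body starts (read-ahead in the first Reader does not matter as long as you count characters parsed, not bytes read). A second FileInputStream is opened on the same file and skipped to that offset. SkipHeader, openBodyReader and headerLength are hypothetical names.

```java
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class SkipHeader
{
    // Opens a second, independent stream on the file, moves it past the header
    // (headerLength bytes) and decodes the rest with the body charset.
    public static Reader openBodyReader( String fileName, long headerLength, Charset bodyCharset )
        throws IOException
    {
        InputStream body = new FileInputStream( fileName );
        try
        {
            long remaining = headerLength;
            // skip() may skip fewer bytes than requested, so loop until done
            while ( remaining > 0 )
            {
                long skipped = body.skip( remaining );
                if ( skipped <= 0 )
                {
                    // No progress from skip(): use read() to detect end of file
                    if ( body.read() == -1 )
                        throw new EOFException( "File is shorter than the header" );
                    skipped = 1;
                }
                remaining -= skipped;
            }
            return new InputStreamReader( body, bodyCharset );
        }
        catch ( IOException e )
        {
            body.close(); // don't leak the stream if positioning failed
            throw e;
        }
    }
}
```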

Votes: 3

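Stack Overflow user

Posted on 2010-04-14 01:06:32

Here is the pseudocode:

1. Use the InputStream directly; do not wrap it in a Reader.
2. Read the bytes that contain the header and store them in a ByteArrayOutputStream.
3. Create a ByteArrayInputStream from the ByteArrayOutputStream and decode the header, this time wrapping the ByteArrayInputStream in a Reader with the ASCII charset.
4. Compute the length of the non-ASCII input and read that number of bytes into another ByteArrayOutputStream, then create a second ByteArrayInputStream from it and wrap that in a Reader using the charset from the header.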
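A sketch of this answer's approach, assuming a hypothetical header format: a single line of the form charset=NAME terminated by a '\n' byte. The last step is deliberately changed: instead of copying the body into a second ByteArrayOutputStream, the sketch wraps the rest of the original stream, because the question says the file may be too large to hold in memory. HeaderThenBody, readHeaderBytes and openBody are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HeaderThenBody
{
    // Copy the raw header bytes (no Reader involved) into a ByteArrayOutputStream.
    // ASSUMPTION: the header ends at the first '\n' byte (hypothetical format).
    static byte[] readHeaderBytes( InputStream in ) throws IOException
    {
        ByteArrayOutputStream header = new ByteArrayOutputStream();
        int b;
        while ( ( b = in.read() ) != -1 && b != '\n' )
            header.write( b );
        return header.toByteArray();
    }

    // Returns a Reader over the body, decoded with the charset named in the header.
    // ASSUMPTION: the header looks like "charset=NAME" (hypothetical format).
    public static Reader openBody( InputStream in ) throws IOException
    {
        byte[] headerBytes = readHeaderBytes( in );

        // Decode the header from memory with an ASCII Reader; read-ahead is harmless here
        StringBuilder header = new StringBuilder();
        try ( Reader ascii = new InputStreamReader(
                  new ByteArrayInputStream( headerBytes ), StandardCharsets.US_ASCII ) )
        {
            int c;
            while ( ( c = ascii.read() ) != -1 )
                header.append( (char) c );
        }
        Charset bodyCharset = Charset.forName( header.substring( header.indexOf( "=" ) + 1 ).trim() );

        // Modified last step: wrap the rest of the original stream instead of
        // buffering the whole body in memory
        return new InputStreamReader( in, bodyCharset );
    }
}
```

Passing a BufferedInputStream keeps the byte-by-byte header scan cheap; any bytes it has already buffered stay inside it and are decoded by the body Reader.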

Votes: 3

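Stack Overflow user

Posted on 2010-04-14 01:03:21

My first thought is to close the stream and reopen it, using InputStream#skip to skip past the header before handing the stream to the new InputStreamReader.

If you really, really don't want to reopen the file, you could use file descriptors to get more than one stream onto the file, although you might have to use channels to keep multiple positions within the file (since you can't assume you can reset the position with reset; it may not be supported).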
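Streams created from the same FileDescriptor share a single file position, which is why channels come up here. A minimal sketch, assuming the header's byte length is already known: open a separate FileChannel, seek it with position(), and decode from there with Channels.newReader. ChannelBodyReader and headerLength are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelBodyReader
{
    // Opens a FileChannel with its own position, seeks past the header
    // (headerLength bytes) and returns a Reader that decodes the body.
    public static Reader openBodyReader( Path file, long headerLength, Charset bodyCharset )
        throws IOException
    {
        FileChannel channel = FileChannel.open( file, StandardOpenOption.READ );
        try
        {
            channel.position( headerLength ); // absolute seek; no skip() loop needed
            // -1 = implementation-dependent default buffer size for the Reader
            return Channels.newReader( channel, bodyCharset.newDecoder(), -1 );
        }
        catch ( IOException e )
        {
            channel.close(); // don't leak the channel if positioning failed
            throw e;
        }
    }
}
```

The caller owns the returned Reader; on OpenJDK, closing it also closes the channel.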

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/2631507
