Q: How to extract large files (over 100 MB) without using any external library

Stack Overflow user

Asked 2020-11-03 15:54:35

Answers: 1, Views: 82, Followers: 0, Votes: 0

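I tried using NuGet packages to extract a tgz file, but the archive contains file names with unsupported characters, for example: 1111-11-1111:11:11.111.AA

I have confirmed this issue with the SharpCompress library.

So I had to fall back on the gist linked below:

https://gist.github.com/ForeverZer0/a2cd292bd2f3b5e114956c00bb6e872b

This is the code I use to extract the tgz file. It is a very nice piece of code and works well. But when I try to extract a large tgz file of more than 100 MB, I get an error like "Stream was too long".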

.net-core

gzip

gzipstream

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-11-03 17:38:33

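The error means you are trying to feed too many bytes into a MemoryStream, whose maximum capacity is int.MaxValue bytes (about 2 GB).

If you cannot find a suitable library and want to keep using the code you linked, you can modify it as follows.

Notice that it first copies the entire GZipStream into a MemoryStream. Why? As the comment in the code says: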

// A GZipStream is not seekable, so copy it first to a MemoryStream

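However, the code that follows uses only two operations that require a seekable stream: stream.Seek(x, SeekOrigin.Current) (where x is always positive) and stream.Position. Both can be emulated by reading the stream, with no seeking at all. For example, to seek forward you can read that many bytes and discard them: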

private static void FakeSeekForward(Stream stream, int offset) {
    if (stream.CanSeek)
        stream.Seek(offset, SeekOrigin.Current);
    else {
        int bytesRead = 0;
        var buffer = new byte[offset];
        while (bytesRead < offset)
        {
            int read = stream.Read(buffer, bytesRead, offset - bytesRead);
            if (read == 0)
                throw new EndOfStreamException();
            bytesRead += read;
        }
    }
}
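
As a rough illustration of the read-and-discard idea on a stream that really is non-seekable, the sketch below compresses ten bytes into an in-memory gzip stream, skips the first four after decompression, and reads the next one. The names SkipDemo, SkipForward and Demo are hypothetical and not part of the linked gist. SkipForward is a variant of FakeSeekForward that reuses a small scratch buffer instead of allocating one array of offset bytes, which is an assumption about what you might prefer for large skips.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class SkipDemo
{
    // Variant of FakeSeekForward (hypothetical name): skips count bytes,
    // reusing one small scratch buffer when the stream cannot seek.
    public static void SkipForward(Stream stream, long count)
    {
        if (stream.CanSeek)
        {
            stream.Seek(count, SeekOrigin.Current);
            return;
        }
        var scratch = new byte[4096];
        while (count > 0)
        {
            // Read may return fewer bytes than requested, so loop until done.
            int read = stream.Read(scratch, 0, (int)Math.Min(scratch.Length, count));
            if (read == 0)
                throw new EndOfStreamException();
            count -= read;
        }
    }

    public static int Demo()
    {
        // Compress the bytes 0..9 into an in-memory gzip stream.
        var compressed = new MemoryStream();
        using (var gz = new GZipStream(compressed, CompressionMode.Compress, leaveOpen: true))
            gz.Write(new byte[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 }, 0, 10);
        compressed.Position = 0;

        // A decompressing GZipStream reports CanSeek == false,
        // so SkipForward takes the read-and-discard path.
        using (var gz = new GZipStream(compressed, CompressionMode.Decompress))
        {
            SkipForward(gz, 4);   // discard bytes 0..3
            return gz.ReadByte(); // the next byte is 4
        }
    }
}
```

Calling SkipDemo.Demo() returns 4, the first byte after the skipped region.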

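To keep track of the current stream position, you can simply count the bytes you have read. With that, the copy to a MemoryStream can be removed, and the code from the link becomes: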

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public class Tar
{
    /// <summary>
    /// Extracts a <i>.tar.gz</i> archive to the specified directory.
    /// </summary>
    /// <param name="filename">The <i>.tar.gz</i> to decompress and extract.</param>
    /// <param name="outputDir">Output directory to write the files.</param>
    public static void ExtractTarGz(string filename, string outputDir)
    {
        using (var stream = File.OpenRead(filename))
            ExtractTarGz(stream, outputDir);
    }

    /// <summary>
    /// Extracts a <i>.tar.gz</i> archive stream to the specified directory.
    /// </summary>
    /// <param name="stream">The <i>.tar.gz</i> to decompress and extract.</param>
    /// <param name="outputDir">Output directory to write the files.</param>
    public static void ExtractTarGz(Stream stream, string outputDir)
    {
        using (var gzip = new GZipStream(stream, CompressionMode.Decompress))
        {
            // The copy to a MemoryStream is removed; the GZipStream is read directly.
            ExtractTar(gzip, outputDir);
        }
    }

    /// <summary>
    /// Extracts a <c>tar</c> archive to the specified directory.
    /// </summary>
    /// <param name="filename">The <i>.tar</i> to extract.</param>
    /// <param name="outputDir">Output directory to write the files.</param>
    public static void ExtractTar(string filename, string outputDir)
    {
        using (var stream = File.OpenRead(filename))
            ExtractTar(stream, outputDir);
    }

    /// <summary>
    /// Extracts a <c>tar</c> archive to the specified directory.
    /// </summary>
    /// <param name="stream">The <i>.tar</i> to extract.</param>
    /// <param name="outputDir">Output directory to write the files.</param>
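    public static void ExtractTar(Stream stream, string outputDir) {
        var buffer = new byte[100];
        // A GZipStream has no Position, so track the position ourselves.
        long pos = 0;
        while (true) {
            // Entry name: the first 100 bytes of the 512-byte header.
            // A short read means the stream ended.
            if (ReadUpTo(stream, buffer, 100) < 100)
                break;
            pos += 100;
            var name = Encoding.ASCII.GetString(buffer).Trim('\0');
            if (String.IsNullOrWhiteSpace(name))
                break;
            // Skip the mode, uid and gid fields (3 x 8 bytes).
            FakeSeekForward(stream, 24);
            pos += 24;

            // Entry size: 12 bytes of octal text.
            if (ReadUpTo(stream, buffer, 12) < 12)
                throw new EndOfStreamException();
            pos += 12;
            var size = Convert.ToInt64(Encoding.UTF8.GetString(buffer, 0, 12).Trim('\0').Trim(), 8);
            // Skip the rest of the header (512 - 100 - 24 - 12 = 376 bytes).
            FakeSeekForward(stream, 376);
            pos += 376;

            var output = Path.Combine(outputDir, name);
            if (!Directory.Exists(Path.GetDirectoryName(output)))
                Directory.CreateDirectory(Path.GetDirectoryName(output));
            if (!name.Equals("./", StringComparison.InvariantCulture)) {
                using (var str = File.Open(output, FileMode.Create, FileAccess.Write)) {
                    // Copy in chunks: Read may return fewer bytes than requested,
                    // and a large entry should not be buffered in memory as a whole.
                    var buf = new byte[81920];
                    var remaining = size;
                    while (remaining > 0) {
                        int read = stream.Read(buf, 0, (int) Math.Min(buf.Length, remaining));
                        if (read == 0)
                            throw new EndOfStreamException();
                        str.Write(buf, 0, read);
                        remaining -= read;
                    }
                    pos += size;
                }
            }

            // Entry data is padded with zeros to a multiple of 512 bytes.
            var offset = (int) (512 - (pos % 512));
            if (offset == 512)
                offset = 0;
            FakeSeekForward(stream, offset);
            pos += offset;
        }
    }

    // Reads up to count bytes, looping because Stream.Read may return fewer
    // bytes than requested. Returns the number of bytes actually read.
    private static int ReadUpTo(Stream stream, byte[] buffer, int count) {
        int total = 0;
        while (total < count) {
            int read = stream.Read(buffer, total, count - total);
            if (read == 0)
                break;
            total += read;
        }
        return total;
    }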

    private static void FakeSeekForward(Stream stream, int offset) {
        if (stream.CanSeek)
            stream.Seek(offset, SeekOrigin.Current);
        else {
            int bytesRead = 0;
            var buffer = new byte[offset];
            while (bytesRead < offset)
            {
                int read = stream.Read(buffer, bytesRead, offset - bytesRead);
                if (read == 0)
                    throw new EndOfStreamException();
                bytesRead += read;
            }
        }
    }
}
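
The constants in ExtractTar (100, 24, 12 and 376) come from the tar header layout: each entry starts with a 512-byte header in which the name occupies bytes 0 to 99, the mode, uid and gid fields take the next 24 bytes, and the size is stored as octal text in bytes 124 to 135. The sketch below spells those offsets out as named constants and parses a header block that it builds itself. The class TarHeaderDemo and its members are hypothetical names introduced only for illustration, and MakeHeader fills in just the two fields parsed here, so it does not produce a complete, valid tar header.

```csharp
using System;
using System.Text;

public static class TarHeaderDemo
{
    public const int BlockSize = 512;
    public const int NameOffset = 0, NameLength = 100;
    public const int SizeOffset = 124, SizeLength = 12;

    // Parses the entry name and size out of one 512-byte header block.
    public static (string Name, long Size) Parse(byte[] header)
    {
        if (header.Length != BlockSize)
            throw new ArgumentException("A tar header block is 512 bytes.");
        string name = Encoding.ASCII.GetString(header, NameOffset, NameLength).Trim('\0');
        string octal = Encoding.ASCII.GetString(header, SizeOffset, SizeLength).Trim('\0', ' ');
        long size = octal.Length == 0 ? 0 : Convert.ToInt64(octal, 8);
        return (name, size);
    }

    // Number of zero bytes that follow an entry's data up to the next block boundary.
    public static int Padding(long size) => (int)((BlockSize - size % BlockSize) % BlockSize);

    // Builds a header block with only the name and size fields filled in (illustration only).
    public static byte[] MakeHeader(string name, long size)
    {
        var header = new byte[BlockSize];
        Encoding.ASCII.GetBytes(name, 0, name.Length, header, NameOffset);
        string octal = Convert.ToString(size, 8).PadLeft(SizeLength - 1, '0');
        Encoding.ASCII.GetBytes(octal, 0, octal.Length, header, SizeOffset);
        return header;
    }
}
```

For example, TarHeaderDemo.Parse(TarHeaderDemo.MakeHeader("hello.txt", 1234)) yields the name "hello.txt" and the size 1234, and TarHeaderDemo.Padding(1234) is 302, because 1234 + 302 = 1536 is the next multiple of 512.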

Votes: 3

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/64658753
