Q: How to speed up reading files using FileStream

Stack Overflow user

Asked 2015-09-26 05:57:43

Answers: 1 · Views: 113 · Followers: 0 · Votes: 0

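I have a performance problem when searching file contents. I use the FileStream class to read the files (each search involves about 10 files of roughly 70 MB each). However, all of these files are being accessed and updated by another process while I search, so I can't use Buffersize to read the files. Using a buffer size in StreamReader takes 3 minutes, even though I'm using a regular expression.

Has anyone run into a similar situation and can offer any suggestions for improving file search performance?

Code snippet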

    private static int BufferSize = 32768;

    // Fragment of the search method; filePath, startDate, endDate, searchText,
    // searchResults and the other names are defined elsewhere.
    using (FileStream fs = File.Open(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (TextReader txtReader = new StreamReader(fs, Encoding.UTF8, true, BufferSize))
    {
        // Splits the text into records that each begin with a "yyyy-MM-dd HH:mm:ss" timestamp.
        Regex patternMatching = new Regex(@"(?=\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})(.*?)(?=\n\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})", RegexOptions.IgnoreCase);
        Regex dateStringMatch = new Regex(@"^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}");
        char[] temp = new char[1048576];
        int charsRead;
        while ((charsRead = txtReader.ReadBlock(temp, 0, temp.Length)) > 0)
        {
            StringBuilder parseString = new StringBuilder();
            // Append only the characters actually read; the final block is usually shorter than the buffer.
            parseString.Append(temp, 0, charsRead);
            // If the block ends mid-line, read on until the next line starts a new timestamped record.
            if (temp[charsRead - 1] != '\n')
            {
                parseString.Append(txtReader.ReadLine());
                // Character codes 48-57 are the digits '0'-'9' that begin a timestamp.
                while (txtReader.Peek() > 0 && !(txtReader.Peek() >= 48 && txtReader.Peek() <= 57))
                {
                    parseString.Append(txtReader.ReadLine());
                }
            }
            if (parseString.Length > 0)
            {
                string[] allRecords = patternMatching.Split(parseString.ToString());
                foreach (var item in allRecords)
                {
                    var contentString = item.Trim();
                    if (!string.IsNullOrWhiteSpace(contentString))
                    {
                        var matches = dateStringMatch.Matches(contentString);
                        if (matches.Count > 0)
                        {
                            var rowDatetime = DateTime.MinValue;
                            if (DateTime.TryParse(matches[0].Value, out rowDatetime))
                            {
                                // Keep only records inside the requested time window that contain the search text.
                                if (rowDatetime >= startDate && rowDatetime < endDate)
                                {
                                    if (contentString.ToLowerInvariant().Contains(searchText))
                                    {
                                        var result = new SearchResult
                                        {
                                            LogFileType = logFileType,
                                            Message = string.Format(messageTemplateNew, item),
                                            Timestamp = rowDatetime,
                                            ComponentName = componentName,
                                            FileName = filePath,
                                            ServerName = serverName
                                        };
                                        searchResults.Add(result);
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }

    return searchResults;

filestream

performance

stream

1 answer

Stack Overflow user

Answered 2015-09-26 10:48:14

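A while ago I had to analyze a lot of FileZilla Server log files, each larger than 120 MB. I used a simple List to hold all the lines of each log file, and searching for specific lines then performed well.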

List<string> fileContent = File.ReadAllLines(pathToFile).ToList();
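One caveat for the question's scenario: `File.ReadAllLines` does not open the file in a mode that tolerates other writers, so it may throw an `IOException` while another process holds the file open for writing. A minimal sketch of the same "load all lines into a list" idea that keeps the question's `FileShare.ReadWrite` open mode could look like this. The names `SharedLogReader` and `ReadAllLinesShared` are made up for illustration, and the 32768 buffer size is simply the value from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SharedLogReader
{
    // Hypothetical helper: reads every line of a file while still allowing
    // another process to keep the file open for reading and writing.
    public static List<string> ReadAllLinesShared(string path)
    {
        var lines = new List<string>();
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (var reader = new StreamReader(fs, Encoding.UTF8, true, 32768))
        {
            string line;
            // ReadLine returns null once the end of the stream is reached.
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

Holding ten 70 MB files in memory at once is a trade-off; whether it is acceptable depends on the machine, so treat this as a starting point for measurement rather than a recommendation.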

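But in your case, I don't think reading the file is the main cause of the poor performance. Try timing parts of your loop with Stopwatch to see where most of the time is spent. Regular expressions and TryParse can be very time-consuming when they are called many times in a loop like yours.

Votes: -1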
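The kind of measurement suggested here could be sketched as follows, timing the regex match and the date parse separately so the slower step stands out. The class `PhaseTimer`, its `Measure` method, and the choice to parse with the invariant culture are assumptions for illustration; only `Stopwatch`, `Regex`, and `DateTime.TryParse` come from the framework, and the pattern is the timestamp prefix from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Globalization;
using System.Text.RegularExpressions;

public static class PhaseTimer
{
    // Timestamp prefix from the question, e.g. "2015-09-26 05:57:43".
    private static readonly Regex DatePrefix =
        new Regex(@"^\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}");

    // Hypothetical helper: returns how many lines carried a parseable timestamp
    // and reports the accumulated time of each phase via the out parameters.
    public static int Measure(IEnumerable<string> lines, out TimeSpan regexTime, out TimeSpan parseTime)
    {
        var regexWatch = new Stopwatch();
        var parseWatch = new Stopwatch();
        int parsed = 0;

        foreach (string line in lines)
        {
            // Start after Stop resumes the watch, so each one totals its phase.
            regexWatch.Start();
            Match m = DatePrefix.Match(line);
            regexWatch.Stop();

            if (!m.Success) continue;

            parseWatch.Start();
            DateTime ts;
            // Assumption: invariant culture, so the result does not depend on machine settings.
            bool ok = DateTime.TryParse(m.Value, CultureInfo.InvariantCulture, DateTimeStyles.None, out ts);
            parseWatch.Stop();

            if (ok) parsed++;
        }

        regexTime = regexWatch.Elapsed;
        parseTime = parseWatch.Elapsed;
        return parsed;
    }
}
```

Calling `PhaseTimer.Measure(lines, out regexTime, out parseTime)` on the lines of one log file and printing both `TimeSpan` values gives a first indication of which step dominates. The same pattern extends to the record-splitting regex and the `ToLowerInvariant().Contains(...)` check in the question's loop.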

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/32791220
