文章/答案/技术大牛

发布

社区首页 >问答首页 >迭代数千个对象以生成html

问迭代数千个对象以生成html
EN

Code Review用户

提问于 2016-04-21 10:14:41

回答 2查看 147关注 0票数 2

目前，我正在开发一个分析单元，它不需要堆栈跟踪来保存调用堆栈，并且可以以html格式返回性能。

尽管日志本身是相当快的(每个函数的开销可忽略不计)，但信息的可视化却不是这样。

在开发环境中生成的对象略多于25k，添加并行循环大约需要38秒钟。与以前一样，组成结果数据需要>5分钟。在一个测试环境中，它可以增长到这个大小的20倍。

在此之前，数据将存储一个小时。为了减少集体内存和性能标记，这将减少到20分钟(1200秒)。

生成HTML的代码本身相当简单，函数DataToLines似乎是罪魁祸首。

Public void ProfilingResult()
{
  /* several lines of html generation */ <0.0001ms
  ProfileData data = ApplicationLogging.GetProfilerData(); // < 0.012ms
  // Root is a predefined ProfileData, interval contains the ints 120 and 1200. amount of seconds of data to visualize. 
  string[][][] dataLines = DataToLines(data , 0, "root", intervals); // bottleneck
  /* Generate html based on these lines * // 0.031ms
}

看一下这个函数，我发现它相当慢，因为它需要3到5秒。但真正的罪魁祸首显然是getProfilingStats。大概需要32秒。

private string[][][] DataToLines(ProfileData data, int indent, string parentGuid, int[] intervals)
{
    List<string[][]> lines = new List<string[][]>();
    Parallel.ForEach(profiledata.Children, (child) =>
    {
        string guid = GUID.NewGuid().ToString();
        // The real, real culprit
        ProfileStats[] stats = child.Value.getProfilingStats(intervals);
        for(int i = 0; i < stats.Length; i++)
        {
          line[i][0] = parentGuid;
          /* another 13 values being assigned in a similar way.  */
        }
        lines.Add(line);
        // Here we recurse to maintain the call stack order
        if(child.Value.Children.Count > 0)
            lines.AddRange(DataToLines(child.Value, indentation + 1, guid, intervals);
    });
  return lines.ToArray();
}

getProfilingStats遍历特定函数调用线程的所有日志，并获得最短、最高和平均的执行时间。这意味着，这将迭代一个仅仅25k的对象。

    public ProfileStats[] getProfilingStats(int[] intervals)
    {
        ProfileStats[] stats = new ProfileStats[intervals.Length];

        for (int i = 0; i < intervals.Length; i++)
            stats[i] = new ProfileStats();

        int now = (Int32)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;

        if (queue.Count == 0)
            return stats;

        long[] totalDurations = new long[intervals.Length];

        Parallel.ForEach(queue, (profilingRequest) =>
        {
            for (int i = 0; i < intervals.Length; i++)
            {
                if (now - profilingRequest.timestamp < intervals[i])
                {
                    if (stats[i].queueCount == 0)
                        stats[i].minDuration = stats[i].maxDuration = profilingRequest.duration;
                    else if (profilingRequest.duration < stats[i].minDuration)
                            stats[i].minDuration = profilingRequest.duration;
                    else if (profilingRequest.duration > stats[i].maxDuration)
                        stats[i].maxDuration = profilingRequest.duration;

                    stats[i].queueCount++;
                    totalDurations[i] += profilingRequest.duration;
                    if (profilingRequest.containsExceptions)
                        stats[i].exceptionsCount++;
                }
            }
        });

        for (int i = 0; i < intervals.Length; i++)
        {
            if (stats[i].queueCount > 0)
            {
                stats[i].averageDuration = totalDurations[i] / stats[i].queueCount;
            }
        }
        return stats;
    }

ProfileData如下所示

public class ProfileData
{
    internal struct ProfilingRequest
    {
        public int timestamp;
        public long duration;
        public bool containsExceptions;

        public ProfilingRequest(int timestamp, long duration, bool containsExceptions)
        {
            this.timestamp = timestamp;
            this.duration = duration;
            this.containsExceptions = containsExceptions;
        }
    }

    public string name { get; private set; }
    public int cleanInterval { get; private set; }
    private ConcurrentQueue<ProfilingRequest> queue = new ConcurrentQueue<ProfilingRequest>();

    public Dictionary<string, ProfileData> Children { get; private set; }
    public ProfileData parent { get; private set; }

    public ProfileData(string name, int cleanInterval)
    {
        Children = new Dictionary<string, ProfileData>();
        parent = null;
        this.name = name;
        this.cleanInterval = cleanInterval;
    }

    public ProfileData GetChild(string groupName)
    {
        lock (Children)
        {
            if (!Children.ContainsKey(groupName))
            {
                ProfileData child = new ProfileData(groupName, cleanInterval);
                child.parent = this;
                Children.Add(groupName, child);
            }
            return Children[groupName];
        }
    }

    public void Clean(int timestamp)
    {
        ProfilingRequest pr;
        while (queue.Count > 0)
        {
            if (queue.TryPeek(out pr))
            {
                if (timestamp - pr.timestamp > cleanInterval)
                    queue.TryDequeue(out pr);
                else
                    break;
            }
        }
    }

performance

回答 2

Code Review用户

回答已采纳

发布于 2016-04-21 22:47:07

在DataToLines中，我们有这样的

List<string[][]> lines = new List<string[][]>();
Parallel.ForEach(profiledata.Children, (child) =>
{
    ...
    lines.Add(line);
    ...
});

List<T>.Add不一定是线程安全的，因此在Parallel.ForEach中调用它是有问题的。

要演示这个问题，请尝试运行以下代码

var count = 10000;
var lines = new List<int>();
Parallel.ForEach(Enumerable.Range(0, count), lines.Add);
Debug.Assert(count == lines.Count, $"Expected {count}, actual {lines.Count}");

我在运行这个程序时得到的输出是Fail: Expected 10000, actual 9823。

票数 3

Code Review用户

发布于 2016-04-21 11:44:42

在对Parallel.ForEach做了很多事情之后，我还是想建议像这样改变它

    Parallel.ForEach(queue, (profilingRequest) =>
    {

        var currentTime = now - profilingRequest.timestamp;

        for (int i = 0; i < intervals.Length; i++)
        {

            if (currentTime < intervals[i])
            {

                var currentStat = stats[i];

                if (currentStat.queueCount == 0)
                {
                    currentStat.minDuration = currentStat.maxDuration = profilingRequest.duration;
                }
                else if (profilingRequest.duration < currentStat.minDuration)
                {
                        currentStat.minDuration = profilingRequest.duration;
                }
                else if (profilingRequest.duration > currentStat.maxDuration)
                {
                    currentStat.maxDuration = profilingRequest.duration;
                }
                currentStat.queueCount++;
                totalDurations[i] += profilingRequest.duration;

                if (profilingRequest.containsExceptions)
                {
                    currentStat.exceptionsCount++;
                }
            }
        }
    });

通过计算循环外的currentTime，将profilingRequest的访问限制为一次，而不是对循环的每一次迭代进行计算。

通过使用var currentStat = stats[i]; (如果编译器没有在内部执行此操作)，stats项就不需要通过其索引进行访问。

我想知道，如果profilingRequest.duration == currentStat.maxDuration似乎是一个被遗忘的边缘情况，应该发生什么。

如您所见，我添加了大括号，尽管它们可能是可选的。它们不会减慢执行速度，但是代码的可读性更好，错误也更少。

另一个需要改进的地方是

public ProfileData GetChild(string groupName)
{
    lock (Children)
    {
        if (!Children.ContainsKey(groupName))
        {
            ProfileData child = new ProfileData(groupName, cleanInterval);
            child.parent = this;
            Children.Add(groupName, child);
        }
        return Children[groupName];
    }
}

如果您需要从字典中获得一个值，并且不确定它是否在字典中，则应该使用TryGetValue()，因为它更快。也见我在这里的回答：C#中的消息总线

那家伙说，改变了以前的方法

public ProfileData GetChild(string groupName)
{
    lock (Children)
    {
        ProfileData child;
        if(Children.TryGetValue(groupName, out child))
        {
            return child;
        }

        child = new ProfileData(groupName, cleanInterval);
        child.parent = this;
        Children.Add(groupName, child);

        return child;
    }
}

应该能提高速度。

票数 1

页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://codereview.stackexchange.com/questions/126308

复制

相似问题

问迭代数千个对象以生成html
EN

回答 2

Code Review用户

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问迭代数千个对象以生成htmlEN

回答 2

Code Review用户

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问迭代数千个对象以生成html
EN