目前,我正在开发一个分析单元,它不需要堆栈跟踪来保存调用堆栈,并且可以以html格式返回性能。
尽管日志本身是相当快的(每个函数的开销可忽略不计),但信息的可视化却不是这样。
在开发环境中生成的对象略多于25k,添加并行循环大约需要38秒钟。与以前一样,组成结果数据需要>5分钟。在一个测试环境中,它可以增长到这个大小的20倍。
在此之前,数据将存储一个小时。为了减少集体内存和性能标记,这将减少到20分钟(1200秒)。
生成HTML的代码本身相当简单,函数DataToLines似乎是罪魁祸首。
Public void ProfilingResult()
{
/* several lines of html generation */ <0.0001ms
ProfileData data = ApplicationLogging.GetProfilerData(); // < 0.012ms
// Root is a predefined ProfileData, interval contains the ints 120 and 1200. amount of seconds of data to visualize.
string[][][] dataLines = DataToLines(data , 0, "root", intervals); // bottleneck
/* Generate html based on these lines * // 0.031ms
}看一下这个函数,我发现它相当慢,因为它需要3到5秒。但真正的罪魁祸首显然是getProfilingStats。大概需要32秒。
private string[][][] DataToLines(ProfileData data, int indent, string parentGuid, int[] intervals)
{
List<string[][]> lines = new List<string[][]>();
Parallel.ForEach(profiledata.Children, (child) =>
{
string guid = GUID.NewGuid().ToString();
// The real, real culprit
ProfileStats[] stats = child.Value.getProfilingStats(intervals);
for(int i = 0; i < stats.Length; i++)
{
line[i][0] = parentGuid;
/* another 13 values being assigned in a similar way. */
}
lines.Add(line);
// Here we recurse to maintain the call stack order
if(child.Value.Children.Count > 0)
lines.AddRange(DataToLines(child.Value, indentation + 1, guid, intervals);
});
return lines.ToArray();
}getProfilingStats遍历特定函数调用线程的所有日志,并获得最短、最高和平均的执行时间。这意味着,这将迭代一个仅仅25k的对象。
public ProfileStats[] getProfilingStats(int[] intervals)
{
ProfileStats[] stats = new ProfileStats[intervals.Length];
for (int i = 0; i < intervals.Length; i++)
stats[i] = new ProfileStats();
int now = (Int32)(DateTime.UtcNow.Subtract(new DateTime(1970, 1, 1))).TotalSeconds;
if (queue.Count == 0)
return stats;
long[] totalDurations = new long[intervals.Length];
Parallel.ForEach(queue, (profilingRequest) =>
{
for (int i = 0; i < intervals.Length; i++)
{
if (now - profilingRequest.timestamp < intervals[i])
{
if (stats[i].queueCount == 0)
stats[i].minDuration = stats[i].maxDuration = profilingRequest.duration;
else if (profilingRequest.duration < stats[i].minDuration)
stats[i].minDuration = profilingRequest.duration;
else if (profilingRequest.duration > stats[i].maxDuration)
stats[i].maxDuration = profilingRequest.duration;
stats[i].queueCount++;
totalDurations[i] += profilingRequest.duration;
if (profilingRequest.containsExceptions)
stats[i].exceptionsCount++;
}
}
});
for (int i = 0; i < intervals.Length; i++)
{
if (stats[i].queueCount > 0)
{
stats[i].averageDuration = totalDurations[i] / stats[i].queueCount;
}
}
return stats;
}ProfileData如下所示
public class ProfileData
{
internal struct ProfilingRequest
{
public int timestamp;
public long duration;
public bool containsExceptions;
public ProfilingRequest(int timestamp, long duration, bool containsExceptions)
{
this.timestamp = timestamp;
this.duration = duration;
this.containsExceptions = containsExceptions;
}
}
public string name { get; private set; }
public int cleanInterval { get; private set; }
private ConcurrentQueue<ProfilingRequest> queue = new ConcurrentQueue<ProfilingRequest>();
public Dictionary<string, ProfileData> Children { get; private set; }
public ProfileData parent { get; private set; }
public ProfileData(string name, int cleanInterval)
{
Children = new Dictionary<string, ProfileData>();
parent = null;
this.name = name;
this.cleanInterval = cleanInterval;
}
public ProfileData GetChild(string groupName)
{
lock (Children)
{
if (!Children.ContainsKey(groupName))
{
ProfileData child = new ProfileData(groupName, cleanInterval);
child.parent = this;
Children.Add(groupName, child);
}
return Children[groupName];
}
}
public void Clean(int timestamp)
{
ProfilingRequest pr;
while (queue.Count > 0)
{
if (queue.TryPeek(out pr))
{
if (timestamp - pr.timestamp > cleanInterval)
queue.TryDequeue(out pr);
else
break;
}
}
}发布于 2016-04-21 22:47:07
在DataToLines中,我们有这样的
List<string[][]> lines = new List<string[][]>();
Parallel.ForEach(profiledata.Children, (child) =>
{
...
lines.Add(line);
...
});List<T>.Add不一定是线程安全的,因此在Parallel.ForEach中调用它是有问题的。
要演示这个问题,请尝试运行以下代码
var count = 10000;
var lines = new List<int>();
Parallel.ForEach(Enumerable.Range(0, count), lines.Add);
Debug.Assert(count == lines.Count, $"Expected {count}, actual {lines.Count}");我在运行这个程序时得到的输出是Fail: Expected 10000, actual 9823。
发布于 2016-04-21 11:44:42
在对Parallel.ForEach做了很多事情之后,我还是想建议像这样改变它
Parallel.ForEach(queue, (profilingRequest) =>
{
var currentTime = now - profilingRequest.timestamp;
for (int i = 0; i < intervals.Length; i++)
{
if (currentTime < intervals[i])
{
var currentStat = stats[i];
if (currentStat.queueCount == 0)
{
currentStat.minDuration = currentStat.maxDuration = profilingRequest.duration;
}
else if (profilingRequest.duration < currentStat.minDuration)
{
currentStat.minDuration = profilingRequest.duration;
}
else if (profilingRequest.duration > currentStat.maxDuration)
{
currentStat.maxDuration = profilingRequest.duration;
}
currentStat.queueCount++;
totalDurations[i] += profilingRequest.duration;
if (profilingRequest.containsExceptions)
{
currentStat.exceptionsCount++;
}
}
}
}); 通过计算循环外的currentTime,将profilingRequest的访问限制为一次,而不是对循环的每一次迭代进行计算。
通过使用var currentStat = stats[i]; (如果编译器没有在内部执行此操作),stats项就不需要通过其索引进行访问。
我想知道,如果profilingRequest.duration == currentStat.maxDuration似乎是一个被遗忘的边缘情况,应该发生什么。
如您所见,我添加了大括号,尽管它们可能是可选的。它们不会减慢执行速度,但是代码的可读性更好,错误也更少。
另一个需要改进的地方是
public ProfileData GetChild(string groupName)
{
lock (Children)
{
if (!Children.ContainsKey(groupName))
{
ProfileData child = new ProfileData(groupName, cleanInterval);
child.parent = this;
Children.Add(groupName, child);
}
return Children[groupName];
}
} 如果您需要从字典中获得一个值,并且不确定它是否在字典中,则应该使用TryGetValue(),因为它更快。也见我在这里的回答:C#中的消息总线
那家伙说,改变了以前的方法
public ProfileData GetChild(string groupName)
{
lock (Children)
{
ProfileData child;
if(Children.TryGetValue(groupName, out child))
{
return child;
}
child = new ProfileData(groupName, cleanInterval);
child.parent = this;
Children.Add(groupName, child);
return child;
}
} 应该能提高速度。
https://codereview.stackexchange.com/questions/126308
复制相似问题