I have a function that basically splits into two sub-functions:
html = RetrieveHTML(index);
returnCollection = RegexProcess(html, index);
What is the best way to speed this up by optimally parallelizing RetrieveHTML?
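Typically I call it with up to 20,000 indices. The first sub-function is network-bound (it uses WebClient.DownloadString to fetch the HTML of many URLs from a single server), and the second is mostly CPU-bound.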
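Since the two sub-functions have different bottlenecks, one way to structure this (a sketch, not a tested solution) is a two-stage producer/consumer pipeline: a few download workers feed a bounded BlockingCollection, and roughly one worker per core runs the regex step. It only uses types available in .NET 4.0 (BlockingCollection<T>, Task.Factory, Interlocked). TwoStagePipeline, Run and its parameters are hypothetical names; retrieveHtml and process stand in for RetrieveHTML and RegexProcess, whose signatures are assumed to be string RetrieveHTML(int) and TResult RegexProcess(string, int).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not from the question: the I/O-bound and CPU-bound
// stages run at the same time, each with its own number of workers.
public static class TwoStagePipeline
{
    public static List<TResult> Run<TResult>(
        int count,                          // number of indices, e.g. 20000
        Func<int, string> retrieveHtml,     // stand-in for RetrieveHTML(index)
        Func<string, int, TResult> process, // stand-in for RegexProcess(html, index)
        int downloadWorkers,                // concurrent downloads (>= 1)
        int cpuWorkers,                     // concurrent regex workers (>= 1)
        int maxBufferedPages)               // pages allowed to wait in memory
    {
        // Bounded queue: Add blocks while maxBufferedPages pages are waiting,
        // so downloads cannot run arbitrarily far ahead of processing.
        var pages = new BlockingCollection<Tuple<int, string>>(maxBufferedPages);
        var results = new ConcurrentBag<TResult>();
        int nextIndex = -1;

        // Downloaders block on network I/O, so LongRunning requests dedicated
        // threads instead of occupying thread-pool threads.
        var downloaders = new Task[downloadWorkers];
        for (int w = 0; w < downloadWorkers; w++)
        {
            downloaders[w] = Task.Factory.StartNew(() =>
            {
                int index;
                // Each worker claims the next unclaimed index until none remain.
                while ((index = Interlocked.Increment(ref nextIndex)) < count)
                {
                    pages.Add(Tuple.Create(index, retrieveHtml(index)));
                }
            }, TaskCreationOptions.LongRunning);
        }

        // Close the queue once every downloader has finished, even on failure.
        Task closeQueue = Task.Factory.ContinueWhenAll(downloaders, _ => pages.CompleteAdding());

        var processors = new Task[cpuWorkers];
        for (int w = 0; w < cpuWorkers; w++)
        {
            processors[w] = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (var page in pages.GetConsumingEnumerable())
                    {
                        // Once process returns, nothing references page.Item2,
                        // so the (large) HTML string can be garbage-collected.
                        results.Add(process(page.Item2, page.Item1));
                    }
                }
                catch
                {
                    // Stop accepting pages so downloaders blocked in Add wake up
                    // (with InvalidOperationException) instead of hanging forever.
                    pages.CompleteAdding();
                    throw;
                }
            }, TaskCreationOptions.LongRunning);
        }

        // Wait for everything, then rethrow all failures as one AggregateException.
        var all = new List<Task>(downloaders);
        all.AddRange(processors);
        all.Add(closeQueue);
        Task.WaitAll(all.ToArray());
        return new List<TResult>(results);
    }
}
```

With the question's functions, a call might look like TwoStagePipeline.Run(20000, RetrieveHTML, RegexProcess, 8, Environment.ProcessorCount, 100); the worker counts and buffer size are guesses that would need tuning by measurement. Results come back in no particular order, so if order matters the result produced by process should carry the index.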
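I'm lost in the world of Parallel.ForEach and Tasks (ContinueWith, ContinueWhenAll, FromAsync), and I'm having a hard time finding a solution. I first tried Parallel.ForEach because it is simple, but I found that its performance, specifically the network I/O, degrades over consecutive calls: the first loop is fast, and the later ones become slow. The solution should release the HTML objects as they are processed, because there are many of them and they are large. I'm using .NET 4.0...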
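A plausible cause of the slowdown, which the question does not confirm: on .NET Framework, ServicePointManager.DefaultConnectionLimit defaults to 2 connections per host outside ASP.NET, so concurrent WebClient requests to the same server queue behind each other, while Parallel.ForEach can keep injecting thread-pool threads that simply block. A minimal sketch, assuming that is the bottleneck, raises the limit and caps the loop at the same number. CappedDownloads, DownloadAll and handlePage are hypothetical names.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

public static class CappedDownloads
{
    // Calls retrieveHtml for every index with at most maxConcurrent calls in
    // flight, handing each page to handlePage as soon as it arrives.
    // retrieveHtml and handlePage are hypothetical stand-ins for the
    // question's RetrieveHTML and RegexProcess.
    public static void DownloadAll(int count, int maxConcurrent,
        Func<int, string> retrieveHtml, Action<int, string> handlePage)
    {
        // Let the HTTP stack open as many connections per host as we will use.
        // On .NET Framework the default outside ASP.NET is 2.
        if (ServicePointManager.DefaultConnectionLimit < maxConcurrent)
        {
            ServicePointManager.DefaultConnectionLimit = maxConcurrent;
        }

        // Cap the loop itself so the thread pool does not keep adding threads
        // that would only block waiting for a free connection.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrent };
        Parallel.For(0, count, options, index =>
        {
            // The page is only referenced inside this iteration, so it can be
            // collected as soon as handlePage returns.
            handlePage(index, retrieveHtml(index));
        });
    }
}
```

Raising DefaultConnectionLimit only affects ServicePoint objects created afterwards, so it should run before the first request to that server.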
Posted on 2013-01-23 08:29:13
Hi, you can try the code below:
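using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Containing class added so the snippet compiles; the name is arbitrary.
public class HtmlProcessor
{
    // A Regex instance is safe to use from several threads for matching.
    private readonly Regex _regex = new Regex("net");

    private void ProcessInParallel()
    {
        Uri[] resourceUri = new Uri[] { new Uri("http://www.microsoft.com"), new Uri("http://www.google.com"), new Uri("http://www.amazon.com") };

        // 1. Stage 1: Download HTML
        // The blocking collection hands each page from the download stage to the matching stage
        BlockingCollection<string> htmlDataList = new BlockingCollection<string>();

        // Run the downloads on a background task so the matching stage can start
        // consuming pages while the remaining downloads are still in progress
        Task producer = Task.Factory.StartNew(() =>
        {
            try
            {
                Parallel.For(0, resourceUri.Length, index =>
                {
                    var html = RetrieveHTML(resourceUri[index]);
                    htmlDataList.Add(html);
                });
            }
            finally
            {
                // Signal completion only after every iteration has added its page;
                // the last index is not necessarily the last iteration to finish
                htmlDataList.CompleteAdding();
            }
        });

        // 2. Get matches
        // This concurrent bag stores the result of the matching stage
        ConcurrentBag<string> matchedHtml = new ConcurrentBag<string>();
        // Include the producer so that download errors are also rethrown by WaitAll
        List<Task> processingTasks = new List<Task> { producer };
        // Enumerate each downloaded HTML document as soon as it is available
        foreach (var html in htmlDataList.GetConsumingEnumerable())
        {
            // Create a new task to match the downloaded HTML
            var task = Task.Factory.StartNew(data =>
            {
                var downloadedHtml = data as string;
                if (downloadedHtml == null)
                    return;
                if (_regex.IsMatch(downloadedHtml))
                {
                    matchedHtml.Add(downloadedHtml);
                }
            },
            html);
            // Add the task to the waiting list
            processingTasks.Add(task);
        }
        // Wait for all tasks (including the producer) to complete
        Task.WaitAll(processingTasks.ToArray());
        foreach (var html in matchedHtml)
        {
            // Do something with the matched result
        }
    }

    private string RetrieveHTML(Uri uri)
    {
        using (WebClient webClient = new WebClient())
        {
            // Set this to null if there is no proxy (skips proxy auto-detection)
            webClient.Proxy = null;
            byte[] data = webClient.DownloadData(uri);
            // Assumes the pages are UTF-8 encoded
            return Encoding.UTF8.GetString(data);
        }
    }
}
https://stackoverflow.com/questions/14418809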
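The answer keeps every matching page in matchedHtml, while the question asks for each page to be released once it has been processed. A hedged variant of the matching step returns only the matched substrings, so nothing holds on to the full page afterwards. MatchExtraction and ExtractMatches are hypothetical names; whether the real RegexProcess can work this way depends on what it needs from the page.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchExtraction
{
    // Returns only the matched substrings. Each m.Value is a new string, so the
    // returned list does not keep the whole downloaded page alive.
    public static List<string> ExtractMatches(Regex regex, string html)
    {
        var matches = new List<string>();
        // Regex instance methods are safe to call from multiple threads at once,
        // so a single shared Regex (like _regex above) can serve every task.
        for (Match m = regex.Match(html); m.Success; m = m.NextMatch())
        {
            matches.Add(m.Value);
        }
        return matches;
    }
}
```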