How many ways am I doing C# concurrent task management incorrectly?

Tags: c#, .net, multithreading, concurrency, task-parallel-library


I'm writing an application that crawls a website and runs various checks on each page. I want an option that lets the crawler run several concurrent crawls of the site at once. I have a half-working... solution: it crawls the site fine with 1 task, and when I set the maximum concurrent tasks to 2 it runs faster, as expected. However, when I set it to 3 or higher it seems slower. I come from a PHP background, so I'm fairly sure I'm doing this in the worst possible way.

var DontLockGuiTask = Task.Run(() =>
{
    while (true)
    {
        if (tokenSource2.IsCancellationRequested)
        {
            Logger.AddToActivityLog("Tasks stopped by user");
            break;
        }

        if (URLsToCheck.Count == 0 && CurrentNumberOfScrapes == 0)
        {
            EndOfCheck = true;
            break;
        }

        lock ("CurrentNumberOfScrapes")
        {
            CurrentNumberOfScrapes++;
        }

        var ScrapeTask = Task.Run(() =>
        {
            if (EndOfCheck)
            {
                CurrentNumberOfScrapes--;
                return;
            }

            URLCheckResultObject CheckResultForURL;

            Checker Checker                 = new Checker();
            URLsToCheckObject URLToCheck    = new URLsToCheckObject();

            lock ("URLsToCheck")
            {
                if (URLsToCheck.Count == 0)
                {
                    lock ("CurrentNumberOfScrapes")
                    {
                        CurrentNumberOfScrapes--;
                        return;
                    }
                }

                URLToCheck = URLsToCheck.First();
                URLsToCheck.Remove(URLToCheck);
            }

            CheckResultForURL = Checker.PerformCheckOnURL(URLToCheck, this);

            PagesCrawledCounter++;
            ChecksPerformedCounter += CheckResultForURL.Checkcounter;

            CheckResultForURL.URLID = PagesCrawledCounter;

            Logger.AddToActivityLog("Checking " + URLToCheck.URLAddress + "....");

            if(CheckResultForURL.NewListOFURLSToCheck != null)
            {
                foreach (LinkObject NewURLToAdd in CheckResultForURL.NewListOFURLSToCheck)
                {
                    lock ("URLsToCheck")
                    {
                        string CleanURL = NewURLToAdd.destinationURL;

                        if (CleanURL.EndsWith("/"))
                        {
                            CleanURL = CleanURL.Substring(0, CleanURL.Length - 1);
                        }

                        if (URlsWeKnownAbout.Contains(CleanURL)) continue;

                        URlsWeKnownAbout.Add(CleanURL);
                        URLsToCheck.Add(new URLsToCheckObject { URLAddress = CleanURL, Host = host });
                    }
                }
            }

            CheckResultForURL.NewListOFURLSToCheck = null;

            if(CheckResultForURL.SocialCheckResult != null)
            {
                ProblemID++;
                CheckResultForURL.SocialCheckResult.URLID = ProblemID;
                InsertSQLProblemIntoDataGrid(CheckResultForURL.SocialCheckResult);
            }

            lock ("CurrentNumberOfScrapes")
            {
                CurrentNumberOfScrapes--;
            }
        });

        while (CurrentNumberOfScrapes >= CurrentNumberScrapesMax)
        {
            if (tokenSource2.IsCancellationRequested == true)
            {
                Logger.AddToActivityLog("Tasks stopped by user");
                break;
            }

            Thread.Sleep(100);
        }
    }

    EnableUsedGUIForRun();

}, tokenSource2.Token); 
As you can see, I have a couple of while loops that check how many tasks are currently running and sleep if no more are needed; new tasks are started when old ones finish, whenever the current task count is below the CurrentNumberScrapesMax limit.


How should I be approaching this? I want to manage multiple concurrent tasks that all access the same variables.

If you weren't adding new urls to URLsToCheck as you go, the code could probably be simplified to:

Parallel.ForEach(URLsToCheck,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, url => CrawlAcross(url));
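
A minimal, runnable sketch of that approach (CrawlAcross here is a hypothetical stand-in for the asker's per-page check, and the shared counter is updated with Interlocked rather than lock):

using System;
using System.Threading;
using System.Threading.Tasks;

class CrawlerSketch
{
    static int pagesCrawled;

    // Hypothetical stand-in for CrawlAcross: check a single page.
    static void CrawlAcross(string url)
    {
        // ... perform the checks for this page here ...
        Interlocked.Increment(ref pagesCrawled); // thread-safe shared counter
    }

    static void Main()
    {
        var urlsToCheck = new[] { "https://example.com/a", "https://example.com/b" };

        Parallel.ForEach(
            urlsToCheck,
            new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
            url => CrawlAcross(url));

        Console.WriteLine($"Crawled {pagesCrawled} pages.");
    }
}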
But since you do get more urls to crawl as you go, you need more complicated logic. You could try to use a TPL Dataflow pipeline here:

Buffer with urls --> Crawl the url processor --> Result saving
// Requires a reference to the System.Threading.Tasks.Dataflow package
// and: using System.Threading.Tasks.Dataflow;
var buffer = new BufferBlock<string>();
var processor = new TransformBlock<string, CrawlResult>(url => 
{
    var result = CrawlAcross(url);

    // Feed any newly discovered urls back into the front of the pipeline.
    foreach (var additionalUrl in result.AdditionalUrlsToParse)
    {
        buffer.Post(additionalUrl);
    }
    return result;
});
var handler = new ActionBlock<CrawlResult>(r => HandleResult(r));

buffer.LinkTo(processor, new DataflowLinkOptions() { PropagateCompletion = true });
processor.LinkTo(handler, new DataflowLinkOptions() { PropagateCompletion = true });

// Seed the pipeline with the initial set of urls.
foreach (var url in URLsToCheck)
{
    buffer.Post(url);
}
The second block (the crawl processor) posts any additional urls it discovers back into the buffer, so the pipeline keeps feeding itself.
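
One wrinkle with a self-feeding pipeline like this is knowing when it is finished: you cannot simply call buffer.Complete() after the initial posts, because the processor may still discover more urls. A minimal sketch of one way to detect that, counting in-flight items with Interlocked (the CrawlResult, CrawlAcross and HandleResult names follow the snippet above; the counter logic is an assumption, not part of the original answer):

using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class CrawlResult
{
    public string[] AdditionalUrlsToParse { get; set; } = Array.Empty<string>();
}

class PipelineSketch
{
    // Placeholder implementations of CrawlAcross / HandleResult.
    static CrawlResult CrawlAcross(string url) => new CrawlResult();
    static void HandleResult(CrawlResult r) { /* save the result somewhere */ }

    static async Task Main()
    {
        int pending = 0; // urls posted but not yet fully processed
        var buffer = new BufferBlock<string>();

        var processor = new TransformBlock<string, CrawlResult>(url =>
        {
            var result = CrawlAcross(url);
            foreach (var additionalUrl in result.AdditionalUrlsToParse)
            {
                Interlocked.Increment(ref pending);
                buffer.Post(additionalUrl);
            }
            // This url is done; if nothing else is in flight, stop accepting input.
            if (Interlocked.Decrement(ref pending) == 0)
                buffer.Complete();
            return result;
        });

        var handler = new ActionBlock<CrawlResult>(r => HandleResult(r));

        buffer.LinkTo(processor, new DataflowLinkOptions { PropagateCompletion = true });
        processor.LinkTo(handler, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var url in new[] { "https://example.com" })
        {
            Interlocked.Increment(ref pending);
            buffer.Post(url);
        }

        await handler.Completion; // the whole pipeline has drained
    }
}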

Side notes on your code:

  • The while loop task should be marked as LongRunning.
  • The child task should be moved out into its own method.
  • lock statements should not be taken on string constants but on dedicated static objects, both for readability and for predictable results (see the sketch after this list).
  • if (tokenSource2.IsCancellationRequested == true) can be simplified to if (tokenSource2.IsCancellationRequested).
  • If you find the tokenSource2.IsCancellationRequested flag set, you should call ThrowIfCancellationRequested.
  • You should work with the token, not the tokenSource.
  • Maybe something else; it's hard to say.
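
A minimal sketch combining several of these notes (LongRunning, locking on a dedicated static object, ThrowIfCancellationRequested, and passing the token rather than the source); the names are illustrative, not the asker's actual code:

using System;
using System.Threading;
using System.Threading.Tasks;

class SideNotesSketch
{
    // Lock on a dedicated static object, never on a string constant.
    private static readonly object ScrapeCountLock = new object();
    private static int currentNumberOfScrapes;

    static void Main()
    {
        var tokenSource = new CancellationTokenSource();
        var token = tokenSource.Token; // hand the token around, not the source

        // A long-lived polling loop should be LongRunning so it gets
        // its own thread instead of occupying a thread-pool worker.
        var loop = Task.Factory.StartNew(() =>
        {
            while (true)
            {
                // No "== true", and throw instead of test-and-break.
                token.ThrowIfCancellationRequested();

                lock (ScrapeCountLock)
                {
                    currentNumberOfScrapes++;
                }

                Thread.Sleep(100);
            }
        }, token, TaskCreationOptions.LongRunning, TaskScheduler.Default);

        tokenSource.CancelAfter(500);

        try { loop.Wait(); }
        catch (AggregateException) { /* expected: the loop observed the cancellation */ }
    }
}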

About #7: lock takes an object (a string is one too) and makes sure that the reference to that object is "locked" for other threads; there will be waiting. What is being locked here is the reference to the string, not the number itself. Use lock on a dedicated object instead.

Have you considered the parallel foreach option? i.e. Parallel.ForEach(values, new ParallelOptions { MaxDegreeOfParallelism = number } …)

Thanks, I'll check out Parallel.ForEach, that looks promising.
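
To illustrate that comment: C# string literals are interned, so unrelated pieces of code that lock on the same literal end up contending for one shared object. A small demonstration:

using System;

class StringLockDemo
{
    static void Main()
    {
        string a = "CurrentNumberOfScrapes";
        string b = "CurrentNumberOf" + "Scrapes"; // folded to the same literal at compile time

        // Both variables reference the same interned string instance,
        // so lock (a) and lock (b) would contend for the same object.
        Console.WriteLine(ReferenceEquals(a, b)); // True
    }
}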