C#: Downloading HTML pages concurrently using the Async CTP

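While trying to write an HTML crawler using the Async CTP, I got stuck on how to write a recursion-free method for accomplishing this.

This is the code I have so far: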

private readonly ConcurrentStack<LinkItem> _LinkStack;
private readonly Int32 _MaxStackSize;
private readonly WebClient client = new WebClient();

Func<string, string, Task<List<LinkItem>>> DownloadFromLink = async (BaseURL, uri) => 
{
    string html = await client.DownloadStringTaskAsync(uri);
    return LinkFinder.Find(html, BaseURL);
};

Action<LinkItem> DownloadAndPush = async (o) => 
{
    List<LinkItem> result = await DownloadFromLink(o.BaseURL, o.Href);
    if (this._LinkStack.Count() + result.Count <= this._MaxStackSize)
    {
        this._LinkStack.PushRange(result.ToArray());
        o.Processed = true;
    }  
};

Parallel.ForEach(this._LinkStack, (o) => 
{
    DownloadAndPush(o);
});
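But obviously this doesn't work the way I hoped, because at the moment Parallel.ForEach executes its first (and only) iteration, I have just 1 item. The simplest approach I can think of is to make the ForEach recursive, but I can't (I don't think) do that, because I would quickly run out of stack space.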
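
The single iteration described above follows from how ConcurrentStack<T> enumerates: its enumerator is a moment-in-time snapshot, so items pushed while the loop runs are never visited. A minimal sketch of that effect, assuming a plain foreach and int items in place of LinkItem (both simplifications are mine, not part of the question's code):

```csharp
using System;
using System.Collections.Concurrent;

var stack = new ConcurrentStack<int>();
stack.Push(0);                       // the single seed item, like the first link

int iterations = 0;
foreach (var item in stack)          // enumerates a snapshot taken here
{
    iterations++;
    stack.Push(item + 1);            // safe, but invisible to this enumeration
}

// The loop body ran once even though the stack now holds two items.
Console.WriteLine($"iterations = {iterations}, count = {stack.Count}");
// prints "iterations = 1, count = 2"
```

Parallel.ForEach should see the same kind of snapshot. In addition, because the async lambda is assigned to an Action<LinkItem>, each call returns at its first await, so the loop can finish before any new links have been pushed.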


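Can anyone show me how to restructure this code to create what I would describe as a recursive continuation that keeps adding items until either MaxStackSize is reached or the system runs out of memory?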

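I think the best way to do something like this with C# 5/.NET 4.5 is to use TPL Dataflow.

Basically, you create one "block" that takes care of downloading a single URL and getting the links from it: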

var cts = new CancellationTokenSource();

Func<LinkItem, Task<IEnumerable<LinkItem>>> downloadFromLink =
    async link =>
            {
                // WebClient is not guaranteed to be thread-safe,
                // so we shouldn't use one shared instance
                var client = new WebClient();
                string html = await client.DownloadStringTaskAsync(link.Href);

                return LinkFinder.Find(html, link.BaseURL);
            };

var linkFinderBlock = new TransformManyBlock<LinkItem, LinkItem>(
    downloadFromLink,
    new ExecutionDataflowBlockOptions
    { MaxDegreeOfParallelism = 4, CancellationToken = cts.Token });
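The links that come out of linkFinderBlock are collected by a second block, storeBlock. Since we don't set MaxDegreeOfParallelism on that block, it defaults to 1. This means that using a collection that isn't thread-safe should be fine there.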
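
The definition of storeBlock itself does not appear in the text above. A minimal sketch of what such a block could look like, assuming it appends each link to a plain List and cancels the shared token once a limit is reached (the names links and maxSize, the LinkItem record, and the example URLs are hypothetical, and System.Threading.Tasks.Dataflow is assumed to be available):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks.Dataflow;

const int maxSize = 3;                     // assumed limit, in the role of _MaxStackSize
var cts = new CancellationTokenSource();   // the token source the download block observes
var links = new List<LinkItem>();          // not thread-safe; fine while the block's MDOP is 1

var storeBlock = new ActionBlock<LinkItem>(linkItem =>
{
    links.Add(linkItem);
    if (links.Count == maxSize)
        cts.Cancel();                      // signals the download block to stop
});

// Feed the block directly to show the behaviour in isolation.
for (int i = 0; i < maxSize; i++)
    storeBlock.Post(new LinkItem("http://example.com/", $"page{i}.html"));

storeBlock.Complete();
storeBlock.Completion.Wait();
Console.WriteLine(cts.IsCancellationRequested); // prints "True"

// Hypothetical stand-in for the question's LinkItem type.
record LinkItem(string BaseURL, string Href);
```

Cancelling from inside the storing block keeps the size check in one place, which is one way to honor the limit from the question without any recursion.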

We create one more block: it takes a link from linkFinderBlock and passes it both to storeBlock and back to linkFinderBlock.

var broadcastBlock = new BroadcastBlock<LinkItem>(li => li);

Now we can connect the blocks together:

linkFinderBlock.LinkTo(broadcastBlock);
broadcastBlock.LinkTo(storeBlock);
broadcastBlock.LinkTo(linkFinderBlock);

Then we can start processing by sending the first item to linkFinderBlock (or to broadcastBlock, if you also want it sent to storeBlock):

linkFinderBlock.Post(firstItem);

Finally, wait until processing completes:

try
{
    linkFinderBlock.Completion.Wait();
}
catch (AggregateException ex)
{
    if (!(ex.InnerException is TaskCanceledException))
        throw;
}

Wow! Thanks for the great explanation. Can you confirm one thing? If we set MaxDegreeOfParallelism to a number greater than 1, does that mean I need to change the collection type to something like ConcurrentStack to ensure thread safety?

You mean the collection in storeBlock? Where are you setting MaxDegreeOfParallelism? If you set the MDOP of storeBlock to > 1, then yes, you need to use some thread-safe collection there (or use locks). But if you set the MDOP of some other block to > 1, it doesn't affect the parallelism of storeBlock, so you don't need to think about thread safety.

This will make me upgrade to 2012! +1

+1. He who controls recursion controls the universe!
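
To illustrate the point about raising MDOP on the storing block: a minimal sketch, assuming a ConcurrentQueue of strings in place of the List of LinkItem (the item type, the queue, and the count of 1000 are illustrative choices, not part of the answer):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks.Dataflow;

// Thread-safe collection: several delegate invocations may run at once.
var links = new ConcurrentQueue<string>();

var storeBlock = new ActionBlock<string>(
    href => links.Enqueue(href),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

for (int i = 0; i < 1000; i++)
    storeBlock.Post($"page{i}.html");

storeBlock.Complete();            // no more input
storeBlock.Completion.Wait();     // all queued items have been processed

Console.WriteLine(links.Count);   // prints "1000"
```

With a parallel storing block, a size check such as links.Count == maxSize would also need care, since two invocations could pass the limit at the same moment. An atomic counter (for example Interlocked.Increment) is one possible way to handle that.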