C#: Using async to speed up parallel web requests


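So I have this code. This is the main function, a parallel for loop that iterates over all the data that needs to be posted and calls the function: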

ParallelOptions pOpt = new ParallelOptions();
pOpt.MaxDegreeOfParallelism = 30;
Parallel.For(0, maxsize, pOpt, (index, loopstate) =>
{
    // Calls the function where all the web requests are made
    CallRequests(data1, data2);

    if (isAborted)
        loopstate.Stop();
});
This is the function called inside the parallel loop:

public static void CallRequests(string data1, string data2)     
    {
        var cookie = new CookieContainer();
        var postData =  Parameters[23] + data1 +
                        Parameters[24] + data2;

        HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(Parameters[25]);
        getRequest.Accept = Parameters[26];
        getRequest.KeepAlive = true;
        getRequest.Referer = Parameters[27];
        getRequest.CookieContainer = cookie;
        getRequest.UserAgent = Parameters[28];
        getRequest.Method = WebRequestMethods.Http.Post;
        getRequest.AllowWriteStreamBuffering = true;
        getRequest.ProtocolVersion = HttpVersion.Version10;
        getRequest.AllowAutoRedirect = false;
        getRequest.ContentType = Parameters[29];
        getRequest.ReadWriteTimeout = 5000;
        getRequest.Timeout = 5000;
        getRequest.Proxy = null;

        byte[] byteArray = Encoding.ASCII.GetBytes(postData);
        getRequest.ContentLength = byteArray.Length;
        Stream newStream = getRequest.GetRequestStream(); //open connection
        newStream.Write(byteArray, 0, byteArray.Length); // Send the data.
        newStream.Close();

        HttpWebResponse getResponse = (HttpWebResponse)getRequest.GetResponse();

        if (getResponse.Headers["Location"] == Parameters[30])
        {
            //These are simple get requests to retrieve the source code using the same format as above.
            //I need to preserve the cookie
            GetRequets(data1, data2, Parameters[31], Parameters[13], cookie);
            GetRequets(data1, data2, Parameters[32], Parameters[15], cookie);
        }
    }
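From what I've seen and been told, making these requests asynchronous is better than using a parallel loop. My current approach is also very demanding on the processor. I'd like to know how to make these requests asynchronous while keeping the multithreaded aspect. I also need to preserve the cookie after the POST request completes.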

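Converting the CallRequests method to async really just means swapping the synchronous method calls for their asynchronous counterparts with the await keyword, and changing the method signature to return a Task.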

Something like this:

public static async Task CallRequestsAsync(string data1, string data2)
{
    var cookie = new CookieContainer();
    var postData = Parameters[23] + data1 +
                   Parameters[24] + data2;

    HttpWebRequest getRequest = (HttpWebRequest)WebRequest.Create(Parameters[25]);
    getRequest.Accept = Parameters[26];
    getRequest.KeepAlive = true;
    getRequest.Referer = Parameters[27];
    getRequest.CookieContainer = cookie;
    getRequest.UserAgent = Parameters[28];
    getRequest.Method = WebRequestMethods.Http.Post;
    getRequest.AllowWriteStreamBuffering = true;
    getRequest.ProtocolVersion = HttpVersion.Version10;
    getRequest.AllowAutoRedirect = false;
    getRequest.ContentType = Parameters[29];
    getRequest.ReadWriteTimeout = 5000;
    getRequest.Timeout = 5000;
    getRequest.Proxy = null;

    byte[] byteArray = Encoding.ASCII.GetBytes(postData);
    getRequest.ContentLength = byteArray.Length;
    using (Stream newStream = await getRequest.GetRequestStreamAsync()) // open connection
    {
        await newStream.WriteAsync(byteArray, 0, byteArray.Length); // send the data
    }

    // Await the response as well; the synchronous GetResponse() would block a thread here.
    using (var getResponse = (HttpWebResponse)await getRequest.GetResponseAsync())
    {
        if (getResponse.Headers["Location"] == Parameters[30])
        {
            // These are simple GET requests that retrieve the page source in the same way as above.
            // Passing the same CookieContainer preserves the cookie set by the POST.
            // (They are still synchronous here; they can be converted the same way and awaited.)
            GetRequets(data1, data2, Parameters[31], Parameters[13], cookie);
            GetRequets(data1, data2, Parameters[32], Parameters[15], cookie);
        }
    }
}
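On the cookie requirement from the question: the cookie survives because the same CookieContainer instance (cookie) is assigned to the POST and then handed to the follow-up GET requests, and HttpWebRequest stores Set-Cookie headers from the response in that container. The offline sketch below simulates that round trip with CookieContainer.SetCookies and GetCookieHeader; the host example.com and the cookie value are placeholders, not values from the question.

```csharp
using System;
using System.Net;

public static class CookieReuseDemo
{
    // Simulates the POST storing a cookie and a later GET on the same host reading it back.
    public static string CookieHeaderForFollowUp()
    {
        var cookie = new CookieContainer();

        // What the POST response's Set-Cookie header would store (placeholder host and value).
        cookie.SetCookies(new Uri("http://example.com/login"), "session=abc123; path=/");

        // The Cookie header a follow-up request using the same container would send.
        return cookie.GetCookieHeader(new Uri("http://example.com/page"));
    }
}
```

Because each CallRequestsAsync call creates its own container, concurrent calls do not share or overwrite each other's sessions.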
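However, this alone doesn't really get you anywhere, because you still need to wait for the returned tasks in your main method. A very simple (if somewhat blunt) way to do that is to call Task.WaitAll, or await Task.WhenAll if the calling method is itself going to become async. Something like this: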

var tasks = Enumerable.Range(0, maxsize).Select(index => CallRequestsAsync(data1, data2));
Task.WaitAll(tasks.ToArray());
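To see how the Task.WaitAll/Task.WhenAll version behaves without hitting a server, the sketch below swaps the real request for a hypothetical FakeCallRequestsAsync (a 50 ms Task.Delay) and records the highest number of calls in progress at the same moment. Nothing limits that number, so every call starts as soon as the sequence is enumerated.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    private static int _inFlight;
    private static int _peak;

    // Hypothetical stand-in for CallRequestsAsync: a 50 ms fake request
    // that records how many calls are running at once.
    private static async Task FakeCallRequestsAsync()
    {
        int now = Interlocked.Increment(ref _inFlight);
        UpdatePeak(now);
        await Task.Delay(50);
        Interlocked.Decrement(ref _inFlight);
    }

    // Lock-free "max": retry until _peak is at least candidate.
    private static void UpdatePeak(int candidate)
    {
        int seen;
        while (candidate > (seen = Volatile.Read(ref _peak)) &&
               Interlocked.CompareExchange(ref _peak, candidate, seen) != seen)
        {
        }
    }

    // Starts every call up front, then waits for all of them.
    public static async Task<int> RunAllAtOnceAsync(int maxsize)
    {
        _inFlight = 0;
        _peak = 0;
        var tasks = Enumerable.Range(0, maxsize).Select(index => FakeCallRequestsAsync());
        await Task.WhenAll(tasks);
        return _peak; // highest number of simultaneous "requests"
    }
}
```

Calling WhenAllDemo.RunAllAtOnceAsync(100) typically reports a peak close to 100, which against a real server means 100 simultaneous connections.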
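However, this really is blunt, and you lose control over how many iterations run in parallel, among other things. For this kind of thing I prefer to use TPL Dataflow. The library provides a way to chain asynchronous or synchronous operations in parallel and pass the results from one processing block to the next. It has countless options for tuning the degree of parallelism, buffer sizes, and so on.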

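A detailed explanation is beyond the scope of this answer, so I'd encourage you to read up on it, but one possible approach is simply to push the work onto an ActionBlock, something like this: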

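// ActionBlock and ExecutionDataflowBlockOptions come from the System.Threading.Tasks.Dataflow
// NuGet package (using System.Threading.Tasks.Dataflow;).
var actionBlock = new ActionBlock<int>(async index =>
{
    await CallRequestsAsync(data1, data2);
}, new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 30, // at most 30 requests in flight at once
    BoundedCapacity = 100,       // at most 100 items buffered in the block
});

for (int i = 0; i < maxsize; i++) // same range as Parallel.For(0, maxsize, ...)
{
    // Post() returns false and drops the item when the bounded buffer is full,
    // so wait for SendAsync instead (or "await actionBlock.SendAsync(i)" if the calling method is also async).
    actionBlock.SendAsync(i).Wait();
}

actionBlock.Complete();
actionBlock.Completion.Wait(); // or await actionBlock.Completion if calling method is also async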
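If adding the Dataflow package is not an option, a common alternative (a different technique from ActionBlock, not part of Dataflow) is to gate the calls with a SemaphoreSlim and still wait on them with Task.WhenAll. The sketch below uses a hypothetical 50 ms FakeCallRequestsAsync in place of the real request and reports the peak number of concurrent calls, which the semaphore caps at maxConcurrency.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDemo
{
    private static int _inFlight;
    private static int _peak;

    // Hypothetical stand-in for CallRequestsAsync (a 50 ms fake request).
    private static async Task FakeCallRequestsAsync()
    {
        int now = Interlocked.Increment(ref _inFlight);
        int seen;
        // Lock-free "max" so _peak records the highest concurrency observed.
        while (now > (seen = Volatile.Read(ref _peak)) &&
               Interlocked.CompareExchange(ref _peak, now, seen) != seen)
        {
        }
        await Task.Delay(50);
        Interlocked.Decrement(ref _inFlight);
    }

    // Runs maxsize calls with at most maxConcurrency in progress at any time.
    public static async Task<int> RunThrottledAsync(int maxsize, int maxConcurrency)
    {
        _inFlight = 0;
        _peak = 0;
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = Enumerable.Range(0, maxsize).Select(async index =>
            {
                await gate.WaitAsync();      // wait for a free slot
                try
                {
                    await FakeCallRequestsAsync();
                }
                finally
                {
                    gate.Release();          // free the slot even if the call throws
                }
            }).ToList();                     // materialize so each task is started exactly once
            await Task.WhenAll(tasks);
        }
        return _peak;
    }
}
```

Unlike the bounded ActionBlock, this creates all maxsize tasks up front (most of them just waiting on the semaphore), which is fine for moderate counts but uses more memory for very large inputs.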
A few other points that go beyond the scope of my answer, but that I should mention in passing:

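- It looks like your CallRequests method updates some external variable with its results. Where possible, it's best to avoid that pattern and have the method return its results, so they can be processed later, for example by a TPL Dataflow TransformBlock. If updating external state is unavoidable, make sure you've thought through the multithreading implications (deadlocks, race conditions, etc.), which are beyond the scope of this answer.
- I assume index has some useful meaning that was lost when you reduced your code to a minimal example for the question? Does it index into a parameter list or something similar? If so, you can always iterate over those items directly and change the ActionBlock<int> to an ActionBlock of that item type.
- Make sure you understand the difference between multithreaded/parallel execution and asynchronous execution. There is certainly some similarity and overlap, but simply making something asynchronous does not make it multithreaded, and vice versa.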
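A minimal sketch of the "return results instead of updating external state" idea, using Task.WhenAll rather than a Dataflow TransformBlock: a hypothetical FakeCallRequestsAsync returns its result as a Task<string>, and Task.WhenAll collects them into an array in input order, even though the fake delays make the calls finish in a scrambled order. No shared variable is written, so no locking is needed.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class ResultsDemo
{
    // Hypothetical async request that returns its result
    // instead of writing it into a shared variable.
    private static async Task<string> FakeCallRequestsAsync(int index)
    {
        await Task.Delay(10 * (5 - index % 5)); // 50..10 ms, so completion order is scrambled
        return "result " + index;
    }

    public static async Task<string[]> CollectAsync(int maxsize)
    {
        var tasks = Enumerable.Range(0, maxsize).Select(index => FakeCallRequestsAsync(index));
        // WhenAll returns the results in the same order as the input tasks,
        // regardless of the order in which they completed.
        return await Task.WhenAll(tasks);
    }
}
```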
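Does that help?

If I use var tasks = Enumerable.Range(0, maxsize).Select(index => CallRequestsAsync(data1, data2)); Task.WaitAll(tasks.ToArray()); I don't quite understand.

Task.WaitAll does have an overload that accepts a cancellation token, but we may be talking at cross purposes... You don't always want to make all the requests? If not, how do you determine when to stop? If that decision is based on the result of a different, earlier call to CallRequests, how would you make it when they run in parallel and can happen at the same time?

In my first example, the parallel loop, I can cancel the loop using the isAborted variable and calling loopstate.Stop(); in some cases I don't want to make all the requests, so how can I cancel them using the methods you've provided?

With Task.WaitAll that would be difficult, maybe even impossible, since at least in theory they can all start at the same time. I'm not at all sure I understand the execution path you're trying to achieve. What logic controls when you want to stop "future" iterations? Even with your original example, 30 iterations can be processed at the same time, and loopstate.Stop() only prevents new iterations from starting; it does not stop any that are already in progress. I put quotes around "future" because by processing them concurrently you give up at least part of your control over the order in which they execute.

I've decided to give up on cancelling the tasks. Implementing it doesn't seem worth it, or safe. One last question: in the example you gave me, shouldn't the getResponse call be async? If so, how?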
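On the cancellation question in this thread: assuming .NET 6 or later, Parallel.ForEachAsync combines a MaxDegreeOfParallelism cap, async bodies and a CancellationToken, which is the closest match to the original Parallel.For plus loopstate.Stop(). In the sketch below, the hypothetical abortAfter parameter stands in for isAborted and a 20 ms Task.Delay stands in for the request. As with Stop(), cancelling only prevents new iterations; calls already in flight finish unless they pass the token on.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Runs up to maxsize fake requests, 30 at a time, and stops scheduling new ones
    // once the hypothetical abort condition is hit. Returns how many were started.
    public static async Task<int> RunUntilAbortedAsync(int maxsize, int abortAfter)
    {
        int started = 0;
        using var cts = new CancellationTokenSource();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = 30,
            CancellationToken = cts.Token,
        };

        try
        {
            await Parallel.ForEachAsync(Enumerable.Range(0, maxsize), options, async (index, token) =>
            {
                int n = Interlocked.Increment(ref started);
                if (n >= abortAfter)
                    cts.Cancel(); // plays the role of isAborted + loopstate.Stop()

                // In-flight work keeps running; pass token here to abandon it instead.
                await Task.Delay(20);
            });
        }
        catch (OperationCanceledException)
        {
            // Expected after cts.Cancel(): no further iterations were started.
        }

        return Volatile.Read(ref started);
    }
}
```

The exact number started depends on timing, but it stays well below maxsize once the abort condition fires early.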