
Async hangs


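I have some fairly simple F# async code that downloads 100 random articles from Wikipedia (for research).

For some reason, the code hangs at arbitrary points during the download. Sometimes it's after 50 articles, sometimes after 80.

The async code itself is fairly simple: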

let parseWikiAsync(url:string, count:int ref) =
    async {
            use wc = new WebClientWithTimeout(Timeout = 5000)
            let! html = wc.AsyncDownloadString(Uri(url))
            let ret =
                try html |> parseDoc |> parseArticle
                with | ex -> printfn "%A" ex; None
            lock count (fun () ->
                if !count % 10 = 0 then
                    printfn "%d" !count
                count := !count + 1
            )
            return ret
    }
Since I couldn't track down the problem through fsi, I made WebClientWithTimeout, a System.Net.WebClient wrapper that lets me specify a timeout:

open System.Net

/// A WebClient whose requests use a configurable timeout (in milliseconds).
type WebClientWithTimeout() =
    inherit WebClient()
    member val Timeout = 60000 with get, set

    override x.GetWebRequest uri =
        let r = base.GetWebRequest(uri)
        r.Timeout <- x.Timeout
        r
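When I compile the code and run it under the debugger, there are only three threads, and only one of them is running actual code: the async pipeline. The other two show "Not Available" as their location and have nothing in their call stacks.

I take this to mean that it isn't stuck in AsyncDownloadString or anywhere in parseWikiAsync. What else could be causing this?

Oh, and also: initially it takes about a minute before the async code actually starts. After that, it runs at a fairly reasonable rate until it hangs indefinitely again.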

Here is the call stack for the main thread:

>   mscorlib.dll!System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle waitableSafeHandle, long millisecondsTimeout, bool hasThreadAffinity, bool exitContext) + 0x22 bytes 
    mscorlib.dll!System.Threading.WaitHandle.WaitOne(int millisecondsTimeout, bool exitContext) + 0x28 bytes    
    FSharp.Core.dll!Microsoft.FSharp.Control.AsyncImpl.ResultCell<Microsoft.FSharp.Control.AsyncBuilderImpl.Result<Microsoft.FSharp.Core.FSharpOption<Program.ArticleData>[]>>.TryWaitForResultSynchronously(Microsoft.FSharp.Core.FSharpOption<int> timeout) + 0x36 bytes  
    FSharp.Core.dll!Microsoft.FSharp.Control.CancellationTokenOps.RunSynchronously<Microsoft.FSharp.Core.FSharpOption<Program.ArticleData>[]>(System.Threading.CancellationToken token, Microsoft.FSharp.Control.FSharpAsync<Microsoft.FSharp.Core.FSharpOption<Program.ArticleData>[]> computation, Microsoft.FSharp.Core.FSharpOption<int> timeout) + 0x1ba bytes 
    FSharp.Core.dll!Microsoft.FSharp.Control.FSharpAsync.RunSynchronously<Microsoft.FSharp.Core.FSharpOption<Program.ArticleData>[]>(Microsoft.FSharp.Control.FSharpAsync<Microsoft.FSharp.Core.FSharpOption<Program.ArticleData>[]> computation, Microsoft.FSharp.Core.FSharpOption<int> timeout, Microsoft.FSharp.Core.FSharpOption<System.Threading.CancellationToken> cancellationToken) + 0xb9 bytes   
    WikiSurvey.exe!<StartupCode$WikiSurvey>.$Program.main@() Line 97 + 0x55 bytes   F#

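Your code doesn't appear to be doing anything special, so I'm going to assume that Wikipedia doesn't like your activity. Take a look at their bot policy. Digging a bit deeper, they also appear to have a strict User-Agent policy: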

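As of February 15, 2010, Wikimedia sites require an HTTP User-Agent header for all requests. This was an operative decision made by the technical staff and was announced and discussed on the technical mailing list.[1][2] The rationale is that clients that do not send a User-Agent string are mostly ill-behaved scripts that cause a lot of load on the servers without benefiting the projects. Note that non-descriptive default values for the User-Agent string, such as the one used by Perl's libwww, may also be blocked from using Wikimedia web sites (or parts of the web sites, such as api.php).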

User agents (browsers or scripts) that do not send a User-Agent header may now encounter an error message like this:

Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.

Outstanding requests: 4
Outstanding requests: 2
Outstanding requests: 1
Outstanding requests: 3
Outstanding requests: 5
Outstanding requests: 6
Outstanding requests: 7
Outstanding requests: 8
Outstanding requests: 9
Outstanding requests: 10
Outstanding requests: 12
Outstanding requests: 14
Outstanding requests: 15
Outstanding requests: 16
Outstanding requests: 17
Outstanding requests: 18
Outstanding requests: 13
Outstanding requests: 19
Outstanding requests: 20
Outstanding requests: 24
Outstanding requests: 22
Outstanding requests: 26
Outstanding requests: 27
Outstanding requests: 28
Outstanding requests: 29
Outstanding requests: 30
Outstanding requests: 25
Outstanding requests: 21
Outstanding requests: 23
Outstanding requests: 11
Outstanding requests: 29
Outstanding requests: 28
Outstanding requests: 27
Outstanding requests: 26
Outstanding requests: 25
Outstanding requests: 24
Outstanding requests: 23
Outstanding requests: 22
Outstanding requests: 21
Outstanding requests: 20
Outstanding requests: 19
Outstanding requests: 18
Outstanding requests: 17
Outstanding requests: 16
Outstanding requests: 15
Outstanding requests: 14
Outstanding requests: 13
Outstanding requests: 12
Outstanding requests: 11
Outstanding requests: 10
Outstanding requests: 9
Outstanding requests: 8
Outstanding requests: 7
Outstanding requests: 6
Outstanding requests: 5
Outstanding requests: 4
Outstanding requests: 3
Outstanding requests: 2
Outstanding requests: 1
Outstanding requests: 0
Finished running all of the requests.

So, although I suspect they may not like what you're doing even if you add a proper User-Agent, you might as well try it:

wc.Headers.Add ("User-Agent", "Friendly Bot 1.0 (FriendlyBot@friendlybot.com)")

It also wouldn't hurt to avoid opening so many connections to their servers at once.
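To illustrate that last point, here is a minimal sketch of capping how many requests are in flight at the same time. It assumes the work is a sequence of Async computations like the per-article downloads; throttledParallel and fakeRequest are hypothetical names, and Async.Sleep stands in for the network round trip, so no real requests are sent.

```fsharp
open System.Threading

/// Hypothetical helper: runs every computation, but lets at most
/// `maxConcurrent` of them be in flight at once.
let throttledParallel (maxConcurrent : int) (computations : seq<Async<'T>>) =
    let gate = new SemaphoreSlim(maxConcurrent)
    computations
    |> Seq.map (fun work ->
        async {
            // Wait (without blocking a thread) until a slot is free.
            do! gate.WaitAsync() |> Async.AwaitTask
            try
                return! work
            finally
                // Hand the slot back even if `work` throws.
                gate.Release() |> ignore
        })
    |> Async.Parallel

// Stand-in for a download that records how many "requests" are in flight.
let sync = obj ()
let inFlight = ref 0
let peakInFlight = ref 0

let fakeRequest (i : int) =
    async {
        lock sync (fun () ->
            inFlight.Value <- inFlight.Value + 1
            peakInFlight.Value <- max peakInFlight.Value inFlight.Value)
        do! Async.Sleep 50 // simulated network latency
        lock sync (fun () -> inFlight.Value <- inFlight.Value - 1)
        return i * 10
    }

let results =
    Seq.init 20 fakeRequest
    |> throttledParallel 4
    |> Async.RunSynchronously

printfn "peak concurrency: %d" peakInFlight.Value // never more than 4
```

SemaphoreSlim is used here because WaitAsync lets a waiting computation release its thread instead of blocking it. Newer FSharp.Core releases also accept an optional maxDegreeOfParallelism argument on Async.Parallel, which may be simpler if it is available to you.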

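Wikipedia is not to blame here; this is a consequence of how Async.Parallel works internally. The type signature of Async.Parallel is seq<Async<'T>> -> Async<'T[]>. It returns a single Async value containing all of the results in the sequence, so it won't return until every computation in the seq<Async<'T>> has retrieved the content from the Uri, parsed it, and returned its result.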
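A small sketch of that behaviour, with Async.Sleep standing in for real downloads (fakeDownload is a hypothetical helper): the combined computation only produces its array once the slowest child has finished, so a single request that never completes would leave the RunSynchronously call at the top of the question's call stack waiting indefinitely.

```fsharp
open System.Diagnostics

// Stand-in for a download that takes `delayMs` milliseconds to complete.
let fakeDownload (name : string) (delayMs : int) =
    async {
        do! Async.Sleep delayMs
        return sprintf "%s (%d ms)" name delayMs
    }

let stopwatch = Stopwatch.StartNew()

let results =
    [ fakeDownload "fast" 50; fakeDownload "medium" 200; fakeDownload "slow" 600 ]
    |> Async.Parallel          // seq<Async<string>> -> Async<string[]>
    |> Async.RunSynchronously  // blocks until *every* child has finished

let elapsed = stopwatch.ElapsedMilliseconds

// Results come back in input order, and only after the slowest child is done.
printfn "%A after %d ms" results elapsed
```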

open System
open System.IO
open System.Threading

// Assumption: synchedOut is not defined in the original answer; a synchronized
// console writer like this one is one plausible definition.
let synchedOut = TextWriter.Synchronized Console.Out

/// Given a Uri, creates an infinite sequence of async computations, each of which
/// downloads the document at that Uri when it is run.
let createDocumentSeq (uri : System.Uri) =
    #if DEBUG
    let outstandingRequestCount = ref 0
    #endif

    Seq.initInfinite <| fun _ ->
        async {
            use wc = new WebClientWithTimeout(Timeout = 5000)
            wc.Headers.Add ("User-Agent", "Friendly Bot 1.0 (FriendlyBot@friendlybot.com)")

            #if DEBUG
            // Increment the outstanding request count just before we send the request.
            do
                // NOTE : The message must be created THEN passed to synchedOut.WriteLine --
                // piping it (|>) into synchedOut.WriteLine or using fprintfn causes a closure
                // to be created which somehow defeats the synchronization and garbles the output.
                let msg =
                    Interlocked.Increment outstandingRequestCount
                    |> sprintf "Outstanding requests: %i"
                synchedOut.WriteLine msg
            #endif

            // The try/with must wrap the download itself: a failed request raises
            // its exception at the let!, not when the string is wrapped in Some.
            let! ret =
                async {
                    try
                        let! html = wc.AsyncDownloadString uri
                        return Some html
                    with ex ->
                        let msg = sprintf "%A" ex
                        synchedOut.WriteLine msg
                        return None
                }

            #if DEBUG
            // Decrement the outstanding request count now that we've
            // received a response (or an error).
            do
                let msg =
                    Interlocked.Decrement outstandingRequestCount
                    |> sprintf "Outstanding requests: %i"
                synchedOut.WriteLine msg
            #endif

            return ret
        }
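If you run that code, you'll see that it pulls results from the server one at a time, without leaving requests outstanding. It's also easy to change the number of results to retrieve: just change the value passed to Seq.take.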
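The one-at-a-time behaviour follows from sequences being lazy. The sketch below replaces the downloads with hypothetical counter-instrumented computations (executed and requests are illustrative names) to show that building the pipeline sends nothing, and that Seq.take stops pulling as soon as it has enough results.

```fsharp
// Counts how many of the simulated requests have actually been executed.
let executed = ref 0

// An infinite sequence of cold async computations; defining it runs nothing.
let requests =
    Seq.initInfinite (fun i ->
        async {
            executed.Value <- executed.Value + 1
            // Pretend every odd-numbered "download" fails to parse.
            return (if i % 2 = 0 then Some i else None)
        })

let pipeline =
    requests
    |> Seq.choose (fun work -> Async.RunSynchronously work) // one request at a time
    |> Seq.take 5

let ranBeforeForcing = executed.Value // 0: building the pipeline sends no requests
let firstFive = Seq.toArray pipeline  // [|0; 2; 4; 6; 8|]
let ranAfterForcing = executed.Value  // 9: requests 0..8, and no more
```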

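Now, while the streaming code works fine, it doesn't execute requests in parallel, so it could be slow for large numbers of documents. This is an easy problem to fix, though the solution may be a bit unintuitive. Instead of trying to execute the entire sequence of requests in parallel (which is the problem in the original code), create a function that uses Async.Parallel to execute small batches of requests in parallel, then use Seq.collect to combine the results back into a flat sequence.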

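/// Given a sequence of Async<'T>, creates a new sequence whose elements
/// are computed in batches of a specified size.
let parallelBatch batchSize (sequence : seq<Async<'T>>) =
    sequence
    // Seq.chunkBySize (FSharp.Core 4.0+) splits the input into disjoint batches.
    // Seq.windowed would yield overlapping windows, so every request would be
    // run up to batchSize times and its result collected repeatedly.
    |> Seq.chunkBySize batchSize
    |> Seq.collect (fun batch ->
        batch
        |> Async.Parallel
        |> Async.RunSynchronously)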
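As before, it's easy to change the number of documents to retrieve, and the batch size is just as easy to modify (again, I'd suggest keeping it reasonably small). If you want, you could make a few tweaks to the "streaming" and "batching" code so you can switch between them at runtime.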
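When batching, the choice of splitting function matters: Seq.windowed produces overlapping sliding windows, while Seq.chunkBySize (FSharp.Core 4.0 and later) partitions the input into disjoint batches. Because Async values are cold, an Async that lands in several windows is re-run each time. The sketch below compares the two; runBatched and its counted computations are hypothetical stand-ins for the real downloads.

```fsharp
// Seq.windowed yields overlapping sliding windows...
let windows = Seq.windowed 3 [ 1 .. 6 ] |> Seq.toList
// [[|1; 2; 3|]; [|2; 3; 4|]; [|3; 4; 5|]; [|4; 5; 6|]]

// ...whereas Seq.chunkBySize partitions the input into disjoint batches.
let chunks = Seq.chunkBySize 3 [ 1 .. 6 ] |> Seq.toList
// [[|1; 2; 3|]; [|4; 5; 6|]]

/// Runs six counted computations through the given batching function and
/// returns the collected results plus how many times a computation ran.
let runBatched (split : seq<Async<int>> -> seq<Async<int>[]>) =
    let runs = ref 0
    let work i = async { lock runs (fun () -> runs.Value <- runs.Value + 1); return i }
    let results =
        Seq.init 6 work
        |> split
        |> Seq.collect (fun batch -> batch |> Async.Parallel |> Async.RunSynchronously)
        |> Seq.toArray
    results, runs.Value

let chunkedResults, chunkedRuns = runBatched (Seq.chunkBySize 3)
// [|0; 1; 2; 3; 4; 5|] with 6 runs: each computation executes once
let windowedResults, windowedRuns = runBatched (Seq.windowed 3)
// [|0; 1; 2; 1; 2; 3; 2; 3; 4; 3; 4; 5|] with 12 runs: repeated requests and duplicates
```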


One last thing: with my code, the requests shouldn't time out, so you can get rid of the WebClientWithTimeout class and just use WebClient directly.

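Yes, I actually cloned the Chrome user agent and plugged it in to get it working. I just didn't want to share that in the question because I realize it's not exactly kosher. Still, I don't think it's the bot policy, because the AsyncDownloadString call doesn't seem to be where the code stops, and the point at which it hangs seems arbitrary. Hmm... maybe wiki is returning an error I can't see (although the try/with and the exception return None every time), but judging from the bot policy, it doesn't seem to affect something like my code, which may fit better under the definition of a crawler. @Reimiyasak - what makes me think this is a fact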
let en100_Streaming =
    #if DEBUG
    let documentCount = ref 0
    #endif

    Uri ("http://en.wikipedia.org/wiki/Special:Random")
    |> createDocumentSeq
    |> Seq.choose (fun asyncDoc ->
        Async.RunSynchronously asyncDoc
        |> Option.bind (parseDoc >> parseArticle))
    #if DEBUG
    |> Seq.map (fun x ->
        let msg =
            Interlocked.Increment documentCount
            |> sprintf "Parsed documents: %i"
        synchedOut.WriteLine msg
        x)
    #endif
    |> Seq.take 50
    // None of the computations actually take place until
    // this point, because Seq.toArray forces evaluation of the sequence.
    |> Seq.toArray
let en100_Batched =
    let batchSize = 10
    #if DEBUG
    let documentCount = ref 0
    #endif

    Uri ("http://en.wikipedia.org/wiki/Special:Random")
    |> createDocumentSeq
    // Execute batches in parallel
    |> parallelBatch batchSize
    |> Seq.choose (Option.bind (parseDoc >> parseArticle))
    #if DEBUG
    |> Seq.map (fun x ->
        let msg =
            Interlocked.Increment documentCount
            |> sprintf "Parsed documents: %i"
        synchedOut.WriteLine msg
        x)
    #endif
    |> Seq.take 50
    // None of the computations actually take place until
    // this point, because Seq.toArray forces evaluation of the sequence.
    |> Seq.toArray