Multithreading: compiled console command-line program doesn't wait until all the threads have completed

Tags: multithreading, asynchronous, f#, mailboxprocessor

If the code is compiled into a console program, or is run as fsi --use:program.fs --exec --quiet, some threads are terminated before they finish. Is there a way to wait until all threads have completed?

The problem can be described as "a program-exit problem in the presence of multiple MailboxProcessors".
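
For context, here is a minimal sketch of the same effect (my illustration, not code from the question): MailboxProcessor agents run on thread-pool threads, which are background threads, so a compiled console application can exit before they finish unless the main thread blocks on something.

// Hypothetical minimal repro: the agent runs on a background (thread
// pool) thread, so "done" may never be printed if the process reaches
// the end of the program first.
let agent = MailboxProcessor.Start(fun inbox -> async {
    let! msg = inbox.Receive()
    printfn "done: %s" msg })
agent.Post "work"
// Without a blocking call here (for example Async.RunSynchronously on a
// workflow obtained via PostAndAsyncReply), nothing keeps the process
// alive long enough for the agent to run.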

Example output:

(Note that the last line is truncated, and the final output call, printfn "[Main] after crawl", never executes.)

[Main] before crawl
[Crawl] before return result
http://news.google.com crawled by agent 1.
[supervisor] reached limit
Agent 5 is done.
http://www.gstatic.com/news/img/favicon.ico crawled by agent 1.
[supervisor] reached limit
Agent 1 is done.
http://www.google.com/imghp?hl=en&tab=ni crawled by agent 4.
[supervisor] reached limit
Agent 4 is done.
http://www.google.com/webhp?hl=en&tab=nw crawled by agent 2.
[supervisor] reached limit
Agent 2 is done.
http://news.google.

Edit: added several System.Threading.Thread.CurrentThread.IsBackground <- false statements.
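
To illustrate what that edit is aiming at (my sketch, not code from the question): a foreground thread keeps the process alive until it completes, while background threads are abandoned when the main thread exits; Thread.Join is the classic way to wait for a specific thread explicitly.

open System.Threading

// Illustration only: threads created with the Thread constructor are
// foreground by default; thread-pool threads (used by async and
// MailboxProcessor) are background, which is why the code below tries
// to flip that flag.
let worker = Thread(fun () ->
    Thread.Sleep 1000
    printfn "worker finished")
worker.IsBackground <- false
worker.Start()
worker.Join()   // blocks the caller until the worker has finished
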
Code:

open System
open System.Collections.Concurrent
open System.Collections.Generic
open System.IO
open System.Net
open System.Text.RegularExpressions

module Helpers =

    type Message =
        | Done
        | Mailbox of MailboxProcessor<Message>
        | Stop
        | Url of string option
        | Start of AsyncReplyChannel<unit>

    // Gates the number of crawling agents.
    [<Literal>]
    let Gate = 5

    // Extracts links from HTML.
    let extractLinks html =
        let pattern1 = "(?i)href\\s*=\\s*(\"|\')/?((?!#.*|/\B|" + 
                       "mailto:|location\.|javascript:)[^\"\']+)(\"|\')"
        let pattern2 = "(?i)^https?"

        let links =
            [
                for x in Regex(pattern1).Matches(html) do
                    yield x.Groups.[2].Value
            ]
            |> List.filter (fun x -> Regex(pattern2).IsMatch(x))
        links

    // Fetches a Web page.
    let fetch (url : string) =
        try
            let req = WebRequest.Create(url) :?> HttpWebRequest
            req.UserAgent <- "Mozilla/5.0 (Windows; U; MSIE 9.0; Windows NT 9.0; en-US)"
            req.Timeout <- 5000
            use resp = req.GetResponse()
            let content = resp.ContentType
            let isHtml = Regex("html").IsMatch(content)
            match isHtml with
            | true -> use stream = resp.GetResponseStream()
                      use reader = new StreamReader(stream)
                      let html = reader.ReadToEnd()
                      Some html
            | false -> None
        with
        | _ -> None

    let collectLinks url =
        let html = fetch url
        match html with
        | Some x -> extractLinks x
        | None -> []

open Helpers

// Creates a mailbox that synchronizes printing to the console (so 
// that two calls to 'printfn' do not interleave when printing)
let printer = 
    MailboxProcessor.Start(fun x -> async {
        while true do 
            let! str = x.Receive()
            System.Threading.Thread.CurrentThread.IsBackground <- false
            printfn "%s" str })
// Hides the standard 'printfn' function (formats the string using
// 'kprintf' and then posts the result to the printer agent).
let printfn fmt = 
    Printf.kprintf printer.Post fmt

let crawl url limit = 
    // Concurrent queue for saving collected urls.
    let q = ConcurrentQueue<string>()

    // Holds crawled URLs.
    let set = HashSet<string>()


    // The supervisor hands queued URLs to idle crawlers, stops them once
    // the limit is reached, and replies to the caller when everything is
    // done.
    let supervisor =
        MailboxProcessor.Start(fun x -> async {
            System.Threading.Thread.CurrentThread.IsBackground <- false
            // The agent expects to receive 'Start' message first - the message
            // carries a reply channel that is used to notify the caller
            // when the agent completes crawling.
            let! start = x.Receive()
            let repl =
              match start with
              | Start repl -> repl
              | _ -> failwith "Expected Start message!"

            let rec loop run =
                async {
                    let! msg = x.Receive()
                    match msg with
                    | Mailbox(mailbox) -> 
                        let count = set.Count
                        if count < limit - 1 && run then 
                            let url = q.TryDequeue()
                            match url with
                            | true, str -> if not (set.Contains str) then
                                                set.Add str |> ignore // mark as crawled
                                                mailbox.Post <| Url(Some str)
                                                return! loop run
                                            else
                                                mailbox.Post <| Url None
                                                return! loop run

                            | _ -> mailbox.Post <| Url None
                                   return! loop run
                        else
                            printfn "[supervisor] reached limit" 
                            // Wait for finishing
                            mailbox.Post Stop
                            return! loop run
                    | Stop -> printfn "[Supervisor] stop"; return! loop false
                    | Start _ -> failwith "Unexpected start message!"
                    | Url _ -> failwith "Unexpected URL message!"
                    | Done -> printfn "[Supervisor] Supervisor is done."
                              (x :> IDisposable).Dispose()
                              // Notify the caller that the agent has completed
                              repl.Reply(())
                }
            do! loop true })


    // Collects discovered URLs from the crawlers and counts their 'Done'
    // messages; once all Gate crawlers have finished, it notifies the
    // supervisor and shuts itself down.
    let urlCollector =
        MailboxProcessor.Start(fun y ->
            let rec loop count =
                async {
                    System.Threading.Thread.CurrentThread.IsBackground <- false
                    let! msg = y.TryReceive(6000)
                    match msg with
                    | Some message ->
                        match message with
                        | Url u ->
                            match u with
                            | Some url -> q.Enqueue url
                                          return! loop count
                            | None -> return! loop count
                        | _ ->
                            match count with
                            | Gate -> (y :> IDisposable).Dispose()
                                      printfn "[urlCollector] URL collector is done."
                                      supervisor.Post Done
                            | _ -> return! loop (count + 1)
                    | None -> supervisor.Post Stop
                              return! loop count
                }
            loop 1)

    /// Initializes a crawling agent.
    let crawler id =
        MailboxProcessor.Start(fun inbox ->
            let rec loop() =
                async {
                    System.Threading.Thread.CurrentThread.IsBackground <- false
                    let! msg = inbox.Receive()
                    match msg with
                    | Url x ->
                        match x with
                        | Some url -> 
                                let links = collectLinks url
                                printfn "%s crawled by agent %d." url id
                                for link in links do
                                    urlCollector.Post <| Url (Some link)
                                supervisor.Post(Mailbox(inbox))
                                return! loop()
                        | None -> supervisor.Post(Mailbox(inbox))
                                  return! loop()
                    | _ -> printfn "Agent %d is done." id
                           urlCollector.Post Done
                           (inbox :> IDisposable).Dispose()
                    }
            loop())

    // Send 'Start' message to the main agent. The result
    // is asynchronous workflow that will complete when the
    // agent crawling completes
    let result = supervisor.PostAndAsyncReply(Start)
    // Spawn the crawlers.
    let crawlers = 
        [
            for i in 1 .. Gate do
                yield crawler i
        ]

    // Post the first messages.
    crawlers.Head.Post <| Url (Some url)
    crawlers.Tail |> List.iter (fun ag -> ag.Post <| Url None) 
    printfn "[Crawl] before return result"
    result

// Example:
printfn "[Main] before crawl"
crawl "http://news.google.com" 5
|> Async.RunSynchronously
printfn "[Main] after crawl"