Recursive goroutines: what is the neatest way to tell Go to stop reading from a channel?


recursion concurrency go


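I'd like to know the idiomatic way to solve this (it currently throws a deadlock error). The recursion branches an unknown number of times, so I can't simply close the channel.

I got it working by passing a pointer to a number and incrementing it, and I looked into using sync WaitGroups. I don't feel (maybe I'm wrong) that I came up with an elegant solution. The Go examples I have seen tend to be simple, clever and concise.

This is the last exercise of the Tour of Go.

Do you know how a "Go programmer" would approach this? Any help would be appreciated. I'm trying to learn it properly from the start.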
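For reference, here is a pattern commonly used for exactly this situation (a sketch, not taken from any of the answers below): register every goroutine with a sync.WaitGroup, and let one extra "closer" goroutine call Wait and then close the channel. The reader can then simply range over the channel, and the loop ends by itself once every branch has finished, however many branches there were. The tree map and the walk and collect functions are hypothetical stand-ins for the crawler and its fetcher.

```go
package main

import (
	"fmt"
	"sync"
)

// tree is a hypothetical stand-in for the crawled pages: each key links to
// its children, and the number of branches is not known in advance.
var tree = map[string][]string{
	"root": {"a", "b"},
	"a":    {"a1", "a2"},
	"b":    {"b1"},
}

// walk sends name on out, then recurses into every child in a new goroutine.
// Each child is added to wg before it starts; the parent is still counted at
// that point (its Done is deferred), so the counter cannot hit zero early.
func walk(name string, wg *sync.WaitGroup, out chan<- string) {
	defer wg.Done()
	out <- name
	for _, child := range tree[name] {
		wg.Add(1)
		go walk(child, wg, out)
	}
}

// collect starts the walk and returns every name it produced.
func collect(root string) []string {
	out := make(chan string)
	var wg sync.WaitGroup

	wg.Add(1)
	go walk(root, &wg, out)

	// The closer: once all walk goroutines have returned, nothing can send
	// any more, so closing the channel is safe.
	go func() {
		wg.Wait()
		close(out)
	}()

	var got []string
	for name := range out { // ends when the closer closes out
		got = append(got, name)
	}
	return got
}

func main() {
	fmt.Println(len(collect("root")), "nodes visited")
}
```

The reader never needs to know how many values to expect; only the goroutine that waits on the WaitGroup decides when the stream is over.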

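Here is my interpretation of the exercise. There are many like it, but this one is mine. I use sync.WaitGroup and a custom, mutex-protected map to store visited URLs, mainly because Go's standard map type is not thread safe. I also combine the data and error channels into a single structure, which has a method for reading said channels, mainly for separation of concerns and (arguably) to keep things cleaner. The full program, with the UrlCache and Results types, is reproduced further down this page.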
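The key property of the mutex-protected map is that the "already visited?" check and the insert happen under one lock. A minimal sketch of that idea, with hypothetical names (urlSet, claim, firstClaims) rather than the answer's UrlCache.AtomicSet: many goroutines race to claim the same URL, and exactly one of them wins.

```go
package main

import (
	"fmt"
	"sync"
)

// urlSet is a hypothetical mutex-guarded set of URLs.
type urlSet struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

// claim records url and reports whether this call was the first to do so.
// Check and insert share one critical section, so two goroutines can never
// both see the URL as new.
func (s *urlSet) claim(url string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.seen[url]; ok {
		return false
	}
	s.seen[url] = struct{}{}
	return true
}

// firstClaims lets n goroutines claim the same URL and counts the winners.
func firstClaims(n int) int {
	s := &urlSet{seen: make(map[string]struct{})}
	var wg sync.WaitGroup
	var mu sync.Mutex
	winners := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s.claim("http://golang.org/") {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return winners
}

func main() {
	fmt.Println(firstClaims(100)) // 1
}
```

Splitting claim into a separate "exists" call followed by a "set" call would reopen the window in which two goroutines both see the URL as unvisited and both crawl it.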

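Instead of involving sync.WaitGroup, you can extend the result sent for each parsed URL to include the number of new URLs found. Then, in the main loop, you keep reading results for as long as there is something left to collect.

In your case the number of URLs found equals the number of goroutines spawned, but it doesn't have to. Personally, I would spawn a more or less fixed number of fetching routines, so that you don't open too many HTTP requests (or at least you stay in control of how many). Your main loop wouldn't change, because it doesn't care how the fetching is done. The important point is that you must send either a result or an error for every URL. I've modified the code here so that it doesn't spawn new routines when the depth is already 1.

A side effect of this solution is that you can easily print progress in the main loop.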
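The answer mentions spawning a more or less fixed number of fetch routines while keeping a counting main loop. A minimal sketch of that idea, under stated assumptions: the web map and fetch function are hypothetical stand-ins for the Tour's fetcher, and the main goroutine (not the workers) owns the visited map, so no mutex is needed. Instead of a found field, the loop counts jobs that were handed to a worker but whose result has not arrived yet.

```go
package main

import "fmt"

// web is a hypothetical in-memory site: each page links to the listed pages.
var web = map[string][]string{
	"a": {"b", "c"},
	"b": {"a", "d"},
	"c": {"d", "missing"},
	"d": {},
}

// fetch is a hypothetical stand-in for Fetcher.Fetch.
func fetch(url string) (string, []string, error) {
	links, ok := web[url]
	if !ok {
		return "", nil, fmt.Errorf("not found: %s", url)
	}
	return "body of " + url, links, nil
}

type job struct {
	url   string
	depth int
}

type result struct {
	job
	body string
	urls []string
	err  error
}

// worker fetches jobs until the jobs channel is closed.
func worker(jobs <-chan job, results chan<- result) {
	for j := range jobs {
		body, urls, err := fetch(j.url)
		results <- result{j, body, urls, err}
	}
}

// crawl runs a fixed pool of workers; only crawl touches visited.
func crawl(start string, depth, workers int) []string {
	jobs := make(chan job)
	results := make(chan result)
	for i := 0; i < workers; i++ {
		go worker(jobs, results)
	}
	defer close(jobs) // lets the workers exit once crawling is finished

	visited := map[string]bool{start: true}
	queue := []job{{start, depth}}
	pending := 0 // jobs handed to a worker whose result has not arrived yet
	var found []string

	for len(queue) > 0 || pending > 0 {
		// Offer a job only when one is queued: a send on a nil channel
		// blocks forever, which disables that select case.
		var out chan<- job
		var next job
		if len(queue) > 0 {
			out, next = jobs, queue[0]
		}
		select {
		case out <- next:
			queue = queue[1:]
			pending++
		case r := <-results:
			pending--
			if r.err != nil {
				fmt.Println(r.err)
				continue
			}
			found = append(found, r.url)
			fmt.Printf("found: %s %q\n", r.url, r.body)
			if r.depth <= 1 {
				continue
			}
			for _, u := range r.urls {
				if !visited[u] {
					visited[u] = true
					queue = append(queue, job{u, r.depth - 1})
				}
			}
		}
	}
	return found
}

func main() {
	crawl("a", 3, 2)
}
```

Changing the workers argument changes how many fetches can run at once without touching the termination logic, which is the separation the answer describes.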
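The example was posted on the playground; the full program, with the Res type, is reproduced further down this page.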

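Here is how I solved the Go Tour's web crawler exercise.

To detect when the parallel recursion has finished, I used an atomic integer counter that tracks how many URLs are still being crawled by the parallel recursive calls. In main, I wait in a loop until the atomic counter has been decremented back to zero.

To avoid crawling the same URL twice, I used a map protected by a mutex to track the URLs already crawled.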
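The polling loop in main works, but one refinement (swapped in here, not part of the answer) keeps the atomic counter and drops the sleep: the goroutine whose decrement brings the counter to zero closes a done channel, and main simply blocks on it. A minimal sketch with a hypothetical tree map and tracker type; the mutex-guarded seen map plays the role of SafeHashSet.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// tree is a hypothetical set of pages; "c" is reachable twice.
var tree = map[string][]string{
	"root": {"a", "b"},
	"a":    {"c"},
	"b":    {"c"},
}

type tracker struct {
	pending int64         // walk calls started but not yet finished (first field keeps it 64-bit aligned)
	done    chan struct{} // closed when pending drops to zero
	mu      sync.Mutex    // guards seen and order
	seen    map[string]bool
	order   []string
}

func (t *tracker) walk(name string) {
	defer func() {
		// Every child is counted before its parent finishes, so the
		// decrement that reaches zero is the last one: done closes once.
		if atomic.AddInt64(&t.pending, -1) == 0 {
			close(t.done)
		}
	}()

	t.mu.Lock()
	if t.seen[name] {
		t.mu.Unlock()
		return
	}
	t.seen[name] = true
	t.order = append(t.order, name)
	t.mu.Unlock()

	for _, child := range tree[name] {
		atomic.AddInt64(&t.pending, 1) // count the child before starting it
		go t.walk(child)
	}
}

// visitAll walks the tree from root and blocks until every walk has returned.
func visitAll(root string) []string {
	t := &tracker{done: make(chan struct{}), seen: map[string]bool{}}
	atomic.AddInt64(&t.pending, 1)
	go t.walk(root)
	<-t.done // wait for the signal instead of polling with time.Sleep

	t.mu.Lock()
	defer t.mu.Unlock()
	return t.order
}

func main() {
	fmt.Println(visitAll("root"))
}
```

As in the answer, the ordering "increment before go, decrement on exit" is what makes the counter safe; the done channel only changes how main waits.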
The code for this is the SafeHashSet snippet further down this page.
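Thanks a lot @jimt, this is an interesting example. So the basic answer to the question above is sync.WaitGroup. I had expected a solution involving only what was taught in the previous few slides, but unless anyone has a better idea, I think WaitGroup is a good choice. There are some other good suggestions in here too; in fact I had forgotten about defer, which is really useful. I'll hold off marking this as the answer, just to give others a chance, but it is definitely a good answer!

Would you say your answer is more "idiomatic" in Go applications than Tomasz's answer? I don't like marking more than one answer, but I'll award points where they are due.

I'm marking this as the answer because, although I think the solution @jimt provided is elegant and teaches a lot of useful things, this one was done without relying on tools beyond the scope of the earlier tutorials. However, I did ask for the idiomatic "correct answer", and since I'm not experienced enough, I don't know which one I should mark. (Tried to accept both answers.)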
package main

import (
    "fmt"
    "sync"
)

type Fetcher interface {
    // Fetch returns the body of URL and
    // a slice of URLs found on that page.
    Fetch(url string) (body string, urls []string, err error)
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(wg *sync.WaitGroup, url string, depth int, fetcher Fetcher, cache *UrlCache, results *Results) {
    defer wg.Done()

    if depth <= 0 || !cache.AtomicSet(url) {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        results.Error <- err
        return
    }

    results.Data <- [2]string{url, body}

    for _, url := range urls {
        wg.Add(1)
        go Crawl(wg, url, depth-1, fetcher, cache, results)
    }
}

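func main() {
    var wg sync.WaitGroup
    cache := NewUrlCache()
    results := NewResults()

    // Print results in the background; done is closed once Read has drained both channels.
    done := make(chan struct{})
    go func() {
        results.Read()
        close(done)
    }()

    wg.Add(1)
    go Crawl(&wg, "http://golang.org/", 4, fetcher, cache, results)

    wg.Wait()       // every Crawl goroutine has returned, so nothing sends any more
    results.Close() // closing is now safe and tells Read to stop
    <-done          // wait until the last buffered results have been printed
}

// Results defines channels which yield results for a single crawled URL.
type Results struct {
    Data  chan [2]string // url + body.
    Error chan error     // Possible fetcher error.
}

func NewResults() *Results {
    return &Results{
        Data:  make(chan [2]string, 1),
        Error: make(chan error, 1),
    }
}

// Close closes both channels. Call it only after all senders have finished.
func (r *Results) Close() error {
    close(r.Data)
    close(r.Error)
    return nil
}

// Read prints crawled results or errors until both channels have been closed.
func (r *Results) Read() {
    data, errs := r.Data, r.Error
    for data != nil || errs != nil {
        select {
        case d, ok := <-data:
            if !ok {
                data = nil // closed: a nil channel is never selected again
                continue
            }
            fmt.Println(">", d)

        case err, ok := <-errs:
            if !ok {
                errs = nil
                continue
            }
            fmt.Println("e", err)
        }
    }
}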

// UrlCache defines a cache of URLs we've already visited.
type UrlCache struct {
    sync.Mutex
    data map[string]struct{} // An empty struct occupies 0 bytes, whereas a bool takes 1 byte.
}

func NewUrlCache() *UrlCache { return &UrlCache{data: make(map[string]struct{})} }

// AtomicSet sets the given url in the cache and returns false if it already existed.
//
// All within the same locked context. Modifying a map without synchronisation is not safe
// when done from multiple goroutines. Doing an Exists() check and Set() separately would
// create a race condition, so we must combine both in a single operation.
func (c *UrlCache) AtomicSet(url string) bool {
    c.Lock()
    _, ok := c.data[url]
    c.data[url] = struct{}{}
    c.Unlock()
    return !ok
}

// fakeFetcher is Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
    body string
    urls []string
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
    if res, ok := f[url]; ok {
        return res.body, res.urls, nil
    }
    return "", nil, fmt.Errorf("not found: %s", url)
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
    "http://golang.org/": &fakeResult{
        "The Go Programming Language",
        []string{
            "http://golang.org/pkg/",
            "http://golang.org/cmd/",
        },
    },
    "http://golang.org/pkg/": &fakeResult{
        "Packages",
        []string{
            "http://golang.org/",
            "http://golang.org/cmd/",
            "http://golang.org/pkg/fmt/",
            "http://golang.org/pkg/os/",
        },
    },
    "http://golang.org/pkg/fmt/": &fakeResult{
        "Package fmt",
        []string{
            "http://golang.org/",
            "http://golang.org/pkg/",
        },
    },
    "http://golang.org/pkg/os/": &fakeResult{
        "Package os",
        []string{
            "http://golang.org/",
            "http://golang.org/pkg/",
        },
    },
}

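package main

import (
    "fmt"
    "sync"
)

type Fetcher interface {
    // Fetch returns the body of URL and
    // a slice of URLs found on that page.
    Fetch(url string) (body string, urls []string, err error)
}

type Res struct {
    url   string
    body  string
    found int // Number of new URLs found (one Crawl goroutine is started per URL)
}

// visitedSet is a mutex-guarded set of URLs. Crawl runs in many goroutines at
// once, so sharing a plain map between them would be a data race.
type visitedSet struct {
    mu   sync.Mutex
    seen map[string]bool
}

// claim marks url as visited and reports whether it had not been seen before.
func (v *visitedSet) claim(url string) bool {
    v.mu.Lock()
    defer v.mu.Unlock()
    if v.seen[url] {
        return false
    }
    v.seen[url] = true
    return true
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
// Every call sends exactly one value: a Res on ch or an error on errs.
func Crawl(url string, depth int, fetcher Fetcher, ch chan Res, errs chan error, visited *visitedSet) {
    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        errs <- err
        return
    }

    newUrls := 0
    if depth > 1 {
        for _, u := range urls {
            // claim checks and marks u in one locked step, so no URL is crawled twice.
            if visited.claim(u) {
                newUrls++
                go Crawl(u, depth-1, fetcher, ch, errs, visited)
            }
        }
    }

    // Send the result along with the number of URLs still to be fetched
    ch <- Res{url, body, newUrls}
}

func main() {
    ch := make(chan Res)
    errs := make(chan error)
    visited := &visitedSet{seen: map[string]bool{}}

    start := "http://golang.org/"
    visited.claim(start)
    go Crawl(start, 4, fetcher, ch, errs, visited)

    // Collect exactly one value per started Crawl; each result says how many more to expect.
    tocollect := 1
    for n := 0; n < tocollect; n++ {
        select {
        case s := <-ch:
            fmt.Printf("found: %s %q\n", s.url, s.body)
            tocollect += s.found
        case e := <-errs:
            fmt.Println(e)
        }
    }
}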

// fakeFetcher is Fetcher that returns canned results.
type fakeFetcher map[string]*fakeResult

type fakeResult struct {
    body string
    urls []string
}

func (f fakeFetcher) Fetch(url string) (string, []string, error) {
    if res, ok := f[url]; ok {
        return res.body, res.urls, nil
    }
    return "", nil, fmt.Errorf("not found: %s", url)
}

// fetcher is a populated fakeFetcher.
var fetcher = fakeFetcher{
    "http://golang.org/": &fakeResult{
        "The Go Programming Language",
        []string{
            "http://golang.org/pkg/",
            "http://golang.org/cmd/",
        },
    },
    "http://golang.org/pkg/": &fakeResult{
        "Packages",
        []string{
            "http://golang.org/",
            "http://golang.org/cmd/",
            "http://golang.org/pkg/fmt/",
            "http://golang.org/pkg/os/",
        },
    },
    "http://golang.org/pkg/fmt/": &fakeResult{
        "Package fmt",
        []string{
            "http://golang.org/",
            "http://golang.org/pkg/",
        },
    },
    "http://golang.org/pkg/os/": &fakeResult{
        "Package os",
        []string{
            "http://golang.org/",
            "http://golang.org/pkg/",
        },
    },
}

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
    "time"
)

// Fetcher, fakeFetcher and fetcher are the ones from the Go Tour exercise
// scaffold and are omitted from this snippet.

// SafeHashSet is a mutex-guarded set of URLs.
type SafeHashSet struct {
    sync.Mutex
    urls map[string]bool // Used as a hash set, so the map values themselves don't matter
}

var (
    urlSet     SafeHashSet
    urlCounter int64 // Number of Crawl calls started but not yet finished
)

// add adds a URL to the set and returns true if it was not already present.
func (m *SafeHashSet) add(newUrl string) bool {
    m.Lock()
    defer m.Unlock()
    _, ok := m.urls[newUrl]
    if !ok {
        m.urls[newUrl] = true
        return true
    }
    return false
}

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
    // Decrement the atomic URL counter when this Crawl call exits
    defer atomic.AddInt64(&urlCounter, -1)

    if depth <= 0 {
        return
    }

    // Don't process a URL that has already been processed
    if !urlSet.add(url) {
        fmt.Printf("skip: \t%s\n", url)
        return
    }

    body, urls, err := fetcher.Fetch(url)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("found: \t%s %q\n", url, body)

    for _, u := range urls {
        // Increment before starting the goroutine, so the counter cannot
        // reach zero while this child is still pending
        atomic.AddInt64(&urlCounter, 1)
        // Crawl in parallel
        go Crawl(u, depth-1, fetcher)
    }
}

func main() {
    urlSet = SafeHashSet{urls: make(map[string]bool)}

    atomic.AddInt64(&urlCounter, 1)
    go Crawl("https://golang.org/", 4, fetcher)

    // Poll until every started Crawl call has finished
    for atomic.LoadInt64(&urlCounter) > 0 {
        time.Sleep(100 * time.Microsecond)
    }
    fmt.Println("Exiting")
}