Multithreading: a problem scraping GitLab with Go


I'm new to programming and need help. I'm trying to write a GitLab scraper in Go. Something goes wrong when I try to fetch project information in multithreaded mode.

Here is the code:

func (g *Gitlab) getAPIResponce(url string, structure interface{}) error {
    responce, responce_error := http.Get(url)
    if responce_error != nil {
        return responce_error
    }
    ret, _ := ioutil.ReadAll(responce.Body)
    if string(ret) != "[]" {
        err := json.Unmarshal(ret, structure)
        return err
    }
    return errors.New(error_emptypage)
}

...

func (g *Gitlab) GetProjects() {
    projects_chan := make(chan Project, g.LatestProjectID) 
    var waitGroup sync.WaitGroup                           
    queue := make(chan struct{}, 50)                                      
    for i := g.LatestProjectID; i > 0; i-- {               
        url := g.BaseURL + projects_url + "/" + strconv.Itoa(i) + g.Token
        waitGroup.Add(1)
        go func(url string, channel chan Project) {
            queue <- struct{}{}
            defer waitGroup.Done()

            var oneProject Project
            err := g.getAPIResponce(url, &oneProject)
            if err != nil {
                fmt.Println(err.Error())
            }

            fmt.Printf(".")
            channel <- oneProject
            <-queue
        }(url, projects_chan)
    }

    go func() {
        waitGroup.Wait()
        close(projects_chan)
    }()

    for project := range projects_chan {
        if project.ID != 0 {
            g.Projects = append(g.Projects, project)
        }
    }
}
The failing projects are different on every run, but their IDs are always around 550.

When I curl the links from the output, I get normal JSON. When I run this code with
queue := make(chan struct{}, 1)
(that is, in a single thread), everything works fine.


What could be causing this?

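I would say this is not a very clear way to implement concurrency. What seems to be happening here is:

  • You create a buffered channel of size 50.

  • Then you fire off 1532 goroutines.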


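  • The first 50 of them enqueue themselves and start processing. As each one finishes and frees its slot, one of the waiting goroutines takes it.

I am trying to limit the number of simultaneous connections, like this:
    queue := make(chan struct{}, N)
(N = number of connections). This partly helps, but I am losing performance. Is this a problem with GitLab, with Go, or with my computer?

I think 50 is too many. There may be some kind of DDoS protection that limits the number of connections. Try fewer, say 5 or 10.

It looks like you are right, this works (but slowly: 1500+ projects in 42 seconds):
    projects_chan := make(chan Project, 2)
    queue := make(chan struct{}, 3)

You can find the optimal number, or use some proxies so that your requests appear to come from different sources.

Thanks for the answers, and especially for the link to "Go Concurrency Patterns". Following Eugene Lisitsky's answer, I solved the problem by limiting the number of threads and tuning GitLab.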
Output of the failing run from the question:
    $ ./gitlab-auditor
    latest project = 1532
    Gathering projects...
    .......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Get https://gitlab.example.com/api/v4/projects/563&private_token=SeCrEt_ToKeN: unexpected EOF
    Get https://gitlab.example.com/api/v4/projects/558&private_token=SeCrEt_ToKeN: unexpected EOF
    ..Get https://gitlab.example.com/api/v4/projects/531&private_token=SeCrEt_ToKeN: unexpected EOF
    Get https://gitlab.example.com/api/v4/projects/571&private_token=SeCrEt_ToKeN: unexpected EOF
    .Get https://gitlab.example.com/api/v4/projects/570&private_token=SeCrEt_ToKeN: unexpected EOF
    ..Get https://gitlab.example.com/api/v4/projects/467&private_token=SeCrEt_ToKeN: unexpected EOF
    Get https://gitlab.example.com/api/v4/projects/573&private_token=SeCrEt_ToKeN: unexpected EOF
