Multithreading: issue with scraping GitLab via Go
I am new to programming and need help. I am trying to write a GitLab scraper in Go. When I try to fetch information about projects in multithreaded mode, something goes wrong. The code is as follows:
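// Needs: encoding/json, errors, io/ioutil, net/http
func (g *Gitlab) getAPIResponce(url string, structure interface{}) error {
	responce, responce_error := http.Get(url)
	if responce_error != nil {
		return responce_error
	}
	// Close the body so the underlying connection can be reused or released.
	defer responce.Body.Close()

	ret, read_error := ioutil.ReadAll(responce.Body)
	if read_error != nil {
		return read_error
	}
	// An empty JSON array means there is nothing to decode.
	if string(ret) == "[]" {
		return errors.New(error_emptypage)
	}
	return json.Unmarshal(ret, structure)
}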
...
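func (g *Gitlab) GetProjects() {
	// Buffered so that no goroutine ever blocks on sending its result.
	projects_chan := make(chan Project, g.LatestProjectID)
	var waitGroup sync.WaitGroup
	// Counting semaphore: at most 50 requests are in flight at once.
	queue := make(chan struct{}, 50)
	for i := g.LatestProjectID; i > 0; i-- {
		url := g.BaseURL + projects_url + "/" + strconv.Itoa(i) + g.Token
		waitGroup.Add(1)
		// One goroutine per project; each waits for a free semaphore slot.
		go func(url string, channel chan Project) {
			defer waitGroup.Done()
			queue <- struct{}{}
			defer func() { <-queue }()
			var oneProject Project
			err := g.getAPIResponce(url, &oneProject)
			if err != nil {
				fmt.Println(err.Error())
			}
			fmt.Printf(".")
			// On error this sends the zero value, which is filtered out below.
			channel <- oneProject
		}(url, projects_chan)
	}
	// Close the results channel once every goroutine has finished.
	go func() {
		waitGroup.Wait()
		close(projects_chan)
	}()
	for project := range projects_chan {
		if project.ID != 0 {
			g.Projects = append(g.Projects, project)
		}
	}
}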
The failing projects are different on every run, but their IDs are always around 550.
When I curl the links from the output, I get normal JSON. When I run this code with queue := make(chan struct{}, 1) (that is, effectively single-threaded), everything works fine.
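What could be causing this?

I would say this is not a very clear way to implement concurrency. What seems to be happening here is:
- you create a buffered channel of size 50
- then you launch 1532 goroutines
- the first 50 of them enqueue themselves and start processing

To limit the number of simultaneous connections, I am trying this:
queue := make(chan struct{}, N)
(N = number of connections). It partly helps, but I am losing performance. Is this a problem with GitLab, with Go, or with my computer?

I think 50 is too many. There may be some kind of DDoS protection that limits the number of connections. Try fewer, say 5 or 10.

Looks like you are right, this works well (but slowly: 1500+ projects in 42 seconds).

You can find the optimal number, or use some proxies so that your requests look like they come from different sources.

Thanks for the answers, and especially for the link to "Go Concurrency Patterns". Following Eugene Lisitsky's answer, I solved the problem by limiting the number of threads and tuning GitLab:
queue := make(chan struct{}, 3)
and
projects_chan := make(chan Project, 2)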
Output of the run from the question:
$ ./gitlab-auditor
latest project = 1532
Gathering projects...
.......... [progress dots trimmed] ..........Get https://gitlab.example.com/api/v4/projects/563&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/558&private_token=SeCrEt_ToKeN: unexpected EOF
..Get https://gitlab.example.com/api/v4/projects/531&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/571&private_token=SeCrEt_ToKeN: unexpected EOF
.Get https://gitlab.example.com/api/v4/projects/570&private_token=SeCrEt_ToKeN: unexpected EOF
..Get https://gitlab.example.com/api/v4/projects/467&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/573&private_token=SeCrEt_ToKeN: unexpected EOF
.......... [progress dots trimmed] ..........