How do I fix the Python threading error "too many arguments" in a web crawler I wrote?


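I get an error when I run my code. It raises a threading error saying that too many arguments were given.

I'm trying to make it visit every link it finds; this is a web crawler. I don't know what I'm doing wrong. If you have any ideas, please let me know.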

def run(links):
    try:
        try:
            import random
            z = str(links[random.randint(0,len(links))]).replace("(","")
            q = z.replace(")","")
            m = q.replace(", 'http'","")
            m = m[1:len(m)-1]
            line = m
        except:
            with open("sites.txt", "r") as file:
                import random
                line = random.choice(file.readlines())
                print(line)
        with urllib.request.urlopen(line) as response:
            html = response.read()

        #use re.findall to get all the links
        links = re.findall('"((http|ftp)s?://.*?)"', str(html))
        a = 0
        print(links)
        with open("sites.txt", "a") as file:
            while a != len(links):
                z = str(links[a]).replace("(","")
                q = z.replace(")","")
                m = q.replace(", 'http'","")
                m = m[1:len(m)-1]
                l = m
                t = threading.Thread(target=run, args=l)
                t.start()
                file.write(m + "\n")
                a += 1
    except:
        traceback.print_exc()
        pass
import urllib.request
import re
import traceback
import threading
go = False
l = None
t = threading.Thread(target=run, args=l)
while go != True:
    afd = input()
    try:
        with urllib.request.urlopen(afd) as response:
            html = response.read()
            go = True
    except:
        print("Error\nmake sure your using \nhttp://www.\nor https://www.\nor the ip")
with open("sites.txt", "a") as file:
    file.write(afd + "\n")

#use re.findall to get all the links
links = re.findall('"((http|ftp)s?://.*?)"', str(html))
a = 0
print(links)
with open("sites.txt", "a") as file:
    while a != len(links):
        z = str(links[a]).replace("(","")
        q = z.replace(")","")
        m = q.replace(", 'http'","")
        m = m[1:len(m)-1]
        file.write(m + "\n")
        a += 1
while True:
    try:
        try:
            import random
            z = str(links[random.randint(0,len(links))]).replace("(","")
            q = z.replace(")","")
            m = q.replace(", 'http'","")
            m = m[1:len(m)-1]
            line = m
        except:
            with open("sites.txt", "r") as file:
                import random
                line = random.choice(file.readlines())
                print(line)
        with urllib.request.urlopen(line) as response:
            html = response.read()

        #use re.findall to get all the links
        links = re.findall('"((http|ftp)s?://.*?)"', str(html))
        a = 0
        print(links)
        with open("sites.txt", "a") as file:
            while a != len(links):
                z = str(links[a]).replace("(","")
                q = z.replace(")","")
                m = q.replace(", 'http'","")
                m = m[1:len(m)-1]
                l = m
                t = threading.Thread(target=run, args=l)
                t.start()
                file.write(m + "\n")
                a += 1
    except:
        traceback.print_exc()
        pass

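When you create the Thread object, pass the arguments as a tuple rather than as the bare value. Thread calls the target as run(*args), so a string passed as args is unpacked character by character, and run receives one argument per character instead of a single link. Wrap the link in a one-element tuple, like this: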

t = threading.Thread(target=run, args=(l,))
t.start()

This will fix your error.
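To see why the one-element tuple matters, here is a minimal sketch that reproduces the unpacking outside the crawler. The URL and the `results` list are illustrative assumptions, not part of the original code; `run` is reduced to a stub that only records what it receives.

```python
import threading

results = []

def run(link):
    # Stub standing in for the crawler's run(): record the one argument received.
    results.append(link)

url = "http://example.com"  # hypothetical URL, for illustration only

# Thread(target=run, args=url) would call run(*url), i.e. one argument per
# character of the string. The same unpacking is shown here without a thread:
try:
    run(*url)
except TypeError as exc:
    error_message = str(exc)  # complains about the number of positional arguments

# Wrapping the string in a one-element tuple passes it as a single argument.
t = threading.Thread(target=run, args=(url,))
t.start()
t.join()
```

After the thread finishes, `results` holds the whole URL as a single item, while `error_message` holds the TypeError text produced by the character-by-character call.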

By the way, since you are scraping the web with Python, I strongly recommend using Scrapy. You should check it out.

When I run this, I get a lot of errors saying that "" is not a valid URL, but I can tell that part of it is working because the file keeps getting bigger.

That is a separate, new problem. The threads are running now. Most likely some of the URLs they are trying to download are invalid.

Yes, that makes sense. I looked at more of the errors and that is why it happens. It works great now!
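One way to avoid the invalid-URL errors mentioned in the comments is to filter candidates before handing them to a thread. The sketch below is an assumption about how that could be done, not something from the original thread: `is_fetchable` and the sample `candidates` list are hypothetical names, and the check only looks at the scheme and host. It also strips whitespace, since lines returned by `file.readlines()` keep their trailing newline.

```python
from urllib.parse import urlparse

def is_fetchable(url):
    """Return True if url has an http, https, or ftp scheme and a host part."""
    try:
        parts = urlparse(url.strip())
    except ValueError:
        # urlparse can raise ValueError on some malformed inputs.
        return False
    return parts.scheme in ("http", "https", "ftp") and bool(parts.netloc)

# Hypothetical sample input: one good URL, an empty string, plain text,
# and a line as it would come out of readlines() with a trailing newline.
candidates = ["http://example.com/page", "", "not a url", "https://example.org\n"]

# Keep only the entries worth passing to urllib.request.urlopen.
valid = [u.strip() for u in candidates if is_fetchable(u)]
```

In the crawler, a check like this could run just before `threading.Thread(target=run, args=(l,))` is created, so that empty or malformed strings never start a thread.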