Can't read/write to files using multithreading in Python

Tags: python, python-3.x, multithreading, python-multiprocessing, python-multithreading

I have an input file containing a long list of URLs. Let's assume it is mylines.txt:

https://yahoo.com
https://google.com
https://facebook.com
https://twitter.com
What I need to do is:

  • Read a line from the input file mylines.txt.

  • Execute the myFun function. It performs some task and produces one line of output. In my real code it is more complex, but that is the concept.

  • Write the output to the results.txt file.

  • Because I have a huge number of inputs, I need to use Python multithreading. I looked at this nice post, but unfortunately it assumes the input comes in a simple list and does not consider writing the function's output to a file.

    I need to make sure that the output for each input is written on a single line (i.e. there is a danger that if multiple threads write to the same line, I get corrupted data).

    I tried to fiddle around with it, but without success. I have not used Python multithreading before, but now is the time to learn, since in my case it is unavoidable: I have a very long list that cannot be processed in a reasonable time without multithreading. My real function will not do this simple task but more operations that are unnecessary for the concept.

    Here is my attempt. Please correct me (the code itself):

    Q: How can I fix the above code (please keep it concise and help me understand the code itself) so that it reads a line from the input file, executes the function, and writes the result associated with that input on one line, using Python multithreading to execute the requests concurrently so I can finish the list in a reasonable time?
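
    For orientation, here is a minimal single-threaded sketch of the task described above; myFun's body is a stand-in modelled on the myFunc used later in this question:

    import requests

    def myFun(url):
        # Stand-in for the real, more complex function: fetch the URL and
        # summarise the response as a single line of text.
        response = requests.get(url, timeout=(2, 5))
        return "url is:" + url + ", response is:" + response.url

    with open("mylines.txt") as infile, open("results.txt", "w") as outfile:
        for line in infile:
            url = line.strip()
            if url:
                outfile.write(myFun(url) + "\n")  # one result line per input line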

    Update:

    Based on the answer, the code became:

    import threading
    import requests
    from multiprocessing.dummy import Pool as ThreadPool
    import queue
    from multiprocessing import Queue
    
    def myFunc(url):
        response = requests.get(url, verify=False, timeout=(2, 5))
        return "url is:" + url + ", response is:" + response.url
    
    worker_data = open("mylines.txt","r") # open my input file.
    
    #load up a queue with your data, this will handle locking
    q = queue.Queue(4)
    with open("mylines.txt","r") as f: # open my input file.
        for url in f:
            q.put(url)
    
    # make the Pool of workers
    pool = ThreadPool(4)
    results = pool.map(myFunc, q)
    
    with open("myresults","w") as f:
        for line in results:
            f.write(line + '\n')
    
    mylines.txt contains:

    https://yahoo.com
    https://www.google.com
    https://facebook.com
    https://twitter.com
    
    Note that at first I used:

    import Queue

    as well as:

    q = Queue.Queue(4)

    but there was this error:

    Traceback (most recent call last):
      File "test3.py", line 4, in <module>
        import Queue
    ModuleNotFoundError: No module named 'Queue'
    
    and the offending line was:

    q = Queue.Queue(4)

    I also added:

    from multiprocessing import Queue
    

    But nothing works. Can any Python multithreading expert help?
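
    (For reference: in Python 3 the standard-library module Queue was renamed to the lower-case queue, which is why import Queue raises ModuleNotFoundError; multiprocessing.Queue is a different, process-oriented class and not a drop-in replacement. A compatibility shim that runs under both Python 2 and 3 would be:)

    try:
        import queue            # Python 3: the module name is lower-case
    except ImportError:
        import Queue as queue   # Python 2: the module name is capitalised

    q = queue.Queue(4)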

    You should change your function to return a string:

    def myFunc(url):
        response = requests.get(url, verify=False, timeout=(2, 5))
        return "url is:" + url + ", response is:" + response.url
    
    Then write those strings to the file:

    results = pool.map(myFunc, q)
    
    with open("myresults","w") as f:
        for line in results:
            f.write(line + '\n')
    
    This keeps the multithreading effective for requests.get, but serializes the writing of the results to the output file.

    Update:

    You should also use with when reading the input file:

    #load up a queue with your data, this will handle locking
    q = queue.Queue()  # lower-case "queue" module in Python 3
    
    with open("mylines.txt","r") as f: # open my input file.
        for url in f:
            q.put(url)
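
    (Note, however, that Pool.map from multiprocessing.dummy expects an iterable such as a list; a queue.Queue is not iterable, so passing q to pool.map raises a TypeError. A sketch that collects the lines into a list instead, reusing myFunc and ThreadPool from above:)

    from multiprocessing.dummy import Pool as ThreadPool

    with open("mylines.txt", "r") as f:
        urls = [line.strip() for line in f]  # plain list, which Pool.map can iterate

    pool = ThreadPool(4)
    results = pool.map(myFunc, urls)  # each worker thread runs myFunc on one URL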
    

    Rather than having the worker pool threads print out the results (which does not guarantee the output is correctly buffered), it is better to create one more thread that reads results from a second queue and prints them.

    I have modified your solution so that it builds its own pool of worker threads. There is little point in giving the queue unbounded length, since the main thread blocks when the queue reaches its maximum size: it only needs to be long enough to ensure the workers always have work to process; the main thread will block and unblock as the queue length rises and falls.

    It also identifies the thread responsible for each item on the output queue, which should give you some confidence that the multithreaded approach is working, and it prints the response code from the server. I found I had to strip the newlines from the URLs.

    Because only one thread now writes to the file, the writes are always perfectly in sequence and cannot interfere with one another.
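
    (For contrast: an alternative to a dedicated writer thread is to let every worker write to the file directly while holding a lock; a minimal sketch, not the approach used below:)

    import threading

    write_lock = threading.Lock()  # shared by all worker threads

    def write_result(outfile, line):
        # Only one thread at a time can be inside this block, so output
        # lines never interleave; the single-writer design below avoids
        # even needing the lock.
        with write_lock:
            outfile.write(line + "\n")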

    import threading
    import requests
    import queue
    POOL_SIZE = 4
    
    def myFunc(inq, outq):  # worker thread deals only with queues
        while True:
            url = inq.get()  # Blocks until something available
            if url is None:
                break
            response = requests.get(url.strip(), timeout=(2, 5))
            outq.put((url, response, threading.currentThread().name))
    
    
    class Writer(threading.Thread):
        def __init__(self, q):
            super().__init__()
            self.results = open("myresults","a") # "a" to append results
            self.queue = q
        def run(self):
            while True:
                url, response, threadname = self.queue.get()
                if response is None:
                    self.results.close()
                    break
                print("****url is:",url, ", response is:", response.status_code, response.url, "thread", threadname, file=self.results)
    
    #load up a queue with your data, this will handle locking
    inq = queue.Queue()  # could usefully limit queue size here
    outq = queue.Queue()
    
    # start the Writer
    writer = Writer(outq)
    writer.start()
    
    # make the Pool of workers
    threads = []
    for i in range(POOL_SIZE):
        thread = threading.Thread(target=myFunc, name=f"worker{i}", args=(inq, outq))
        thread.start()
        threads.append(thread)
    
    # push the work onto the queues
    with open("mylines.txt","r") as worker_data: # open my input file.
        for url in worker_data:
            inq.put(url.strip())
    for thread in threads:
        inq.put(None)
    
    # close the pool and wait for the workers to finish
    for thread in threads:
        thread.join()
    
    # Terminate the writer
    outq.put((None, None, None))
    writer.join()
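
    (Note the shutdown protocol here: one None sentinel is pushed onto inq per worker, so each worker consumes exactly one sentinel and exits; only after every worker has been joined is the (None, None, None) sentinel sent to the writer, which guarantees no result is still in flight when the writer closes the file.)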
    
    With the data given in mylines.txt, I see the following output:

    ****url is: https://www.google.com , response is: 200 https://www.google.com/ thread worker1
    ****url is: https://twitter.com , response is: 200 https://twitter.com/ thread worker2
    ****url is: https://facebook.com , response is: 200 https://www.facebook.com/ thread worker0
    ****url is: https://www.censys.io , response is: 200 https://censys.io/ thread worker1
    ****url is: https://yahoo.com , response is: 200 https://uk.yahoo.com/?p=us thread worker3
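
    (As the comment in the code hints, "could usefully limit queue size here": for very large input files the input queue can be bounded so the whole file is never held in memory at once. A one-line variation, reusing queue and POOL_SIZE from the code above; the multiplier is an arbitrary choice:)

    # The main thread's inq.put() blocks while the queue is full, so
    # mylines.txt is streamed rather than loaded wholesale into memory.
    inq = queue.Queue(maxsize=POOL_SIZE * 10)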
    

    @philshem Here it is as a question. Thanks for your help. Will it work with millions of lines in the file?

    That is another question. Perhaps you could gather all this code, make sure it works for small input files, and then try it for the bigger ones. If there is a problem you can post your symptoms as a new question.

    I made the suggested changes. I get ModuleNotFoundError: No module named 'Queue', and the reason is unclear? My original code did not work; I updated the code per your suggestion and tried some solutions to the Queue problem. I ended up with no errors, no hang, and no output. I am using Python 3.6, and some posts say it is the lower-case queue in 3.6. Have you run it on your side? For me it hangs forever; the cursor just blinks. After pressing CTRL+C to quit, I got this:

    ^CException ignored in: <module 'threading' from '/usr/lib/python3.6/threading.py'>
    Traceback (most recent call last):
      File "/usr/lib/python3.6/threading.py", line 1294, in _shutdown
        t.join()
      File "/usr/lib/python3.6/threading.py", line 1056, in join
        self._wait_for_tstate_lock()
      File "/usr/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
        elif lock.acquire(block, timeout):
    KeyboardInterrupt

    I am testing with the 5 input lines given in the question, using the python3 command on an Ubuntu 18.04 system. I can see the output file being created, but nothing is written to it, and the Python program never ends: even without a progress indicator the cursor should stop, but it just keeps blinking.

    It seems to work now, that much I can confirm. But my problem is the incorrect duplication: it runs, but not correctly. The output should not be repeated; it should perform each request only once per URL read from the file. Why is it duplicated?
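
    (A plausible cause of the duplicated output, for what it's worth: the Writer opens myresults in append mode ("a"), so every run of the script appends a fresh batch of results after the previous run's lines, which reads as repetition. Opening the file in write mode makes each run start clean:)

    self.results = open("myresults", "w")  # "w" truncates the file on each run, so old results do not linger as apparent duplicates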