Can't read/write a file using multithreading in Python
I have an input file that contains a long list of URLs. Let's assume it is in mylines.txt:
https://yahoo.com
https://google.com
https://facebook.com
https://twitter.com
What I need to do is:

1. Read a URL from the input file mylines.txt.
2. Run the myFun function on it. It performs some task and produces a single line of output. In my real code it is more complex, but conceptually that's it.
3. Write the output to the results.txt file.

Since I have a large number of inputs, I need to use Python multithreading so the requests to each URL run in parallel and I can get through the list in a reasonable time.
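For reference, the three steps above can be sketched as a simple serial version first; process_url here is only a placeholder standing in for the real myFun, which would do the network work:

```python
# Serial sketch of the three steps: read URLs, process each one,
# write one output line per URL. process_url is a placeholder for
# the real myFun, which would call out over the network.
def process_url(url):
    return "url is:" + url + ", processed"

def run_serial(in_path, out_path):
    with open(in_path) as f, open(out_path, "w") as out:
        for line in f:
            url = line.strip()
            if url:
                out.write(process_url(url) + "\n")
```

The threaded versions below keep this same read/process/write shape and only parallelize the middle step.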
Update:

Based on the answer, the code became:
import threading
import requests
from multiprocessing.dummy import Pool as ThreadPool
import queue
from multiprocessing import Queue

def myFunc(url):
    response = requests.get(url, verify=False, timeout=(2, 5))
    return "url is:" + url + ", response is:" + response.url

worker_data = open("mylines.txt", "r")  # open my input file

# load up a queue with your data, this will handle locking
q = queue.Queue(4)
with open("mylines.txt", "r") as f:  # open my input file
    for url in f:
        q.put(url)

# make the Pool of workers
pool = ThreadPool(4)
results = pool.map(myFunc, q)

with open("myresults", "w") as f:
    for line in results:
        f.write(line + '\n')
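One likely problem with this version: pool.map expects an iterable of items, and a queue.Queue is not iterable, so passing q to it raises a TypeError. A minimal working sketch passes a plain list instead (myFunc here is a stand-in that skips the real network call):

```python
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool

def myFunc(url):
    # stand-in for the real function, which would call requests.get
    return "url is:" + url

# normally read from mylines.txt; a literal list keeps the sketch self-contained
urls = ["https://yahoo.com", "https://google.com"]

with ThreadPool(4) as pool:
    results = pool.map(myFunc, urls)  # results come back in input order
```

pool.map handles the distribution of work across threads itself, so no explicit queue is needed at all in this style.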
mylines.txt contains:
https://yahoo.com
https://www.google.com
https://facebook.com
https://twitter.com
Note that I first used:

import Queue

and:

q = Queue.Queue(4)

but I got this error:
Traceback (most recent call last):
File "test3.py", line 4, in <module>
import Queue
ModuleNotFoundError: No module named 'Queue'
and the relevant line was:

q = Queue.Queue(4)
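(For context: the Queue module was renamed to lowercase queue in Python 3, which is exactly what the ModuleNotFoundError above is complaining about. A minimal check:)

```python
import queue  # Python 3 name; the module was called "Queue" in Python 2

q = queue.Queue(4)  # bounded FIFO queue with maxsize 4
q.put("https://yahoo.com")
item = q.get()  # retrieves items in insertion order
```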
I also added:

from multiprocessing import Queue

but nothing worked. Can any Python multithreading expert help?

You should change your function to return a string:
def myFunc(url):
    response = requests.get(url, verify=False, timeout=(2, 5))
    return "url is:" + url + ", response is:" + response.url

and then write those strings to the file:

results = pool.map(myFunc, q)

with open("myresults", "w") as f:
    for line in results:
        f.write(line + '\n')
This keeps the multithreading for the requests.get calls, but writes the results to the output file serially.
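The same fetch-in-parallel, write-serially pattern can also be expressed with the standard library's concurrent.futures; fetch here is a placeholder for the real requests.get call:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # placeholder for the real requests.get(url) call
    return "url is:" + url

urls = ["https://yahoo.com", "https://google.com", "https://twitter.com"]

with ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map preserves input order, like pool.map
    results = list(executor.map(fetch, urls))
```

The main thread can then loop over results and write them to the output file with no locking needed, since only one thread writes.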
Update:

You should also use with to read the input file:
# load up a queue with your data, this will handle locking
q = queue.Queue()
with open("mylines.txt", "r") as f:  # open my input file
    for url in f:
        q.put(url)
Rather than have the pool of worker threads print their results out (which gives no guarantee the output is buffered correctly), create one more thread that reads results from a second queue and prints them.

I've modified your solution so it builds its own pool of worker threads. There's little point giving the queue a finite length, since the main thread will block when the queue reaches its maximum size: you only need it to be long enough to ensure there is always work for the workers to process - the main thread will block and unblock as the queue's size grows and shrinks.

It also identifies the thread responsible for each item on the output queue, which should give you some confidence that the multithreading approach is working, and it prints the response code from the server. I found I had to strip the newlines from the URLs.

Since only one thread now writes to the file, the writes are always perfectly in sequence and there is no chance of them interfering with each other.
import threading
import requests
import queue

POOL_SIZE = 4

def myFunc(inq, outq):  # worker thread deals only with queues
    while True:
        url = inq.get()  # blocks until something is available
        if url is None:
            break
        response = requests.get(url.strip(), timeout=(2, 5))
        outq.put((url, response, threading.current_thread().name))

class Writer(threading.Thread):
    def __init__(self, q):
        super().__init__()
        self.results = open("myresults", "a")  # "a" to append results
        self.queue = q
    def run(self):
        while True:
            url, response, threadname = self.queue.get()
            if response is None:
                self.results.close()
                break
            print("****url is:", url, ", response is:", response.status_code,
                  response.url, "thread", threadname, file=self.results)

# load up a queue with your data, this will handle locking
inq = queue.Queue()  # could usefully limit queue size here
outq = queue.Queue()

# start the Writer
writer = Writer(outq)
writer.start()

# make the pool of workers
threads = []
for i in range(POOL_SIZE):
    thread = threading.Thread(target=myFunc, name=f"worker{i}", args=(inq, outq))
    thread.start()
    threads.append(thread)

# push the work onto the queue
with open("mylines.txt", "r") as worker_data:  # open my input file
    for url in worker_data:
        inq.put(url.strip())

# one None sentinel per worker tells each worker to stop
for thread in threads:
    inq.put(None)

# close the pool and wait for the workers to finish
for thread in threads:
    thread.join()

# terminate the writer
outq.put((None, None, None))
writer.join()
Using the data given in mylines.txt, I see the following output:
****url is: https://www.google.com , response is: 200 https://www.google.com/ thread worker1
****url is: https://twitter.com , response is: 200 https://twitter.com/ thread worker2
****url is: https://facebook.com , response is: 200 https://www.facebook.com/ thread worker0
****url is: https://www.censys.io , response is: 200 https://censys.io/ thread worker1
****url is: https://yahoo.com , response is: 200 https://uk.yahoo.com/?p=us thread worker3
Comments:

"@philshem that's another question. Thanks for your help. Will it work with a file of millions of lines?"

"That's a separate question. Perhaps assemble all this code, make sure it works on small input files, and then try it on larger ones. If you hit problems you can post the symptoms as a new question."

"I made the suggested modifications. I get: ModuleNotFoundError: No module named 'Queue', and the reason isn't clear."

"My original code didn't work. I updated the code per your suggestion and tried several fixes for the queue problem. I ended up with no errors, no hang, and no output. I'm using Python 3.6, and some posts say the module name is lowercase (queue) in 3.x. Have you run it on your side? It hangs forever for me; the cursor just blinks. After pressing CTRL+C to exit I got: ^CException ignored in: Traceback (most recent call last): File "/usr/lib/python3.6/threading.py", line 1294, in _shutdown t.join() File "/usr/lib/python3.6/threading.py", line 1056, in join self._wait_for_tstate_lock() File "/usr/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock elif lock.acquire(block, timeout): KeyboardInterrupt"

"I'm testing with the five input lines from the question, running it with the python3 command on Ubuntu 18.04. I see the output file get created, but nothing is written to it and the program never ends. Now it seems to work, but my problem is that the output is duplicated. It runs, but not correctly: the output should not be repeated. It should perform the requests.get only once per URL read from the file. Why the duplication?"