Multiprocessing reading lines from a file in Python
So I have a program that checks the proxies in a list, but it was very slow, so I added multiprocessing. My problem is that when I run the program it only reads the first line of the text file, whereas when I run the code without multiprocessing it works its way down through the lines of the file. I think it has something to do with proxies = file.readline()
I suspect Python may not be good at multiplexing a file object across processes. I simplified and changed your code, and it seems to work better:
import multiprocessing

file = open("test.txt", 'r')

def check(proxies):
    print(proxies)

if __name__ == '__main__':
    while True:
        proxies = file.readline()
        p = multiprocessing.Process(target=check, args=(proxies,))
        p.start()
where test.txt is a sample file I made:
test
asdf
1
2
3
4
This code appears to process all the lines of the file correctly (although out of order).
Note that you still need a way to stop the loop, which I have not done in this code.
In my version I read the file serially, but still hand the work off to multiple processes. I read the file outside the loop and pass the resulting lines to the processes as arguments. This may not be as fast as you wanted, but I don't know how to make it faster. It should still be very fast, because (once you integrate my changes) it does not wait for a response before launching the next request. Since issuing an HTTP network request is an I/O-bound operation, you should probably use threading rather than multiprocessing, since the latter is better suited to CPU-bound operations, which is not what you are doing here.
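As a sketch of that suggestion: for I/O-bound work a thread pool lets many requests be in flight at once without one process per line. Here check is a hypothetical stand-in for the real requests-based function, and the proxy strings are made up, not taken from the question:

```python
from concurrent.futures import ThreadPoolExecutor

def check(proxy):
    # hypothetical stand-in for the real check(); the actual version
    # would issue the requests.get call shown in the answers below
    return ('checked', proxy)

if __name__ == '__main__':
    # these would normally be the lines read from the proxy file
    proxies = ['1.2.3.4:1080', '5.6.7.8:1080']
    # threads are cheap for I/O-bound work; at most 20 concurrent requests
    with ThreadPoolExecutor(max_workers=20) as pool:
        for result in pool.map(check, proxies):
            print(result)
```

pool.map keeps the input order and blocks until all results are in, which is usually fine for a batch job like this.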
Since separate processes are independent of one another (unless you use a queue, a shared memory location, or a file), each process gets its own file handle and, with no awareness of the others, reads the first line.
Change the function to take a line as an argument, so that each process receives one line of the file:
import multiprocessing
import requests

def check(proxies):
    # 'kind' (the proxy scheme, e.g. 'socks4') comes from the asker's original code
    proxys = {'http': kind + '://' + proxies, 'https': kind + '://' + proxies}
    url = 'http://checkip.dyndns.com/'
    try:
        response = requests.get(url, timeout=2.5, proxies=proxys)
    except requests.exceptions.Timeout:
        print('Bad', proxies)
    except requests.exceptions.ConnectionError:
        print('Network problem', proxies)
    else:
        print('Good', proxies, 'Response time', response.elapsed)
        # "with" closes the file handle when done
        with open('goods.txt', 'a+') as files:
            files.write('\n' + proxies)

if __name__ == '__main__':
    with open("SOCKS4.txt", 'r') as file_handle:  # "with" closes the file handle when done
        # iterate through each line of the file
        for line in file_handle:
            p = multiprocessing.Process(target=check, args=(line,))  # feed each line to the function
            p.start()
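One caveat with the loop above: it spawns a separate process for every line, which can overwhelm the machine on a long proxy list. A multiprocessing.Pool caps the number of live workers and reuses them. A minimal sketch with a placeholder check (the sample lines are invented, standing in for the file's contents):

```python
import multiprocessing

def check(line):
    # placeholder for the proxy check; the real body is the
    # requests.get logic shown above
    return ('done', line.strip())

if __name__ == '__main__':
    # stands in for the lines read from SOCKS4.txt
    lines = ['proxy-a:1080\n', 'proxy-b:1080\n', 'proxy-c:1080\n']
    # at most 4 worker processes alive at once, reused across lines
    with multiprocessing.Pool(processes=4) as pool:
        for result in pool.imap_unordered(check, lines):
            print(result)
```

imap_unordered yields results as workers finish, which suits a checker where per-proxy latency varies widely.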
I would suggest putting all the lines of the file into a queue and having a child process pick lines off the queue. It is similar to @salparadise's solution, but you only spawn the new process once. Something like this:
import multiprocessing
import requests

def check(queue):
    for line in iter(queue.get, 'STOP'):
        proxys = {'http': kind + '://' + line, 'https': kind + '://' + line}
        url = 'http://checkip.dyndns.com/'
        try:
            response = requests.get(url, timeout=2.5, proxies=proxys)
        except requests.exceptions.Timeout:
            print('Bad', line)
        except requests.exceptions.ConnectionError:
            print('Network problem', line)
        else:
            print('Good', line, 'Response time', response.elapsed)
            # "with" closes the file handle when done
            with open('goods.txt', 'a+') as files:
                files.write('\n' + line)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=check, args=(queue,))
    p.start()
    with open("SOCKS4.txt", 'r') as file_handle:
        for line in file_handle:
            queue.put(line)
    queue.put('STOP')  # sentinel so the worker's iter() loop terminates
    p.join()
Processes run in separate memory spaces, and global variables are not shared between them (each process gets its own copy of any variable defined at module level).
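A minimal demonstration of that isolation, using a throwaway counter rather than anything from the proxy checker:

```python
import multiprocessing

counter = 0  # module-level global

def bump():
    global counter
    counter += 1
    # the child increments its own copy of the global
    print('child sees', counter)

if __name__ == '__main__':
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    # the parent's copy was never touched by the child: still 0
    print('parent sees', counter)
```

This is exactly why each Process in the question re-read the first line: every child held its own independent copy of the open file object.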