Multiprocessing reading lines from a file in Python


So I have a program that checks the proxies in a list, but it is very slow, so I added multiprocessing. My problem is that when I run the program it only reads the first line of the text file, yet when I run the code without multiprocessing it reads down through the lines of the file. I think it has something to do with

proxies = file.readline()


I think Python may not be good at multiplexing a file object across threads. I simplified and changed your code, and it seems to work better:

import multiprocessing

file = open("test.txt", 'r')


def check(proxies):
    print(proxies)

if __name__ == '__main__':
    # the parent reads the file serially and hands each line to a new process
    while True:
        proxies = file.readline()
        p = multiprocessing.Process(target=check, args=(proxies,))
        p.start()
where test.txt is a sample file I made:

test
asdf
1
2
3
4
This code seems to process all the lines of the file correctly (although out of order).

You will still need a way to stop the loop, which I have not done in this code.
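
For example (my addition, not part of the original answer), one way to stop the loop is to break when readline() returns an empty string, which signals end of file:

if __name__ == '__main__':
    while True:
        proxies = file.readline()
        if not proxies:  # readline() returns '' once the file is exhausted
            break
        p = multiprocessing.Process(target=check, args=(proxies,))
        p.start()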

In my version, I read the file serially, but the lines are still processed in parallel by multiple processes. I do the reading outside the worker and pass each resulting line to a process as an argument. This may not be as fast as you would like, but I do not know how to make it faster. It should still be quite fast, because (once you integrate my changes) it does not wait for a response before launching another request.

Since making an HTTP request is an I/O-bound operation, you should probably use threading rather than multiprocessing; the latter is better suited to CPU-bound operations, which is not what you are doing here.
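
Here is a minimal sketch of that idea (my addition, not from the comment), using concurrent.futures.ThreadPoolExecutor; kind and the SOCKS4.txt file name are carried over from the asker's original code as assumptions, and max_workers is an arbitrary choice:

import concurrent.futures
import requests

kind = 'socks4'  # assumed; defined elsewhere in the asker's original code

def check(proxy):
    proxys = {'http': kind + '://' + proxy, 'https': kind + '://' + proxy}
    try:
        response = requests.get('http://checkip.dyndns.com/', timeout=2.5, proxies=proxys)
    except requests.exceptions.RequestException:
        print('Bad', proxy)
    else:
        print('Good', proxy, 'Response time', response.elapsed)

if __name__ == '__main__':
    with open("SOCKS4.txt", 'r') as file_handle:
        proxies = [line.strip() for line in file_handle]
    # threads are cheap for I/O-bound work such as waiting on HTTP responses
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        executor.map(check, proxies)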

The processes are independent of one another (unless you use a queue, shared memory, or a file to communicate), so each process gets its own file handle and reads the first line without any awareness of the others.

Change the function to take a line as its argument, so that each process can be fed one line from the file:

import multiprocessing
import requests

kind = 'socks4'  # assumed; defined elsewhere in the asker's original code

def check(proxies):
    proxys = {'http': kind + '://' + proxies, 'https': kind + '://' + proxies}
    url = 'http://checkip.dyndns.com/'
    try:
        response = requests.get(url, timeout=2.5, proxies=proxys)
    except requests.exceptions.Timeout:
        print('Bad', proxies)
    except requests.exceptions.ConnectionError:
        print('Network problem', proxies)
    else:
        print('Good', proxies, 'Response time', response.elapsed)
        # "with" closes the filehandle when done.
        with open('goods.txt', 'a+') as files:
            files.write('\n' + proxies)


if __name__ == '__main__':
    with open("SOCKS4.txt", 'r') as file_handle:  # "with" closes the filehandle when it is done
        # iterate through each line of the file
        for line in file_handle:
            # feed each line to the function, stripping the trailing newline
            p = multiprocessing.Process(target=check, args=(line.strip(),))
            p.start()
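
As a side note (my addition, not part of this answer): spawning one process per line can get expensive for long files. A bounded multiprocessing.Pool is a common alternative; a sketch, assuming check is defined as above and an arbitrary pool size of 8:

from multiprocessing import Pool

if __name__ == '__main__':
    with open("SOCKS4.txt", 'r') as file_handle:
        lines = [line.strip() for line in file_handle]
    with Pool(processes=8) as pool:  # pool size is an assumption, tune to taste
        pool.map(check, lines)       # blocks until every line has been checked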


I would suggest putting all the lines from the file into a queue and letting the child process pick lines off the queue. It is similar to @salparadise's solution, but you only spawn the new process once. Something like this:

import multiprocessing
import requests

kind = 'socks4'  # assumed; defined elsewhere in the asker's original code

def check(queue):
    # keep pulling lines until the 'STOP' sentinel arrives
    for line in iter(queue.get, 'STOP'):
        proxys = {'http': kind + '://' + line, 'https': kind + '://' + line}
        url = 'http://checkip.dyndns.com/'
        try:
            response = requests.get(url, timeout=2.5, proxies=proxys)
        except requests.exceptions.Timeout:
            print('Bad', line)
        except requests.exceptions.ConnectionError:
            print('Network problem', line)
        else:
            print('Good', line, 'Response time', response.elapsed)
            # "with" closes the filehandle when done.
            with open('goods.txt', 'a+') as files:
                files.write('\n' + line)


if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=check, args=(queue,))
    p.start()
    with open("SOCKS4.txt", 'r') as file_handle:
        for line in file_handle:
            queue.put(line.strip())
    queue.put('STOP')  # sentinel so the worker's loop terminates
    p.join()
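
A natural extension (my addition, not part of the answer): the same check function can be shared by several worker processes pulling from one queue, in which case each worker needs its own 'STOP' sentinel. The worker count of 4 is an arbitrary choice:

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=check, args=(queue,)) for _ in range(4)]
    for w in workers:
        w.start()
    with open("SOCKS4.txt", 'r') as file_handle:
        for line in file_handle:
            queue.put(line.strip())
    for _ in workers:
        queue.put('STOP')  # one sentinel per worker
    for w in workers:
        w.join()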

Processes run in separate memory spaces, and global variables are not shared between them (each process gets its own copy of any variable defined at module level).
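
A tiny demonstration of that point (my addition): a module-level variable incremented in a child process is unchanged in the parent.

import multiprocessing

counter = 0  # module-level variable

def bump():
    global counter
    counter += 1
    print('in child:', counter)   # prints 1

if __name__ == '__main__':
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    print('in parent:', counter)  # still 0: the child modified its own copy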