Multiprocessing reading lines from a file in Python
So I have a program that checks the proxies in a list, but it was very slow, so I added multiprocessing. My problem is that when I run the program it only reads the first line of the text file, whereas when I run the code without multiprocessing it works its way down through the lines of the file. I think it has something to do with proxies = file.readline()
I suspect Python may not be good at multiplexing a file object across processes. I simplified and changed your code, and it seems to work better:
import multiprocessing

file = open("test.txt", 'r')

def check(proxies):
    print(proxies)

if __name__ == '__main__':
    while True:
        proxies = file.readline()
        p = multiprocessing.Process(target=check, args=(proxies,))
        p.start()
where test.txt is a sample file I made:
test
asdf
1
2
3
4
This code appears to process all the lines of the file correctly (although out of order).
Note that you still need a way to stop the loop, which I have not done in this code.
In my version I read the file serially, but still hand the work off to multiple processes. I read the file outside the loop and pass the resulting lines to the processes as arguments. This may not be as fast as you wanted, but I don't know how to make it faster. It should still be very fast, because (once you integrate my changes) it does not wait for a response before launching the next request. Since issuing an HTTP network request is an I/O-bound operation, you should probably use threading rather than multiprocessing, since the latter is better suited to CPU-bound operations, which is not what you are doing here.
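As a sketch of that suggestion: for I/O-bound work a thread pool lets many requests be in flight at once without one process per line. Here check is a hypothetical stand-in for the real requests-based function, and the proxy strings are made up, not taken from the question:

```python
from concurrent.futures import ThreadPoolExecutor

def check(proxy):
    # hypothetical stand-in for the real check(); the actual version
    # would issue the requests.get call shown in the answers below
    return ('checked', proxy)

if __name__ == '__main__':
    # these would normally be the lines read from the proxy file
    proxies = ['1.2.3.4:1080', '5.6.7.8:1080']
    # threads are cheap for I/O-bound work; at most 20 concurrent requests
    with ThreadPoolExecutor(max_workers=20) as pool:
        for result in pool.map(check, proxies):
            print(result)
```

pool.map keeps the input order and blocks until all results are in, which is usually fine for a batch job like this.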
Since separate processes are independent of one another (unless you use a queue, a shared memory location, or a file), each process gets its own file handle and, with no awareness of the others, reads the first line.
Change the function to take a line as an argument, so that each process receives one line of the file:
import multiprocessing
import requests

def check(proxies):
    # 'kind' (the proxy scheme, e.g. 'socks4') comes from the asker's original code
    proxys = {'http': kind + '://' + proxies, 'https': kind + '://' + proxies}
    url = 'http://checkip.dyndns.com/'
    try:
        response = requests.get(url, timeout=2.5, proxies=proxys)
    except requests.exceptions.Timeout:
        print('Bad', proxies)
    except requests.exceptions.ConnectionError:
        print('Network problem', proxies)
    else:
        print('Good', proxies, 'Response time', response.elapsed)
        # "with" closes the file handle when done
        with open('goods.txt', 'a+') as files:
            files.write('\n' + proxies)

if __name__ == '__main__':
    with open("SOCKS4.txt", 'r') as file_handle:  # "with" closes the file handle when done
        # iterate through each line of the file
        for line in file_handle:
            p = multiprocessing.Process(target=check, args=(line,))  # feed each line to the function
            p.start()
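One caveat with the loop above: it spawns a separate process for every line, which can overwhelm the machine on a long proxy list. A multiprocessing.Pool caps the number of live workers and reuses them. A minimal sketch with a placeholder check (the sample lines are invented, standing in for the file's contents):

```python
import multiprocessing

def check(line):
    # placeholder for the proxy check; the real body is the
    # requests.get logic shown above
    return ('done', line.strip())

if __name__ == '__main__':
    # stands in for the lines read from SOCKS4.txt
    lines = ['proxy-a:1080\n', 'proxy-b:1080\n', 'proxy-c:1080\n']
    # at most 4 worker processes alive at once, reused across lines
    with multiprocessing.Pool(processes=4) as pool:
        for result in pool.imap_unordered(check, lines):
            print(result)
```

imap_unordered yields results as workers finish, which suits a checker where per-proxy latency varies widely.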
I would suggest putting all the lines of the file into a queue and having a child process pick lines off the queue. It is similar to @salparadise's solution, but you only spawn the new process once. Something like this:
import multiprocessing
import requests

def check(queue):
    for line in iter(queue.get, 'STOP'):
        proxys = {'http': kind + '://' + line, 'https': kind + '://' + line}
        url = 'http://checkip.dyndns.com/'
        try:
            response = requests.get(url, timeout=2.5, proxies=proxys)
        except requests.exceptions.Timeout:
            print('Bad', line)
        except requests.exceptions.ConnectionError:
            print('Network problem', line)
        else:
            print('Good', line, 'Response time', response.elapsed)
            # "with" closes the file handle when done
            with open('goods.txt', 'a+') as files:
                files.write('\n' + line)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=check, args=(queue,))
    p.start()
    with open("SOCKS4.txt", 'r') as file_handle:
        for line in file_handle:
            queue.put(line)
    queue.put('STOP')  # sentinel so the worker's iter() loop terminates
    p.join()
Processes run in separate memory spaces, and global variables are not shared between them (each process gets its own copy of any variable defined at module level).
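A minimal demonstration of that isolation, using a throwaway counter rather than anything from the proxy checker:

```python
import multiprocessing

counter = 0  # module-level global

def bump():
    global counter
    counter += 1
    # the child increments its own copy of the global
    print('child sees', counter)

if __name__ == '__main__':
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    # the parent's copy was never touched by the child: still 0
    print('parent sees', counter)
```

This is exactly why each Process in the question re-read the first line: every child held its own independent copy of the open file object.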