Python: taking screenshots of multiple different domains simultaneously

I am currently writing code that lets a user take screenshots of several different web pages at the same time using multithreading.

Code:

import selenium
import threading
import time, datetime
from datetime import date, timedelta
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

domain_file = r'C:\Users\a\testfiles\testdomains.txt'
driver = webdriver.PhantomJS()
def file_len(file):
    with open(file, 'r') as f:
        for i, l in enumerate(f):
            pass
        return i + 1

current_date = date.today().strftime('%Y-%m-%d_')

def threadedloop(d):
    with open(domain_file, 'r') as f:
        for line in f:

            stripped_line = line.rstrip()
            url1 = 'http://' + stripped_line
            url2 = 'https://' + stripped_line
            imgname = current_date + 'http_' + stripped_line + '.png'
            imgSname = current_date + 'https_' + stripped_line + '.png'

            ### Screenshot function ###

            def scrshot():

                print('Taking screenshot of {}.'.format(stripped_line))

                try:
                    driver.get(url1)
                except TimeoutException:
                    print('{} timed out'.format(url1))
                    pass
                except Exception:
                    print('Unknown error at {}'.format(stripped_line))

                driver.maximize_window()
                driver.save_screenshot(imgname)
                try:
                    driver.get(url2)
                except TimeoutException:
                    print('{} timed out'.format(url2))
                    pass
                except Exception:
                    print('Unknown error at {}'.format(stripped_line))

                driver.maximize_window()
                driver.save_screenshot(imgSname)

            scrshot()

d = threading.local

start = time.time()

for i in range(file_len(domain_file)):
    t = threading.Thread(target = threadedloop, args=(d,))
    t.start()

t.join()

end = time.time()

print(end - start)
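
The test file contains 4 domains. The problem is that instead of each web page being assigned to one single thread, every page is processed by all 4 threads, which produces this output: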

Taking screenshot of google.com.
Taking screenshot of google.com.
Taking screenshot of google.com.
Taking screenshot of google.com.
Taking screenshot of reddit.com.
Taking screenshot of reddit.com.
Taking screenshot of reddit.com.
Taking screenshot of reddit.com.
Taking screenshot of facebook.com.
Taking screenshot of facebook.com.
Taking screenshot of facebook.com.
Taking screenshot of facebook.com.
Taking screenshot of facebook.com.
Taking screenshot of twitter.com.
Taking screenshot of twitter.com.
Taking screenshot of twitter.com.
Taking screenshot of twitter.com.

Any help would be greatly appreciated.

I looked at your code and found that you are not dividing the work into subtasks correctly.

def threadedloop(d):
    with open(domain_file, 'r') as f:
        for line in f:

These lines in the function threadedloop read every line of the file. This means that every time the function is called, every URL is read and processed.

Next, in the multithreading part:

for i in range(file_len(domain_file)):
    t = threading.Thread(target = threadedloop, args=(d,))
    t.start()
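
the file is read again to count its lines, and one thread is started per line. Each of those threads calls threadedloop, which in turn walks the whole file. I think you can see the problem now: with 4 domains you get 4 threads that each process all 4 URLs.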
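
The duplication can be reproduced without Selenium. The following is a minimal sketch, not the asker's code: the hard-coded `domains` list stands in for the domain file, and `threadedloop_like` is a hypothetical stand-in that only records which domain it would have visited.

```python
import threading

# Stand-in for the contents of the domain file (assumed for illustration).
domains = ["google.com", "reddit.com", "facebook.com", "twitter.com"]

processed = []
lock = threading.Lock()

def threadedloop_like():
    # Same shape as threadedloop: every call iterates over ALL domains.
    for domain in domains:
        with lock:
            processed.append(domain)

# One thread per domain, but each thread handles every domain.
threads = [threading.Thread(target=threadedloop_like) for _ in domains]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(processed))  # 16, i.e. 4 threads x 4 domains
```

Each domain ends up in `processed` four times, which matches the repeated "Taking screenshot of ..." lines in the question.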


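A better approach is to do the URL distribution only once, before creating the threads (in the second place in your code, where the threads are started). You can pass the URL to the function through the args parameter, the same parameter you currently use to pass threading.local.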
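
One way this could look is sketched below. It is an illustration under assumptions, not a drop-in replacement: `load_domains`, `take_screenshot`, and `run_all` are hypothetical names, the domain list is passed in as literal lines instead of being read from the file, and `take_screenshot` only records the domain. In a real version it would be reasonable for each thread to create and quit its own webdriver, since a single driver shared between threads is not safe to use concurrently.

```python
import threading

def load_domains(lines):
    """Return the non-empty, stripped domain names from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]

def take_screenshot(domain, results, lock):
    # Hypothetical placeholder: a real version would create its own
    # webdriver here, call driver.get(...) and driver.save_screenshot(...).
    with lock:
        results.append(domain)

def run_all(domains):
    results = []
    lock = threading.Lock()
    # Distribute the work once: exactly one thread per domain,
    # with the domain handed over through args.
    threads = [
        threading.Thread(target=take_screenshot, args=(domain, results, lock))
        for domain in domains
    ]
    for t in threads:
        t.start()
    # Wait for every thread, not only the last one that was created.
    for t in threads:
        t.join()
    return results

domains = load_domains(["google.com\n", "reddit.com\n", "facebook.com\n", "twitter.com\n"])
done = run_all(domains)
print(sorted(done))
```

Keeping the threads in a list also matters for the timing: the original code calls t.join() once after the loop, which only waits for the last thread, so the measured time may not cover the others.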

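Could you share how to do that? I tried what you said, but I think I am still missing something. I changed the code so that instead of calling threadedloop(d), I call scrshot(d) inside the loop, so the URLs are defined before the threads are created. Now only the last element of the file is read, 4 times, although simultaneously.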
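
Without seeing the changed code this is only a guess, but a symptom like "the last element, 4 times" often appears when the thread target looks up a loop variable (or a shared global such as the URL) at the moment it runs, instead of receiving the value through args. The sketch below contrasts the two; `record`, `late`, and `bound` are hypothetical names and the domain list is hard-coded for illustration.

```python
import threading

domains = ["google.com", "reddit.com", "facebook.com", "twitter.com"]

late, bound = [], []
lock = threading.Lock()

def record(target_list, domain):
    with lock:
        target_list.append(domain)

# Late binding: each lambda looks up `domain` only when it is called,
# and by then the loop has finished, so all of them see the last value.
late_jobs = [lambda: record(late, domain) for domain in domains]
threads = [threading.Thread(target=job) for job in late_jobs]

# Bound at creation: args stores the current value for each thread.
threads += [
    threading.Thread(target=record, args=(bound, domain))
    for domain in domains
]

for t in threads:
    t.start()
for t in threads:
    t.join()

print(late)           # ['twitter.com', 'twitter.com', 'twitter.com', 'twitter.com']
print(sorted(bound))  # ['facebook.com', 'google.com', 'reddit.com', 'twitter.com']
```

If this is what happens in the modified code, passing the domain (or the URL and image name) to the thread through args, rather than relying on variables assigned in the loop, should give each thread its own value.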