How to use Selenium with Python in a multithreaded way

python selenium google-chrome selenium-webdriver

Hey guys, I'm trying to use threading together with Selenium. My code is:

import threading  as th
import time
import base64
import mysql.connector as mysql
import requests
from bs4 import BeautifulSoup
from seleniumwire import webdriver
from selenium.webdriver.chrome.options import Options
from functions import *

options = Options()
prefs = {'profile.default_content_setting_values': {'images': 2,'popups': 2, 'geolocation': 2, 
                            'notifications': 2, 'auto_select_certificate': 2, 'fullscreen': 2, 
                            'mouselock': 2, 'mixed_script': 2, 'media_stream': 2, 
                            'media_stream_mic': 2, 'media_stream_camera': 2, 'protocol_handlers': 2, 
                            'ppapi_broker': 2, 'automatic_downloads': 2, 'midi_sysex': 2, 
                            'push_messaging': 2, 'ssl_cert_decisions': 2, 'metro_switch_to_desktop': 2, 
                            'protected_media_identifier': 2, 'app_banner': 2, 'site_engagement': 2, 
                            'durable_storage': 2}}
print('Crawling process started')
options.add_experimental_option('prefs', prefs)
driver = webdriver.Chrome(executable_path='chromedriver.exe', options=options)
driver.set_page_load_timeout(50000)
urls='https://google.com https://youtube.com'
def getinf(url_):
    driver.get(url_)
    soup=BeautifulSoup(driver.page_source, 'html5lib')
    print(soup.select('title'))
for url in urls.split():
    t=th.Thread(target=getinf, args=(url,))
    t.start()

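When the script runs, the tabs do not all open at once from the threads as I expected. Instead, the pages load one after another, and only the title of the last URL is printed. When I tried multiprocessing, the program crashed many times. I am building a web crawler, and some sites (such as Twitter) need JavaScript to display their content, so I can't use requests or urllib. What is the solution to this problem? Suggestions for other libraries are welcome.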

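Try creating the chromedriver inside the thread function. Otherwise you have only one driver, and every thread is changing the URL of that same driver. Instead, create a separate chromedriver for each thread.

Note: I have not tried this code; it is only a suggestion.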

import threading  as th
import time
import base64
import mysql.connector as mysql
import requests
from bs4 import BeautifulSoup
from seleniumwire import webdriver
from selenium.webdriver.chrome.options import Options
from functions import *

options = Options()
prefs = {'profile.default_content_setting_values': {'images': 2,'popups': 2, 'geolocation': 2, 
                            'notifications': 2, 'auto_select_certificate': 2, 'fullscreen': 2, 
                            'mouselock': 2, 'mixed_script': 2, 'media_stream': 2, 
                            'media_stream_mic': 2, 'media_stream_camera': 2, 'protocol_handlers': 2, 
                            'ppapi_broker': 2, 'automatic_downloads': 2, 'midi_sysex': 2, 
                            'push_messaging': 2, 'ssl_cert_decisions': 2, 'metro_switch_to_desktop': 2, 
                            'protected_media_identifier': 2, 'app_banner': 2, 'site_engagement': 2, 
                            'durable_storage': 2}}
print('Crawling process started')
options.add_experimental_option('prefs', prefs)
urls='https://google.com https://youtube.com'
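def getinf(url_):
    # Each thread builds its own driver, so no two threads share a browser session.
    driver = webdriver.Chrome(executable_path='chromedriver.exe', options=options)
    try:
        driver.set_page_load_timeout(50000)
        driver.get(url_)
        soup=BeautifulSoup(driver.page_source, 'html5lib')
        print(soup.select('title'))
    finally:
        # Close this thread's browser even if the page load fails.
        driver.quit()
for url in urls.split():
    t=th.Thread(target=getinf, args=(url,))
    t.start()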
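
With more than a handful of URLs, starting one thread (and one Chrome instance) per URL can exhaust memory. Below is a minimal sketch of the same one-driver-per-task idea with a bounded thread pool. It is not from the original answer. It assumes plain `selenium` rather than `seleniumwire`, and a chromedriver that Selenium can find on its own. The names `make_chrome_driver`, `fetch_title`, `crawl_titles`, and the `driver_factory` parameter are hypothetical helpers introduced for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def make_chrome_driver():
    """Build a fresh Chrome driver (assumes selenium and chromedriver are installed)."""
    # Imported lazily so the helpers below can be used with any driver factory.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    return webdriver.Chrome(options=options)


def fetch_title(url, driver_factory=make_chrome_driver):
    """Create a private driver, load the page, and return (url, title)."""
    driver = driver_factory()
    try:
        driver.get(url)
        return url, driver.title
    finally:
        # Always release the browser, even if the page load raised.
        driver.quit()


def crawl_titles(urls, driver_factory=make_chrome_driver, max_workers=4):
    """Fetch titles concurrently; at most max_workers browsers run at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        return list(pool.map(lambda u: fetch_title(u, driver_factory), urls))
```

Calling `crawl_titles(['https://google.com', 'https://youtube.com'])` should return one `(url, title)` pair per URL. Because each task owns its driver from creation to `quit()`, no browser session is shared between threads.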

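YouTube and Twitter both have Python APIs.

I don't want to develop separate software for YouTube, Twitter, and so on to extract data. I want one complete solution based on the Selenium driver. How can I do that?

If it has to be Python, there is Pyppeteer; otherwise Puppeteer is the better choice.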
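
To make the Pyppeteer suggestion concrete, here is a rough sketch of loading several pages concurrently with `asyncio` instead of threads. It is an illustration under assumptions, not code from the thread. It assumes the third-party `pyppeteer` package is installed. The helper names `fetch_title_async`, `crawl_titles_async`, and `run_with_pyppeteer` are hypothetical.

```python
import asyncio


async def fetch_title_async(browser, url):
    """Open one tab in the shared browser, load the URL, and return its title."""
    page = await browser.newPage()
    try:
        await page.goto(url)
        return await page.title()
    finally:
        await page.close()


async def crawl_titles_async(urls, launch_browser):
    """Load all URLs concurrently, one tab per URL, inside a single browser."""
    browser = await launch_browser()
    try:
        # gather keeps the results in the same order as the input URLs.
        return await asyncio.gather(
            *(fetch_title_async(browser, u) for u in urls)
        )
    finally:
        await browser.close()


def run_with_pyppeteer(urls):
    """Convenience wrapper; requires `pip install pyppeteer`."""
    from pyppeteer import launch  # imported lazily: third-party dependency

    return asyncio.run(crawl_titles_async(urls, launch))
```

Here a single browser serves many tabs, which is usually lighter than one Chrome process per thread. Something like `run_with_pyppeteer(['https://google.com', 'https://youtube.com'])` would be the entry point.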