How do I run a single function in multiple threads, and create thread instances in a loop, in Python?

1) I have a list of product links containing 3385 links. 2) I have a function get_pro_info(link) that takes a product link and appends the item to a JSON file. 3) I want Selenium to open 5 browsers with 5 links in parallel, get the product info, and append it to a file or list; or alternatively 3) Selenium opens 1 browser with 5 tabs (5 links) and appends to the file.

Question: how do I apply threading to my code? My code:
import json
from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException, WebDriverException

item = {}
list_before_dump = []
rd_pro_link_list = []
pro_count = 1
list_before_dump_file = open('list_before_dump.json', 'w')  # output file; name not shown in the question


def get_pro_info(pro_url):
    new_url = ''
    driver = webdriver.Chrome(executable_path=r'C:\Users\Beenu\PycharmProjects/chromedriver.exe')
    try:
        new_url = 'https://pk.studiobytcs.com' + pro_url
        print('new product URL: ' + new_url)
        driver.execute_script("window.open('');")
        sleep(1)
        # switch control to the first window
        driver.switch_to.window(driver.window_handles[0])
        driver.get(new_url)
    except (WebDriverException, TimeoutException, Exception) as e:
        print('There is an error in getting the product by URL in get_pro_info()! \n' + str(e))

    description_soup = None
    try:
        description = driver.find_element_by_xpath(
            '//*[@id="shopify-section-product-template"]/div[2]/div[1]/div/div[2]')
        description_source_code = description.get_attribute("innerHTML")
        description_soup = BeautifulSoup(description_source_code, 'html.parser')
    except NoSuchElementException as e:
        print('Product description tag not found! \n' + str(e))

    # This is for getting the heading / product name
    head = []
    r_j_title = ''
    try:
        head = description_soup.find_all("h1", class_="product_name")
        r_j_title = head[0].string.strip()
        print("Title: " + r_j_title)
    except (AttributeError, IndexError) as e:
        print('Failed to get heading/title tag! \n' + str(e))

    # This is for getting the brand name from the heading/title
    r_j_brand_and_designer = ''
    try:
        brand_and_designer = head[0].string.strip().split("-")[0]
        r_j_brand_and_designer = str(brand_and_designer).strip()
        print('Brand and designer: ' + r_j_brand_and_designer)
    except (IndexError, ValueError) as e:
        print('Failed to split brand from heading/title! \n' + str(e))

    # This is for getting the price as an integer
    r_j_price_in_int = ''
    try:
        price = description_soup.find_all("span", class_="money")
        price_new = price[0].string.strip()
        print("New price: " + price_new)
        # extract the number from the price string; it may contain ','
        r_c_price = price[0].string.strip().split(".")[1]
        r_j_price_in_int = str(r_c_price).replace(",", "")
        print('Price: ' + r_j_price_in_int)
    except (AttributeError, IndexError, ValueError) as e:
        print('Failed to get price tag or to split the price from it! \n' + str(e))

    # This is for getting the full description
    description_all = []
    r_j_desc = ''
    try:
        description_all = description_soup.find_all("div", class_="description")
        final_des = str(description_all[0].get_text())
        ch = final_des.split()
        r_j_desc = str(' '.join(ch))  # collapses whitespace and newline characters
        print("with split ch: " + r_j_desc)
    except (AttributeError, IndexError, ValueError) as e:
        print('Failed to get description tag or to strip newline chars from the description! \n' + str(e))

    # This is for the fabric, in case the tag is not available
    try:
        get_split_fibric = description_all[0].get_text().split("Fabric", 1)[1]
        get_split_des = get_split_fibric.split("Disclaimer")[0]
        r_j_fabric = str(get_split_des).strip()
        print("getting fabric: " + r_j_fabric)
    except IndexError:
        r_j_fabric = 'N/A'
        print('Fabric is not available: ' + r_j_fabric)

    item['brand_name'] = str(r_j_brand_and_designer)
    item['designer'] = str(r_j_brand_and_designer)
    item['title'] = str(r_j_title)
    item['description'] = str(r_j_desc)
    item['price'] = int(r_j_price_in_int)
    item['currency'] = "PKR"
    item['product_id'] = str(r_j_title)
    item['source'] = str(new_url)
    item['fabric'] = str(r_j_fabric)
    item['gender'] = "woman"
    print(item)
    cloth = {
        "cloth": item
    }
    print(cloth)
    list_before_dump.append(cloth)
    driver.close()
    driver.quit()


with open('product_link_read.txt', 'r') as file:
    data = file.readlines()
print(data)
for line in data:
    rd_pro_link_list.append(str(line).strip())
print(rd_pro_link_list)
print(len(rd_pro_link_list))

for pro_link in rd_pro_link_list:
    get_pro_info(pro_link)
    print('Pro count = ' + str(pro_count))
    pro_count = pro_count + 1

list_before_dump_file.write(json.dumps(list_before_dump))
list_before_dump_file.close()
If you want to iterate over the list and always take 20 links at a time, then you can use range(start, stop, step) with step=20:
import threading

all_t = []
for i in range(0, len(list_of_product_link), 20):
    twenty_links = list_of_product_link[i:i+20]
    t = threading.Thread(target=get_product_info, args=(twenty_links,))
    t.start()
    all_t.append(t)

# --- later ---
for t in all_t:
    t.join()
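As a quick sanity check of the slicing above (pure Python, no threads; the link list here is made up):

```python
# range(0, len(lst), 20) slices the list into chunks of at most 20
# elements that cover every element exactly once.
list_of_product_link = ['link%d' % i for i in range(45)]  # dummy links

chunks = []
for i in range(0, len(list_of_product_link), 20):
    chunks.append(list_of_product_link[i:i+20])

print([len(c) for c in chunks])  # chunk sizes: [20, 20, 5]
assert sum(chunks, []) == list_of_product_link  # nothing lost, nothing duplicated
```

The last chunk is simply shorter when the list length is not a multiple of 20; the slice never raises an IndexError.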
Or, the other method is also fine if you do not need the list afterwards:
all_t = []
while list_of_product_link:
    twenty_links = list_of_product_link[:20]
    list_of_product_link = list_of_product_link[20:]
    t = threading.Thread(target=get_product_info, args=(twenty_links,))
    t.start()
    all_t.append(t)

# --- later ---
for t in all_t:
    t.join()
BTW: args= expects a tuple, even when you pass only one argument, so you need the trailing comma in (link,) to create a one-element tuple.
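A tiny demo of why the trailing comma matters (the thread target here is a dummy function, not the real scraper):

```python
import threading

results = []

def get_product_info(link):
    results.append(link)

# Parentheses alone do not make a tuple; the comma does.
print(type(('x')))   # <class 'str'>
print(type(('x',)))  # <class 'tuple'>

# Without the comma, Thread would try to unpack the string
# character by character as separate positional arguments.
t = threading.Thread(target=get_product_info, args=('some-link',))
t.start()
t.join()
print(results)  # ['some-link']
```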
BTW: if you want it to run only 20 threads at any moment, then it is better to look at multiprocessing and Pool(20).
Comments: So the first 20 links are list_of_product_link[:20] and the next 20 links are list_of_product_link[20:40]; will it automatically move on to the next 20? I want to iterate over the whole list, which has about 3000 links; how do I do that with a loop? -- See the loop in my answer. -- Thanks, I am trying it and will let you know if I get an error. In t = threading.Thread(target=get_product_info, args=(twenty_links,)) you pass the function a list, but I want to give each thread a single link and create 20 threads at a time. Is there somewhere else I can show you the full code? I cannot put more characters here. -- I do not know why you need 20 links; some people send 20 links to one thread so they do not have to create threads again and again. But you can run 20 threads with a loop. See the new code in the answer: there are now two versions, one for args=(twenty_links,) and one for args=(link,).
all_t = []
while list_of_product_link:
    twenty_links = list_of_product_link[:20]
    list_of_product_link = list_of_product_link[20:]
    t = threading.Thread(target=get_product_info, args=(twenty_links,))
    t.start()
    all_t.append(t)

# --- later ---
for t in all_t:
    t.join()
while list_of_product_link:
    twenty_links = list_of_product_link[:20]
    list_of_product_link = list_of_product_link[20:]
    all_t = []
    for link in twenty_links:
        t = threading.Thread(target=get_product_info, args=(link,))
        t.start()
        all_t.append(t)
    # --- still inside the `while` loop ---
    for t in all_t:
        t.join()
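If each thread appends its scraped item to one shared list (as list_before_dump in the question), an explicit threading.Lock makes the concurrent appends safe and the intent clear. A minimal sketch, with a dummy fetch standing in for the real Selenium work:

```python
import threading

list_before_dump = []
lock = threading.Lock()

def get_product_info(link):
    item = {'source': link}      # stand-in for the real scraping work
    with lock:                   # serialize access to the shared list
        list_before_dump.append(item)

links = ['link1', 'link2', 'link3']
all_t = [threading.Thread(target=get_product_info, args=(link,)) for link in links]
for t in all_t:
    t.start()
for t in all_t:
    t.join()
print(len(list_before_dump))  # 3
```

Join all the threads before dumping the list to JSON, otherwise the file may be written while some threads are still appending.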
from multiprocessing import Pool

def get_product_info(link):
    result = ....
    return result

if __name__ == '__main__':
    with Pool(20) as p:
        all_results = p.map(get_product_info, list_of_product_link)
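Since the Selenium work is I/O-bound (waiting on the browser and the network), multiprocessing.pool.ThreadPool may fit even better: it has the same map API as Pool but uses threads, so the workers share module state instead of being separate processes. A sketch with a dummy worker in place of the real scraper:

```python
from multiprocessing.pool import ThreadPool

def get_product_info(link):
    # stand-in for opening the browser and scraping the page
    return {'source': link}

list_of_product_link = ['link1', 'link2', 'link3']

# 20 worker threads; map blocks until all links are processed
# and returns the results in input order.
with ThreadPool(20) as p:
    all_results = p.map(get_product_info, list_of_product_link)

print(all_results)
```

Either way, the pool caps concurrency at 20 workers without you managing start/join by hand.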