How to iterate over a Python list and stop to load the next URL before continuing the process?


python list selenium loops web-scraping


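I have learned (using Python) to build a few different web scrapers. The goal is to scrape image URLs from the website of one of our parts manufacturers so that we can bulk-upload a product load sheet, one column of which contains the image URLs.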

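Because the URLs are not simple (I can't just iterate over a list of product numbers and append each one to a base URL, or anything that easy; I'm here because I have to be), and because the site has no "search by product number" feature, I turned to the lists on their site. They have some very handy tools! You can add products by product number, and when you're done you can export the list as a .csv, with the option to include links to all the corresponding product pages. This was great, until I built my script and discovered that each list is limited to 250 items. All told, I have just under 5000 products to process (which means I need about 20 lists, 19 of them full and the last one almost full).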
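The "about 20 lists" figure is a ceiling division: the item count divided by 250, rounded up. A small sketch (the helper names are only illustrative) that also shows how full the last list ends up. For the roughly 4800 numbers mentioned further down, that works out to 19 full lists plus one holding 50, so the exact split depends on the final count:

```python
import math


def lists_needed(item_count: int, max_items_per_list: int = 250) -> int:
    """Number of lists needed to hold item_count items."""
    return math.ceil(item_count / max_items_per_list)


def last_list_size(item_count: int, max_items_per_list: int = 250) -> int:
    """Number of items that end up in the final list."""
    if item_count == 0:
        return 0
    remainder = item_count % max_items_per_list
    # a remainder of 0 means the last list is exactly full
    return remainder or max_items_per_list


print(lists_needed(4800), last_list_size(4800))  # 20 50
print(lists_needed(5000), last_list_size(5000))  # 20 250
```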

I mention all of this because it is relevant to the current code and the problem.

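Since I really have no other option now, my goal is to take my code and modify it so that it can scrape from 20 separate lists. At the relevant stage, it currently gets a URL pointing to their site's page for a list I named "testlist", and then refreshes the page to make sure all elements are in order.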

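With a single list, we end up on the right page, but there is a problem: we can no longer use just one link, because we have to set something up that iterates over 250 items and creates a new list, about 20 times over (or I could create the lists manually and point to specific URLs).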

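The second problem is the item limit itself. My for loop is one big loop designed to go through my entire list of roughly 4800 product numbers, adding them one by one to the list on the same page. We need to break this into chunks of at most 250 items per page and have the script load another list URL for each chunk. I can create those lists manually so that I can point to specific URLs, but if it's easier to add a function that just clicks and names a new list, that would be great. I can probably figure that part out myself.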
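If the lists are created by hand, the list-manager URL in the question's code differs between lists only in its listid query parameter (per the comment on that line), so pointing at each list can be a matter of building one URL per list ID. A minimal sketch: apart from 3925, which appears in the question, the IDs below are placeholders for lists you would create yourself (the site may not hand out consecutive IDs), and list_url is a hypothetical helper name:

```python
from urllib.parse import urlencode

# Base URL taken from the question; only the listid parameter changes per list.
LIST_MANAGER_URL = "https://www.thewebsiteimscraping.com/products/list-manager"


def list_url(list_id: int) -> str:
    """Build the list-manager URL for one (manually created) list ID."""
    return f"{LIST_MANAGER_URL}?{urlencode({'listid': list_id})}"


# Hypothetical IDs: only 3925 comes from the question; replace the rest
# with the IDs the site assigns to the lists you create.
list_ids = [3925, 3926, 3927]
list_urls = [list_url(list_id) for list_id in list_ids]

for url in list_urls:
    print(url)
```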

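I don't know where to go from here. I have code that handles one site list at one URL: it iterates over the product numbers in a Python list and then exports at the end.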

I need my script to iterate over the same Python list, stop after 250 product numbers to load the next URL, and then continue the process.

The part of my code that gets our list URL and then moves into the scraper section looks like this:


# get the url for our list
listurl = 'https://www.thewebsiteimscraping.com/products/list-manager?listid=3925' # <- this is the URL for one particular list; other lists will have different list IDs
alert_accept()
driver.get(listurl)
alert_accept()

############################################################################################

driver.refresh()
# import our list, Select, By (for locators), WebDriverWait, expected conditions, and time so we can sleep
from kiberlist import mfrnumbers
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import time
# from testnumber import testnumbers as tlnum


# CSS selector for the close ("x") button of the "add items" modal
close_selector = "#addItemsToListModal > div:nth-child(1) > div:nth-child(1) > div:nth-child(1) > button:nth-child(1) > svg:nth-child(1) > path:nth-child(1)"

# Selenium 4 locator style: driver.find_element(By..., ...); the old
# find_element_by_* helpers were removed in Selenium 4.3
for number in mfrnumbers:

    # we find the listactions menu, and utilize the "add item" option
    WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#listActions")))
    alert_accept()
    print('Finding listactions...')
    select_am = Select(driver.find_element(By.CSS_SELECTOR, '#listActions'))
    alert_accept()
    print("Found it. Selecting...")
    select_am.select_by_value('addItems')
    print('Selected. Next...')

    # paste our item number into the search box
    print('Locating model number search....')
    inputidbox = driver.find_element(By.ID, 'model-number-search')
    print('Located? Pasting model number...')
    inputidbox.send_keys(number)

    # finally add our item
    additembutton = driver.find_element(By.CSS_SELECTOR, '.gtmAddItemToList')
    print('Located add item button...')
    additembutton.click()
    print('Item number added. Next...')

    # close the modal once its close button is clickable
    print('Locating blank space...')
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, close_selector)))
    time.sleep(1)
    xbutton = driver.find_element(By.CSS_SELECTOR, close_selector)
    xbutton.click()
    time.sleep(1)

# now we find the "export excel" option to get our csv for that list
listactions = Select(driver.find_element(By.CSS_SELECTOR, '#listActions'))
listactions.select_by_value('exportExcel')

# clicky clicky. a dialog will appear asking whether to save the file; the user must click Save manually
exportbutton = driver.find_element(By.CSS_SELECTOR, '#btnExportToExcel')
exportbutton.click()


It looks like your code works for a single list, and now you just want it to work on smaller pieces of that list.
Usually you see "convert a list of lists into one flat list". This is exactly the opposite.
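To make the "opposite" concrete, here is a small illustration with made-up numbers: itertools.chain.from_iterable performs the usual flattening, and slicing in steps of a fixed size cuts the flat list back into the original pieces:

```python
from itertools import chain

nested = [[1, 2], [3, 4], [5]]

# The common task: flatten a list of lists into one flat list.
flat = list(chain.from_iterable(nested))

# The task here is the reverse: cut the flat list back into pieces of at most 2.
size = 2
rebuilt = [flat[start:start + size] for start in range(0, len(flat), size)]

print(flat)     # [1, 2, 3, 4, 5]
print(rebuilt)  # [[1, 2], [3, 4], [5]]
```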
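I'm assuming mfrnumbers is your single list. We'll create a function that, given a flat list, returns a list_id together with the elements that belong in that list. As you said in the question, you'll know how to actually get to that list; for now, I'm assuming the list ID is a simple integer.
This function, get_list(mfrnumbers), returns the numbers in groups of at most max_items_per_list. Technically, it returns an iterator (a generator), which you iterate over.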
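def get_list(flattened_list, max_items_per_list=250):
    """Yield (list_id, chunk) pairs; each chunk holds at most max_items_per_list items."""
    # maybe you have some pattern for list names?
    list_id = 1

    while len(flattened_list) > 0:
        current_list = flattened_list[:max_items_per_list]
        yield list_id, current_list

        # drop the items just yielded and move on to the next list ID
        flattened_list = flattened_list[len(current_list):]
        list_id += 1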

We can call this function like this:
for (myid, mylist) in get_list([1, 2, 3, 4, 5], max_items_per_list=2):
    print(myid, mylist)

Output:
1 [1, 2]
2 [3, 4]
3 [5]
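get_list re-slices the remaining list on every pass, which copies whatever is left each time. For about 4800 numbers that cost is negligible, but a variant built on itertools.islice reads any iterable once (a list, a generator, lines of a file) and yields the same (list_id, chunk) pairs. A sketch; get_list_lazy is just an illustrative name:

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def get_list_lazy(items: Iterable[T], max_items_per_list: int = 250) -> Iterator[Tuple[int, List[T]]]:
    """Yield (list_id, chunk) pairs, pulling at most max_items_per_list items at a time."""
    iterator = iter(items)
    list_id = 1
    while True:
        # islice takes up to max_items_per_list items from the shared iterator
        chunk = list(islice(iterator, max_items_per_list))
        if not chunk:
            return
        yield list_id, chunk
        list_id += 1


for myid, mylist in get_list_lazy([1, 2, 3, 4, 5], max_items_per_list=2):
    print(myid, mylist)
```

On Python 3.12 and later, itertools.batched(items, 250) does the chunking itself (yielding tuples rather than lists), and enumerate(..., start=1) can supply the IDs.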

So in your case, you can run your whole big loop as the inner loop, iterating over the output of get_list:

for (myid, mylist) in get_list(mfrnumbers):
    # stop and do any loading for this new list...
    for number in mylist:
        ...  # the per-number steps from your existing loop go here
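One detail worth noting: the question's code exports once, after the loop. With one site list per chunk, the export step has to move inside the outer loop so that each list is exported after it is filled. A sketch under these assumptions: one list has been created in advance per chunk and its ID collected in list_ids, and load_list, add_item and export_list are hypothetical callbacks that would wrap the question's Selenium steps. get_list is repeated so the block runs on its own:

```python
from typing import Callable, List, Sequence


def get_list(flattened_list, max_items_per_list=250):
    # Same generator as in the answer: yields (list_id, chunk) pairs.
    list_id = 1
    while len(flattened_list) > 0:
        current_list = flattened_list[:max_items_per_list]
        yield list_id, current_list
        flattened_list = flattened_list[len(current_list):]
        list_id += 1


def fill_site_lists(
    numbers: List[str],
    list_ids: Sequence[int],
    load_list: Callable[[int], None],
    add_item: Callable[[str], None],
    export_list: Callable[[], None],
    max_items_per_list: int = 250,
) -> None:
    """Put each chunk of numbers into its own pre-created site list, exporting each one."""
    chunks = list(get_list(numbers, max_items_per_list))
    if len(chunks) > len(list_ids):
        raise ValueError(f"need {len(chunks)} list IDs, got {len(list_ids)}")
    for (_, chunk), site_list_id in zip(chunks, list_ids):
        load_list(site_list_id)  # e.g. open this list's URL and refresh
        for number in chunk:
            add_item(number)  # the per-number steps from the question's loop
        export_list()  # export this list before moving on to the next
```

With the question's code, load_list could call driver.get() on that list's list-manager URL followed by driver.refresh(), add_item could hold the body of the original for loop, and export_list the "exportExcel" steps; those wrappers are not shown here. Because the export dialog needs a manual Save click, each of the roughly 20 downloads would still have to be confirmed by hand.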
