Python for loop skipping iterations

python selenium for-loop bots

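I made a Selenium bot that iterates through a list of territorial codes and sends each code to a website's search box. The site turns the code into a city name, which I then scrape, so that I end up with a list of cities instead of a list of codes. The problem is that while my for loop iterates through the list, it sometimes "skips" the given commands and goes straight to the next iteration, so I do not get the complete list of cities. Some codes in the list are absent or not suitable to be passed to the website, so I made exceptions for those cases.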

import time
import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_driver_path = r"D:\Development\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get("https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default")

# Get the column with the codes from excel sheet and redo it into the list.
data = pandas.read_excel(r"D:\NFZ\FINAL Baza2WUJEK(poprawione1)-plusostatniepoprawki.xlsx")
codes = data["Kody terytorialne"].tolist()


cities = []


iteration = 0

for code in codes:
    time.sleep(0.05)
    iteration += 1
    print(iteration)
    if code == "Absence":
        cities.append("Absence")
    elif code == "Error":
        cities.append("Error")
    elif code == 2211041 or code == 2211021:
        cities.append("Manual")
    else:
        # Send territorial code
        driver.find_element_by_xpath('//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]').clear()
        driver.find_element_by_xpath('//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]').send_keys(code)
        # Search
        try:
            button = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH,
                                                '/html/body/form/section/div/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[1]/div/input')))
            button.click()
        except:
            button = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH,
                                                '/html/body/form/section/div/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[1]/div/input')))
            button.click()
        # Scrape city name
        city = WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="body_TabContainer1_TabPanel1_GVTERC"]/tbody/tr[2]/td[1]/strong'))).text.split()
        print(code)
        print(city)
        cities.append(city)


table = {
    "Cities": cities
}

df = pandas.DataFrame.from_dict(table)
df.to_excel("cities-FINAL.xlsx")
driver.close()

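This is part of my console log. As you can see, after printing iteration number 98 it jumps straight to 99, where it works completely fine and prints both the territorial code and the city. The problem shows up again further along in the loop, but it always starts at the 98th iteration. The territorial code for that iteration is not one of the exceptions.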

96 <-- Iteration
2201025 <-- Territorial Code
['Kędzierzyn-Koźle', '(2201025)'] <-- City Name
97
2262011
['Bytów', '(2262011)']
98 !<-- Just iteration!
99
2205084
['Gdynia', '(2208011)']

**!Quick Note due to the answers! Here is the order of the print statements in the console. First: number of the iteration, Second: Territorial Code related to the iteration, Third: City Name**
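The log above pairs code 2205084 with a row labelled (2208011), so the scraped text does not always belong to the code that was just sent. One way to catch this is to compare the code in parentheses with the code that was requested. This is only a sketch: it assumes the site always appends the code in parentheses, as it does in the log, and the helper names `scraped_code` and `is_result_for` are made up for illustration.

```python
import re


def scraped_code(tokens):
    """Return the integer code found in parentheses among the tokens, or None."""
    for token in reversed(tokens):
        match = re.fullmatch(r"\((\d+)\)", token)
        if match:
            return int(match.group(1))
    return None


def is_result_for(tokens, code):
    """True if the scraped row carries the same code that was requested."""
    return scraped_code(tokens) == int(code)


print(is_result_for(["Bytów", "(2262011)"], 2262011))   # True
print(is_result_for(["Gdynia", "(2208011)"], 2205084))  # False
```

A check like this could be run before `cities.append(city)`, so a stale row from the previous search is flagged instead of being stored silently.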

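There are several problems here:
Your locators are terrible.
I see that your results are not correct. For example, for the input "2262011" the output is "Gdynia (2262011)", while you present this output for the input "2205084".
Your except code is the same as your try code. That makes no sense. If it did not work in the try block, why do you think it will work on the second attempt without any change?
It is better to wait for the element to be visible rather than merely present, because at the moment an element has just been rendered it is not yet fully ready to be clicked, and so on.
It is better to put the element locators at least at the top of the class rather than hard-coding them inside the code.
I tried to make your code a bit better.

Please try this: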
import time
import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_driver_path = r"D:\Development\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get("https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default")

# Get the column with the codes from excel sheet and redo it into the list.
data = pandas.read_excel(r"D:\NFZ\FINAL Baza2WUJEK(poprawione1)-plusostatniepoprawki.xlsx")
codes = data["Kody terytorialne"].tolist()

code_input_xpath = '//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]'
search_button_xpath = '//input[@id="body_TabContainer1_TabPanel1_BJPTWyszukaj"]'
city_xpath = '//table[@id="body_TabContainer1_TabPanel1_GVTERC"]//td/strong'



cities = []


iteration = 0

for code in codes:
    time.sleep(0.1)
    iteration += 1
    print(iteration)
    if code == "Absence":
        cities.append("Absence")
    elif code == "Error":
        cities.append("Error")
    elif code == 2211041 or code == 2211021:
        cities.append("Manual")
    else:
        # Send territorial code
        driver.find_element_by_xpath(code_input_xpath).clear()
        driver.find_element_by_xpath(code_input_xpath).send_keys(code)
        # Search
        button = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.XPATH, search_button_xpath)))
        button.click()
        # Scrape city name
        time.sleep(2)
        city = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.XPATH, city_xpath))).text.split()
        print(code)
        print(city)
        cities.append(city)


table = {
    "Cities": cities
}

df = pandas.DataFrame.from_dict(table)
df.to_excel("cities-FINAL.xlsx")
driver.close()
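On the identical try/except: if the goal is to survive an occasional stale element, a bounded retry that looks the element up again on every attempt is usually clearer than copying the same block twice. The following is a minimal sketch, not Selenium's own API. `retry` and `flaky_click` are hypothetical names, and `FakeStale` is a stand-in so the example runs without a browser. With Selenium you would pass `StaleElementReferenceException` from `selenium.common.exceptions` instead.

```python
import time


def retry(action, retry_on, attempts=3, delay=0.5):
    """Call action() until it succeeds or the attempts run out.

    Only exceptions of type retry_on are retried; the last one is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)


class FakeStale(Exception):
    """Stand-in for selenium's StaleElementReferenceException."""


calls = {"n": 0}


def flaky_click():
    """Fails twice, then succeeds, like a button that goes stale briefly."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeStale()
    return "clicked"


print(retry(flaky_click, FakeStale, attempts=3, delay=0))  # clicked
```

In the bot, the action would be something like `lambda: driver.find_element(By.XPATH, search_button_xpath).click()`, so the button is located again on each attempt instead of reusing a stale reference. Any other exception still stops the loop right away, which keeps real failures visible.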

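What is the value of code[97]? It must be triggering one of the code checks at the top, which would skip the rest of the prints, unless something else is missing from this code.

Why do you have the same code in the "try" and in the "except"??? If the code fails in the "try", it will fail again in the "except"... maybe it fails on iteration 98. However, if it fails a second time, that should result in an uncaught exception, unless this code is actually wrapped in another try/except that is swallowing the error.

So the value of code[97] is 2262011. I have the same code in the try and the except only because Selenium sometimes fails to press the search button due to a "stale element exception", so if it fails, it tries again. Technically, if something goes wrong and Selenium cannot find the button, it raises an error and stops the loop.

Hey, thanks for your answer. I know some of the results are incorrect. I had lowered the sleep time from 4 seconds to 0.05 seconds because I wanted to test the code faster, and that created the whole "code paired with the wrong city" mess. The code fails in the same way even with the longer sleep. I implemented the try/except in this odd way because Selenium sometimes raises a "stale element exception", and I had read that running the offending code twice like this can solve the problem. Thank you very much for your suggestions. You will have to forgive the mess in my code, I am still a beginner, but this will certainly help me improve.

Sure, OK, but have you tried using this code?

OK, I see. When the code hits one of the exceptions, it just appends and prints only the iteration number, so there is an illusion of skipping. But that still does not explain why I got this incomplete list in the first place. I will test your code and let you know whether it helps.

No, no, you got it wrong. The iteration number is always printed because it is the first line of code in the loop. Then we reach the exceptions, and if the code matches one of them, only the exception name is appended to the cities list. In the else statement, which we can read as the "nothing is wrong" branch, the city name is not only appended to the list but also printed.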
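The branching described in the last comment can be reproduced without a browser. The sketch below is an illustration under stated assumptions, not the original script: `trace` and `lookup` are made-up names, and `lookup` stands in for the Selenium search and scrape. It shows that any code matching one of the exception branches produces only an iteration number in the output, while the cities list still gets one entry per code. It does not claim that this is what happened at iteration 98, since the asker reports that the value there was a regular code.

```python
SENTINELS = {"Absence", "Error"}
MANUAL_CODES = {2211041, 2211021}


def trace(codes, lookup):
    """Return (printed_lines, cities) for the loop's branching logic."""
    lines = []
    cities = []
    for iteration, code in enumerate(codes, start=1):
        lines.append(str(iteration))      # always printed first
        if code in SENTINELS:
            cities.append(code)           # appended, nothing else printed
        elif code in MANUAL_CODES:
            cities.append("Manual")       # appended, nothing else printed
        else:
            city = lookup(code)           # stand-in for the search and scrape
            lines.append(str(code))
            lines.append(str(city))
            cities.append(city)
    return lines, cities


lines, cities = trace([2262011, "Absence", 2205084], lookup=lambda c: f"city-{c}")
print(lines)   # ['1', '2262011', 'city-2262011', '2', '3', '2205084', 'city-2205084']
print(cities)  # ['city-2262011', 'Absence', 'city-2205084']
```

If a bare iteration number appears in the real log for a code that is not in either set, the branch logic is not the cause, and the place to look is the scrape step itself.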