Python BeautifulSoup find_all out of order


python selenium web-crawler


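I have a web scraper that bumps the posts on my wall. When I tried it on a test profile, it worked perfectly. But now that I am running it on my actual account, which has around 250 posts, it tends to scramble their order. I have not actually caught it in the act, so I don't know where in the process this happens.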

The scraper uses soup.find_all() to find all the posts:

all_posts = soup.find_all("a", {"data-testid": "product__item"})   #{"class": "styles__ProductImage-sc-5cfswk-5 gPcWvA LazyLoadImage__Image-sc-1732jps-1 cSwkPp"})

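Does find_all return the elements in the order they appear on the site, or is the returned order unreliable (as with an SQL query that has no ORDER BY)?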
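One quick way to check the ordering question is to parse a small hand-written fragment and look at what comes back. The sketch below uses made-up hrefs and a hypothetical helper name, `product_hrefs`; only the `data-testid` value is taken from the scraper. As far as I know, `find_all` walks the parse tree in document order, so the result should match the order of the tags in the HTML that was parsed. That is not necessarily the order the site shows visually, or the order in which lazily loaded items arrived.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, only for illustration; the real page will differ.
SAMPLE_HTML = """
<ul>
  <li><a data-testid="product__item" href="/products/user-item-a/">A</a></li>
  <li><a data-testid="other" href="/about/">About</a></li>
  <li><a data-testid="product__item" href="/products/user-item-b/">B</a></li>
  <li><a data-testid="product__item" href="/products/user-item-c/">C</a></li>
</ul>
"""


def product_hrefs(html):
    """Return the hrefs of product links in the order find_all yields them."""
    soup = BeautifulSoup(html, "html.parser")
    links = soup.find_all("a", {"data-testid": "product__item"})
    return [link.get("href") for link in links]


if __name__ == "__main__":
    for href in product_hrefs(SAMPLE_HTML):
        print(href)
```

If the hrefs print in source order here but the real run is still scrambled, the cause is more likely to be in what `driver.page_source` contained when it was parsed than in `find_all` itself.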

import time
from datetime import datetime

from bs4 import BeautifulSoup
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By


def GetPosts(driver):
    # Scroll to the bottom of the page repeatedly so lazy-loaded posts render
    for i in range(20):
        time.sleep(0.25)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    all_posts = soup.find_all("a", {"data-testid": "product__item"})   #{"class": "styles__ProductImage-sc-5cfswk-5 gPcWvA LazyLoadImage__Image-sc-1732jps-1 cSwkPp"})

    # FilterOutSoldItems is defined elsewhere in the script (not shown here)
    posts = FilterOutSoldItems(all_posts)

    posts.reverse()  # process the last post on the page first
    for postNum, post in enumerate(posts):
        time.sleep(1.25)
        print("posts: " + str(postNum))
        href = post.get('href')
        product_page = href.split("/")[-2]  # get the product page identifier
        edit_page = "https://www.depop.com/products/edit/" + product_page + "/"

        time.sleep(1)
        driver.get(edit_page)
        time.sleep(1.5)
        try:
            # Scroll down a few times so the save button is in view
            for _ in range(3):
                driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            # find_element(By.XPATH, ...) replaces find_element_by_xpath, which Selenium 4 removed
            save_changes_button = driver.find_element(By.XPATH, "//button[@data-testid='editProductFormButtons__save']")
            time.sleep(1)
            save_changes_button.click()
        except NoSuchElementException:
            # No save button on the edit page: treated as an already-sold post
            print("post has already been sold")
        else:
            # Only report an update when the click actually went through
            print(href + " - has been updated at {}".format(datetime.now()))
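
The fixed loop of 20 scrolls with a 0.25 second pause may not be enough for a profile with around 250 posts. This is only a guess at the cause: if the page is still loading items when `driver.page_source` is read, the parsed HTML could be incomplete or differ from the final page. A minimal sketch of an alternative, assuming the page grows in height as more posts load, is to keep scrolling until the height stops changing. The helper name `scroll_until_stable` and its parameters are hypothetical, not part of Selenium.

```python
import time


def scroll_until_stable(driver, pause=0.5, max_rounds=100):
    """Scroll to the bottom until the document height stops growing.

    Returns the number of scrolls performed. `driver` is anything with an
    execute_script(script) method, such as a Selenium WebDriver.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded items time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new appeared after this scroll
        last_height = new_height
    return max_rounds  # gave up; the page may still be loading
```

It could stand in for the `for i in range(20)` loop at the top of `GetPosts`, for example `scroll_until_stable(driver, pause=1.0)`. A single unchanged height is a weak signal on a slow connection, so a longer pause or requiring two unchanged readings in a row may be safer.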