
Python Beautiful Soup find_all returning posts out of order


I have a web scraper that updates the posts on my profile. When I tried it on a test profile, it worked perfectly. But now, when I run it on my actual account, which has about 250 posts, it tends to scramble their order. I haven't actually caught it doing this while it runs, so I don't know where it happens.

The scraper uses soup.find_all() to find all the posts:

all_posts = soup.find_all("a", {"data-testid": "product__item"})   #{"class": "styles__ProductImage-sc-5cfswk-5 gPcWvA LazyLoadImage__Image-sc-1732jps-1 cSwkPp"})
Does find_all() return the elements in the order they appear on the site, or is the return order unreliable (like an unordered SQL query)?
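For what it's worth, BeautifulSoup's find_all() walks the parse tree top to bottom, so matches come back in document order. A minimal, self-contained check (the sample HTML below is made up, not from the scraped site):

```python
from bs4 import BeautifulSoup

# find_all() traverses the tree in document order, so the
# matched anchors come back in the order they appear in the HTML.
html = """
<div>
  <a data-testid="product__item" href="/products/first/">A</a>
  <a data-testid="product__item" href="/products/second/">B</a>
  <a data-testid="product__item" href="/products/third/">C</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a", {"data-testid": "product__item"})
print([a["href"] for a in links])
# ['/products/first/', '/products/second/', '/products/third/']
```

If the order on the live page is wrong, the more likely culprit is the page itself: lazy-loaded items may not all be present in driver.page_source when the scroll loop finishes, so the HTML being parsed may be incomplete rather than reordered.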

import time
from datetime import datetime

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException


def GetPosts(driver):
    # scroll to the bottom of the page to trigger lazy loading
    for i in range(20):
        time.sleep(0.25)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # 'html' is not a registered parser name; use 'html.parser'
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    all_posts = soup.find_all("a", {"data-testid": "product__item"})

    posts = FilterOutSoldItems(all_posts)

    postNum = 0
    posts.reverse()  # process oldest posts first
    for post in posts:
        time.sleep(1.25)
        print("posts: " + str(postNum))
        href = post.get('href')
        product_page = href.split("/")[-2]  # get the product page identifier
        edit_page = "https://www.depop.com/products/edit/" + product_page + "/"

        time.sleep(1)
        driver.get(edit_page)
        time.sleep(1.5)
        try:
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            # Selenium 4 removed find_element_by_xpath(); use find_element(By.XPATH, ...)
            save_changes_button = driver.find_element(
                By.XPATH, "//button[@data-testid='editProductFormButtons__save']")
            time.sleep(1)
            save_changes_button.click()
        except NoSuchElementException:
            # the save button is absent on sold items' edit pages
            print("post has already been sold")

        print(href + " - has been updated at {}".format(datetime.now()))
        postNum = postNum + 1
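Separately, the href parsing above is fragile: split("/")[-2] only picks out the product slug when the href ends with a trailing slash. A small sketch of a more tolerant helper (edit_url is a hypothetical name, not part of the original code, and it assumes hrefs shaped like "/products/&lt;slug&gt;/"):

```python
def edit_url(href: str) -> str:
    """Build the edit-page URL from a product href.

    Hypothetical helper: strips any trailing slash first, so both
    '/products/<slug>/' and '/products/<slug>' yield the same slug.
    """
    slug = href.rstrip("/").split("/")[-1]
    return "https://www.depop.com/products/edit/" + slug + "/"

print(edit_url("/products/blue-denim-jacket/"))
# https://www.depop.com/products/edit/blue-denim-jacket/
print(edit_url("/products/blue-denim-jacket"))  # same result without the slash
```

This avoids silently producing the wrong URL segment if the site ever drops the trailing slash from its links.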