
How to scrape all dates using Python


I need to scrape soccerway.com, and I run into a problem when selecting the season dates for a competition (e.g. 2011-2013): only the last season, 2012-2013, gets saved, not both 2011-2012 and 2012-2013.

from time import sleep
from urllib.parse import urlparse

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException


def get_urls_season(url_path):

    driver = webdriver.Chrome()
    driver.fullscreen_window()
    driver.get("https://us.soccerway.com" + url_path)
    click_privacy_policy(driver)
    date = date_selector(driver)
    # url_list = cycle_through_game_weeks(driver)
    url_list.reverse()
    driver.quit()

    print("=" * 100)
    print(f"{len(set(url_list))} find")

    if input("con? (y/n): ") != "y":
        exit()

    return url_list




def date_selector(driver):
    inptdate='2010-2012'
    startdate=inptdate.split('-')[0]
    enddate=inptdate.split('-')[1]

    while int(startdate)< int(enddate):
        textstring=str(startdate) + "/" + str(int(startdate)+1)
        print(textstring)
        driver.find_element_by_xpath("//select[@name='season_id']/option[text()='" + textstring +"']").click()
        startdate=int(startdate)+1
        url_list = cycle_through_game_weeks(driver)

def click_privacy_policy(driver):
    try:
        driver.find_element_by_class_name("qc-cmp-button").click()
    except NoSuchElementException:
        pass


def cycle_through_game_weeks(driver):
    season_urls = get_fixture_urls(innerhtml_soup(driver))

    while is_previous_button_enabled(driver):
        click_previous_button(driver)
        sleep(2)

        urls = get_fixture_urls(innerhtml_soup(driver))


        urls.reverse()
        season_urls += urls

    return season_urls


def is_previous_button_enabled(driver):
    return driver.find_element_by_id(
        "page_competition_1_block_competition_matches_summary_5_previous"
    ).get_attribute("class") != "previous disabled"


def click_previous_button(driver):
    driver.find_element_by_id(
        "page_competition_1_block_competition_matches_summary_5_previous"
    ).click()


def get_fixture_urls(soup):

    urls = []
    for elem in soup.select(".info-button.button > a"):
        urls.append(urlparse(elem.get("href")).path)
    return urls


def innerhtml_soup(driver):

    html = driver.find_element_by_tag_name("html").get_attribute("innerHTML")
    soup = BeautifulSoup(html, "html.parser")
    return soup
I need to scrape all the dates from 2011 to 2013 (2011-2012 and 2012-2013), not only the last one.
I can't find the cause of the problem.

If I understand the code correctly, the problem is here:

url_list = cycle_through_game_weeks(driver)
On each iteration you overwrite the old url_list with a new one. The simplest solution is:

url_list += cycle_through_game_weeks(driver)
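Why the assignment keeps only the last season: a name rebound with = inside a loop discards the previous value, while += extends the same list. A minimal self-contained illustration, using hypothetical placeholder data rather than the scraper's real output:

weeks = [["a", "b"], ["c"], ["d", "e"]]  # hypothetical per-week URL batches

overwritten = []
for w in weeks:
    overwritten = w       # rebinds the name each pass; earlier batches are lost

accumulated = []
for w in weeks:
    accumulated += w      # extends the same list on every pass

print(overwritten)  # ['d', 'e'] -- only the final iteration survives
print(accumulated)  # ['a', 'b', 'c', 'd', 'e']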
More elegant and efficient:

    url_list = []
    while int(startdate) < int(enddate):
        textstring = str(startdate) + "/" + str(int(startdate) + 1)
        print(textstring)
        driver.find_element_by_xpath("//select[@name='season_id']/option[text()='" + textstring + "']").click()
        startdate = int(startdate) + 1
        url_list.append(cycle_through_game_weeks(driver))
    return url_list

This way, url_list[0] will hold the first season's values, url_list[1] the second, and so on.
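For example, a minimal sketch of walking the nested result, assuming one sub-list per season as built by the loop above:

for season_index, season_urls in enumerate(url_list):
    # each sub-list holds the match URLs collected for one season
    print(f"season #{season_index}: {len(season_urls)} match URLs")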

I ran the program with the second solution, but I get an error: print(f"{len(set(url_list))} find") raises TypeError: unhashable type: 'list'. How do I solve this? (The dots represent the other parts, which are unchanged.)
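That TypeError comes from calling set() on the nested result: the second solution returns a list of lists, and a list is unhashable, so it cannot be a set member. A minimal sketch of a fix, assuming url_list is the nested list built above: flatten it before deduplicating.

from itertools import chain

# flatten the per-season sub-lists into one sequence of URL strings;
# strings are hashable, so set() can deduplicate them
flat_urls = list(chain.from_iterable(url_list))
print(f"{len(set(flat_urls))} find")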