Python Selenium not retrieving all matching class elements

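I'm using Python Selenium WebDriver to pull some information from the following site:

I'm interested in pulling some links, dates, and team names. I've written the code below to identify the information I'm looking for, but it only grabs data up to a certain point and then appends empty items (i.e. '') to my lists.

I know every list should have 66 items if pulled correctly (Kentucky played 66 games). Any idea why it stops extracting information after the second LSU game?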

from selenium import webdriver

bs = [] #boxscores
team2 = [] #opponents
dates = [] #dates of games
team1 = 'KENTUCKY' #team of interest

driver = webdriver.Chrome()
driver.get('http://www.ukathletics.com/schedule-list/#!/m-basebl/2016')

elem = driver.find_elements_by_class_name('event_link')
for i in elem:
    bs.append(i.get_attribute('href'))
links = sorted(set(bs), key=lambda x: bs.index(x))

elem = driver.find_elements_by_class_name('school_name')
team2 = [i.text for i in elem if i.text!=team1]

elem = driver.find_elements_by_class_name('date')
for i in elem:
    dates.append(i.text.replace(',','').replace('\n',' '))

print(links)
print(team2)
print(dates)
print(len(links))
print(len(team2))
print(len(dates))
My results:

['http://www.ukathletics.com/game-center/580644ebe4b07dac0ca58a91/', 'http://www.ukathletics.com/game-center/5806455ce4b07dac0ca58a92/', 'http://www.ukathletics.com/game-center/58064594e4b09266491b651d/', 'http://www.ukathletics.com/game-center/5820d9dbe4b0493932cf30fd/', 'http://www.ukathletics.com/game-center/5820da33e4b0493932cf30fe/', 'http://www.ukathletics.com/game-center/5820da86e4b05e67c64470ca/', 'http://www.ukathletics.com/game-center/5820dabde4b0493932cf30ff/', 'http://www.ukathletics.com/game-center/5820daf4e4b05e67c64470cb/', 'http://www.ukathletics.com/game-center/5820db25e4b05e67c64470cc/', 'http://www.ukathletics.com/game-center/5820db6ce4b0493932cf3100/', 'http://www.ukathletics.com/game-center/5820db91e4b05e67c64470de/', 'http://www.ukathletics.com/game-center/5820dbb6e4b05e67c64470df/', 'http://www.ukathletics.com/game-center/5820dbe3e4b0493932cf3101/', 'http://www.ukathletics.com/game-center/5820dc0de4b05e67c64470e0/', 'http://www.ukathletics.com/game-center/58c1e98ee4b066e02ca82086/', 'http://www.ukathletics.com/game-center/5820dc32e4b05e67c64470e1/', 'http://www.ukathletics.com/game-center/5820dc80e4b0493932cf3102/', 'http://www.ukathletics.com/game-center/5820dcaae4b0493932cf3103/', 'http://www.ukathletics.com/game-center/5820dd1ee4b0493932cf3104/', 'http://www.ukathletics.com/game-center/5820dd6fe4b0493932cf3105/', 'http://www.ukathletics.com/game-center/5820dd8ce4b05e67c64470e3/', 'http://www.ukathletics.com/game-center/5820de21e4b05e67c64470e4/', 'http://www.ukathletics.com/game-center/5820de47e4b0493932cf3106/', 'http://www.ukathletics.com/game-center/5820de69e4b05e67c64470e5/', 'http://www.ukathletics.com/game-center/5820de87e4b0493932cf3107/', 'http://www.ukathletics.com/game-center/5820dea9e4b05e67c64470e6/', 'http://www.ukathletics.com/game-center/5820decee4b0493932cf3108/', 'http://www.ukathletics.com/game-center/5820deebe4b05e67c64470e7/', 'http://www.ukathletics.com/game-center/5820df0ce4b05e67c64470e8/', 
'http://www.ukathletics.com/game-center/5820df50e4b0493932cf3114/', 'http://www.ukathletics.com/game-center/5820df85e4b05e67c64470e9/', 'http://www.ukathletics.com/game-center/5820dfa9e4b05e67c64470ea/', 'http://www.ukathletics.com/game-center/5820dfc7e4b05e67c64470eb/', 'http://www.ukathletics.com/game-center/5820dfebe4b0493932cf3115/', 'http://www.ukathletics.com/game-center/5820e023e4b0493932cf3116/', 'http://www.ukathletics.com/game-center/5820e03ee4b0493932cf3117/', 'http://www.ukathletics.com/game-center/5820e056e4b0493932cf3118/', 'http://www.ukathletics.com/game-center/5820e089e4b0493932cf3119/', 'http://www.ukathletics.com/game-center/5820e0bee4b05e67c64470ed/', 'http://www.ukathletics.com/game-center/5820e0a4e4b05e67c64470ec/']
['NORTH CAROLINA', 'NORTH CAROLINA', 'NORTH CAROLINA', 'LIBERTY', "ST. JOSEPH'S", 'OLD DOMINION', 'DELAWARE', 'E. KENTUCKY', 'WKU', 'UC SANTA BARBARA', 'UC SANTA BARBARA', 'UC SANTA BARBARA', 'WRIGHT STATE', 'CINCINNATI', 'MIAMI (OH)', 'MIAMI (OH)', 'MIAMI (OH)', 'MURRAY STATE', 'TEXAS A&M', 'TEXAS A&M', 'TEXAS A&M', 'WKU', 'OLE MISS', 'OLE MISS', 'OLE MISS', 'CINCINNATI', 'VANDERBILT', 'VANDERBILT', 'VANDERBILT', 'LOUISVILLE', 'MISSISSIPPI STATE', 'MISSISSIPPI STATE', 'MISSISSIPPI STATE', 'UT MARTIN', 'MIZZOU', 'MIZZOU', 'MIZZOU', 'LOUISVILLE', 'LSU', 'LSU', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
['FRI FEB 17', 'SAT FEB 18', 'SUN FEB 19', 'WED FEB 22', 'FRI FEB 24', 'SAT FEB 25', 'SUN FEB 26', 'TUE FEB 28', 'WED MAR 1', 'FRI MAR 3', 'SAT MAR 4', 'SUN MAR 5', 'TUE MAR 7', 'WED MAR 8', 'THU MAR 9', 'FRI MAR 10', 'SUN MAR 12', 'TUE MAR 14', 'FRI MAR 17', 'SAT MAR 18', 'SUN MAR 19', 'TUE MAR 21', 'THU MAR 23', 'FRI MAR 24', 'SAT MAR 25', 'TUE MAR 28', 'FRI MAR 31', 'SAT APR 1', 'SUN APR 2', 'TUE APR 4', 'FRI APR 7', 'SAT APR 8', 'SUN APR 9', 'WED APR 12', 'FRI APR 14', 'SAT APR 15', 'SUN APR 16', 'TUE APR 18', 'FRI APR 21', 'FRI APR 21', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
40
120
80

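The elements are not all being extracted because they have not all been loaded. If you look closely, the rows at the bottom of the table only load once you scroll down to the end of the page.

You can try adding the following code after opening the page so that the full table loads: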

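import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get('http://www.ukathletics.com/schedule-list/#!/m-basebl/2016')
time.sleep(5)  # wait for the initial page load
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.END)
time.sleep(5)  # give the lazily loaded rows time to appear
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.END)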
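  • Added a wait for the page to load
  • Scrolled down twice to make sure the actual bottom of the table is loaded in case the table is long
I tested this and got the following output: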

66    #print(len(links))
198   #print(len(team2))
132   #print(len(dates))
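
As a variation on the fixed two-scroll approach above, one could keep scrolling until the number of matching elements stops growing. The following is only a minimal sketch under assumptions: the helper names scroll_until_stable and load_full_schedule are hypothetical, the pause length is a guess, and a slow network could make the count look stable before everything has loaded.

```python
import time


def scroll_until_stable(driver, count_items, pause=2.0, max_rounds=20):
    """Scroll to the bottom until count_items(driver) stops changing.

    driver      -- any object with an execute_script(script) method
    count_items -- callable returning how many items are currently loaded
    Returns the last observed item count.
    """
    previous = count_items(driver)
    for _ in range(max_rounds):
        # Jump to the bottom of the page to trigger lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait for new rows to render
        current = count_items(driver)
        if current == previous:
            break  # nothing new appeared, assume the table is complete
        previous = current
    return previous


def load_full_schedule():
    """Hypothetical usage against the page from the question (not called here)."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get('http://www.ukathletics.com/schedule-list/#!/m-basebl/2016')
    total = scroll_until_stable(
        driver, lambda d: len(d.find_elements(By.CLASS_NAME, 'event_link'))
    )
    return driver, total
```

The counting function is passed in as a callable so the same loop can watch event_link, school_name, or date elements, and so the loop can be exercised with a stand-in driver without opening a browser.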

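Apart from the empty strings appended at the end (in differing numbers), all 3 lists contain exactly 40 values: 40 links, 40 games, and 40 dates. I think I understand the trend you're describing, and that was my first thought too, but when I inspect the elements on the page (for example, the third LSU game), the link, date, and team name are all there, yet the code doesn't pick them up...

You're right, the elements on the site itself appear to be in order. Could you try checking the shape of the elements at each step?

I just restructured my code to pass the page_source object to BeautifulSoup and tried parsing from there. The same problem still occurs. This is the link element shape: This is the date element shape:

Sat, Feb 18

I was referring more to whether something happens inside the for i in elem loop, or whether the right number of elements isn't found in the first place.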
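
One way to separate those two possibilities is to compare how many elements were found with how many of them actually carry text. The sketch below is an illustration only; summarize_texts is a hypothetical helper, and the sample list is made up rather than taken from the page.

```python
def summarize_texts(texts):
    """Return (total found, non-empty count, index of first blank or None)."""
    total = len(texts)
    non_empty = sum(1 for t in texts if t.strip())
    # Position where the data "runs out", if it does at all.
    first_blank = next((i for i, t in enumerate(texts) if not t.strip()), None)
    return total, non_empty, first_blank


# Made-up sample: four elements were found, but only the first two have text.
sample = ['LSU', 'LSU', '', '']
print(summarize_texts(sample))
```

If the total already matches the expected count while the non-empty count does not, the elements exist in the DOM but had no text when they were read, which points at loading rather than at the loop.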