Navigating with Selenium in Python
I am scraping this website with Python and Selenium, but it currently scrapes only the first 10 pages for July. The code converts the page number in the preceding sibling of the "next" button to an int and clicks "next" that many times minus 1, but once it reaches page 10 it stops. URL- Can someone help me scrape all of the pages?
def pagination( driver ):
    data = []
    last_element = driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]/preceding-sibling::a[1]')
    if last_element is None:
        number_of_pages = 1
    else:
        number_of_pages = int( last_element.text )
    # data = [ getData( driver ) ]
    data.extend(getData(driver))
    for i in range(number_of_pages - 1):
        driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]').click()
        data.extend( getData( driver ) )
        time.sleep(1)
    return data
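The number of pages appears to be 10, so find another way to work out how many pages there are. You can use a while loop that checks whether the "next page" button is available: if it is, continue; otherwise you are on the last page. Something like this:
while next_button_element.is_displayed():
    # Do the action that is currently in the for loop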
Code you can use:
while True:
    data.extend(getData(driver))
    try:
        driver.find_element_by_css_selector('a.next').click()
    except Exception:
        # no clickable "next" link left, so this was the last page
        break
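The same "click until there is no next link" loop can be sketched without tying it to a particular Selenium version. This is a minimal sketch under stated assumptions: `get_page_data` and `click_next` are hypothetical callables that do not appear in the answers above (the first returns the rows of the current page, the second clicks "next" and returns False when there is nothing to click), `FakePager` is only a stand-in for a browser session, and the `max_pages` cap is an added safeguard so that a "next" link that never disappears cannot loop forever.

```python
def scrape_all_pages(get_page_data, click_next, max_pages=1000):
    """Collect data page by page until there is no 'next' link left."""
    data = []
    for _ in range(max_pages):
        data.extend(get_page_data())
        if not click_next():
            break  # last page reached
    return data


class FakePager:
    """Stand-in for a browser session with a fixed list of pages (illustration only)."""

    def __init__(self, pages):
        self.pages = pages
        self.index = 0

    def get_page_data(self):
        return self.pages[self.index]

    def click_next(self):
        if self.index + 1 >= len(self.pages):
            return False  # no 'next' link on the last page
        self.index += 1
        return True


pager = FakePager([["a", "b"], ["c"], ["d", "e"]])
rows = scrape_all_pages(pager.get_page_data, pager.click_next)
print(rows)  # ['a', 'b', 'c', 'd', 'e']
```

With a real driver, `click_next` would wrap the `find_element...click()` call in a try/except and return False from the except branch, and `get_page_data` would call `getData(driver)`.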
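Look, I know you took the approach of computing the total number of pages from my answer to one of your earlier questions. It worked in that case because the last page number was given to us directly, but that is not the case here. Solution: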
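Although the number of pages is not directly available, the total number of entries is.
As you can see in the July screenshot above, that number is 174. Assuming the pagination length (the number of entries on a single page) is left at its default of 10, the number of pages should be 18 (17 pages of 10 entries each, plus one more page for the remaining 4 entries).
So the logic for computing the page count is simple. If you have somehow obtained the total number of entries in a total_entries variable, the page count is derived from it as follows (taken from:
Integer division rounds down, so 174 // 10 returns 17, and adding + 1 gives 18. (In Python 2 the / operator does this for ints; in Python 3, 174 / 10 returns 17.4, so use // instead.) The number of pages is therefore 18. Note that the + 1 overcounts when the total is an exact multiple of the page size: 170 entries fit on 17 pages, not 18, so rounding up is the safer rule.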
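The arithmetic above can be checked directly. A minimal sketch, assuming a hypothetical helper named `page_count` (not part of the original answer) that rounds up instead of adding 1 unconditionally; treating an empty listing as one page is also an assumption:

```python
def page_count(total_entries, page_size=10):
    """Number of pages needed to show total_entries, rounding up."""
    if total_entries <= 0:
        return 1  # assumption: an empty listing still renders as a single page
    return (total_entries + page_size - 1) // page_size


print(174 // 10)        # floor division: 17
print(174 / 10)         # true division in Python 3: 17.4
print(page_count(174))  # 18
print(page_count(170))  # 17, whereas 170 // 10 + 1 would give 18
```

Rounding up matters only when the total is an exact multiple of the page size; for the 174 entries in the example both rules give 18.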
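Now, to extract the total number of entries, use the locator below to find the span element that contains it:
driver.find_element_by_xpath("//span[@class='showing']")
But this element contains text like "Showing 1-10 of 174". You only need the 174 part of the whole string. To get it, first extract the substring after "of", then convert it to an int.
Algorithm for extracting the total number of entries from the text as an int: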
import re

showing_text = driver.find_element_by_xpath("//span[@class='showing']").text  # "Showing 1-10 of 174"
number_of_entries_text = showing_text.split("of", 1)[1]  # " 174" as text
number_of_entries = int(re.findall(r'\d+', number_of_entries_text)[0])  # 174 as int
# Round up: gives 18 for 174, and 17 (not 18) when the total is exactly 170
number_of_pages = (number_of_entries + 9) // 10
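The string handling can be tried without a browser. A minimal sketch, assuming the element text always has the form "Showing 1-10 of 174" quoted in the comment above; `parse_showing` and `SHOWING_RE` are hypothetical names, and other formats (for example totals with thousands separators) are not handled. Reading the page size from the same text avoids hard-coding 10.

```python
import re

# Matches "<first>-<last> of <total>", e.g. "Showing 1-10 of 174"
SHOWING_RE = re.compile(r"(\d+)\s*-\s*(\d+)\s+of\s+(\d+)", re.IGNORECASE)


def parse_showing(text):
    """Return (first, last, total) from text like 'Showing 1-10 of 174'."""
    match = SHOWING_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised pagination text: {text!r}")
    first, last, total = (int(group) for group in match.groups())
    return first, last, total


first, last, total = parse_showing("Showing 1-10 of 174")
page_size = last - first + 1
print(total, page_size)  # 174 10
```

Raising on unexpected text makes a changed page layout fail loudly instead of silently falling back to a single page.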
Final code:
import re
import time

def pagination( driver ):
    data = []
    # find_elements_* returns an empty list when nothing matches;
    # find_element_* raises NoSuchElementException instead of returning None
    showing = driver.find_elements_by_xpath("//span[@class='showing']")
    if not showing:
        number_of_pages = 1
    else:
        showing_text = showing[0].text  # "Showing 1-10 of 174"
        number_of_entries_text = showing_text.split("of", 1)[1]
        number_of_entries = int( re.findall(r'\d+', number_of_entries_text)[0] )
        # Round up so an exact multiple of 10 does not add an extra page
        number_of_pages = (number_of_entries + 9) // 10
    data.extend( getData( driver ) )  # first page, as in the question's code
    for i in range(number_of_pages - 1):
        driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]').click()
        time.sleep(1)
        data.extend( getData( driver ) )
    return data
Note: I think my solution is better because you do not have to repeatedly check whether an element is available or catch any exception. You simply get the page count directly and click the "next" button that many times.
Can you print the number of pages before the for loop? I suspect that because you convert the text of the last element to an int, it only sees 10 pages (even though there are more).
I just tested your site, and it only turns 10 into an int; the other pages do not follow from the link you gave [URL-. I only see 10 pages.
Did you check July? If you press page 10, more pages should appear.
Do you mean:
next_button_element = driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]')
while next_button_element.is_displayed():
    driver.find_element_by_xpath('//a[ contains( concat( " ", normalize-space( @class ), " "), " next ") ]').click()
    data.extend(getData(driver))
    time.sleep(1)