Python: how can I get a web page's URL when I don't know it in advance?




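I'm trying to do an iterative web search that opens a Google search page only when needed, so I don't know the URL in advance. I know about Selenium's .current_url attribute, but it doesn't give me what I want: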

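from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# ... (the matching `if` branch is not shown in the question)
else:
    if boolean == 'yes':
        self.append_csv('TP')
    elif boolean == 'no':
        driver.get('https://www.google.com/')
        # find_element_by_name was deprecated and later removed in Selenium 4
        search = driver.find_element(By.NAME, 'q')
        search.clear()
        search.send_keys(query[index])
        search.send_keys(Keys.RETURN)
        print(driver.current_url)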

When I print(driver.current_url), I only get https://www.google.com/, but I want to extract the full URL of the search results page.

I need this full link so that I can use it with BeautifulSoup4. The end goal is to extract all the links from a Google search.
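For the end goal of collecting every link, note that Selenium already exposes the rendered HTML as driver.page_source, so the URL is not strictly required for that step. The sketch below uses the standard library's html.parser in place of BeautifulSoup4 (with bs4 the rough equivalent is soup.find_all("a", href=True)); the names LinkCollector and extract_links and the default base URL are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/search?q=..." into absolute URLs
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url="https://www.google.com/"):
    """Return all <a href> targets found in an HTML string (hypothetical helper)."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<a href="https://example.com/">x</a><a href="/search?q=y">y</a>'
    print(extract_links(sample))
```

With Selenium you would call it as extract_links(driver.page_source, driver.current_url) once the results page has loaded.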

Actually, there is no need to go to the Google home page for a regular search. You can go straight to the search results page, like this:

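from urllib.parse import quote_plus
def search(driver, text):
    # URL-encode the query so spaces and symbols such as "&" are preserved
    driver.get("https://www.google.com/search?q=" + quote_plus(text))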
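If you build the URL yourself and need more than one query parameter, urllib.parse.urlencode escapes every value for you. A minimal sketch, assuming the helper name build_search_url (hypothetical) and using hl (interface language) only as an example of an extra parameter:

```python
from urllib.parse import urlencode

BASE = "https://www.google.com/search"


def build_search_url(query, **extra_params):
    """Build a Google search URL; extra_params are appended as given (hypothetical helper)."""
    params = {"q": query}
    params.update(extra_params)
    # urlencode escapes each value: spaces become "+", "&" becomes "%26", "+" becomes "%2B"
    return BASE + "?" + urlencode(params)


print(build_search_url("A computer science portal"))
print(build_search_url("c++ & python", hl="en"))
```

The result can be passed straight to driver.get(...), which also means driver.current_url is known before the page even loads.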

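However, if you want to add other parameters to the search, I suggest looking at the googlesearch module. It gives you the links of the top search results directly, like this: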

>>> import googlesearch
>>> query = "A computer science portal"
>>> for j in googlesearch.search(query, tld="co.in", num=10, stop=10, pause=2):
...     print(j)
...
https://www.geeksforgeeks.org/page/4/
https://www.geeksforgeeks.org/
https://en.wikipedia.org/wiki/Portal:Computer_programming
https://en.wikiversity.org/wiki/Portal:Computer_Science
https://www.csestack.org/
http://www.pearltrees.com/u/17097488-geeksforgeeks-computer-science
https://studentportal.gu.se/english/my-studies/cse
https://www.computerscienceonline.org/
https://portal.cs.nuim.ie/
https://www.quora.com/What-are-the-top-websites-computer-science-students-must-visit

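If you don't want to use it directly, you can look at the module's code instead. Since it is not on GitHub, read it in the directory where pip installed it. The code is not complicated, and the interesting part, how the Google search URL is generated, is no more than 100 lines.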
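To locate that installed source without guessing site-packages paths, importlib.util.find_spec can report the file a module would be loaded from. A small sketch; module_source_path is a hypothetical helper name:

```python
import importlib.util


def module_source_path(name):
    """Return the file a top-level module would be loaded from, or None if it isn't installed."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Prints the path of googlesearch's source, or None if the module is not installed
print(module_source_path("googlesearch"))
```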

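Also, the reason the code you provided prints https://www.google.com/ is that you don't give the results page time to load. Try adding the following lines before the print line: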

import time
time.sleep(2)
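A fixed time.sleep(2) may be too short on a slow connection and wastes time on a fast one. Selenium ships an explicit wait for this case, WebDriverWait(driver, 10).until(EC.url_contains("/search")) with WebDriverWait from selenium.webdriver.support.ui and EC being selenium.webdriver.support.expected_conditions. The sketch below is a hand-rolled, standard-library version of the same polling idea; wait_for_url is a hypothetical helper, and it assumes the results page URL contains "/search".

```python
import time


def wait_for_url(driver, predicate, timeout=10.0, poll=0.25):
    """Poll driver.current_url until predicate(url) is true and return that URL.

    Raises TimeoutError if the condition is not met within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        url = driver.current_url
        if predicate(url):
            return url
        if time.monotonic() >= deadline:
            raise TimeoutError(f"URL still {url!r} after {timeout} s")
        time.sleep(poll)


# Usage, right after search.send_keys(Keys.RETURN):
# print(wait_for_url(driver, lambda u: "/search" in u))
```

Waiting on a condition rather than a fixed delay returns as soon as the navigation happens and fails loudly if it never does.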