Unable to extract links from a Google search using BeautifulSoup in Python
Tags: python, beautifulsoup, python-requests, web-crawler

I want to extract the links that appear on the page after a Google search.

import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.google.com/search?q=machine+learning')
soup = BeautifulSoup(response.text, 'html.parser')

soup.find_all('div', class_='r')
But it gives me an empty list:
[]

Is there a way to do this?
Thanks

If you use selenium, you should get the expected output. This works for me:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

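# Newer Selenium 4 releases no longer accept a driver path here and can locate the driver themselves via webdriver.Chrome()
driver = webdriver.Chrome("path of the chrome driver")
driver.get("https://www.google.com/search?q=machine+learning")
# Wait up to 20 seconds for the result containers to be present
elements = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div.r')))
for ele in elements:
    # find_element(By.XPATH, ...) replaces find_element_by_xpath, which newer Selenium 4 releases removed
    print(ele.find_element(By.XPATH, "./a").get_attribute('href'))
Output: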

https://www.expertsystem.com/machine-learning-definition/
https://www.geeksforgeeks.org/top-5-best-programming-languages-for-artificial-intelligence-field/
https://www.geeksforgeeks.org/difference-between-machine-learning-and-artificial-intelligence/
http://ai.stanford.edu/~zayd/why-is-machine-learning-hard.html
https://machinelearningmastery.com/start-here/
https://en.wikipedia.org/wiki/Machine_learning
https://www.sas.com/en_gb/insights/analytics/machine-learning.html
https://medium.com/machine-learning-for-humans/why-machine-learning-matters-6164faf1df12
https://www.coursera.org/learn/machine-learning
https://www.expertsystem.com/machine-learning-definition/
https://searchenterpriseai.techtarget.com/definition/machine-learning-ML
https://emerj.com/ai-glossary-terms/what-is-machine-learning/
https://www.geeksforgeeks.org/machine-learning/

Try this:

import requests
from bs4 import BeautifulSoup

search = input("Search:")
results = 100 # valid options 10, 20, 30, 40, 50, and 100
# Pass the query through params so that requests URL-encodes it
page = requests.get("https://www.google.com/search", params={"q": search, "num": results})
soup = BeautifulSoup(page.content, "html5lib")
# href=True skips anchors without an href, which would otherwise make the "in" test fail on None
links = soup.find_all("a", href=True)
for link in links:
    link_href = link["href"]
    if "url?q=" in link_href and "webcache" not in link_href:
        print(link_href.split("?q=")[1].split("&sa=U")[0])
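
The string splitting in this answer relies on each result anchor having the form `/url?q=<target>&sa=U...`. As a slightly more defensive sketch, the same unwrapping can be done by parsing the query string with the standard library. The helper name `extract_target` is hypothetical, and the `/url?q=` redirect format is taken from the answer above rather than from any documented Google behavior, so it may not hold for every response.

```python
from urllib.parse import urlparse, parse_qs


def extract_target(href):
    """Return the destination URL wrapped in a '/url?q=...' link, or None."""
    if not href:
        return None
    parsed = urlparse(href)
    # Only redirect-style links are of interest (assumed format: /url?q=<target>&sa=U...)
    if not parsed.path.endswith("/url"):
        return None
    targets = parse_qs(parsed.query).get("q")
    if not targets:
        return None
    target = targets[0]  # parse_qs also percent-decodes the value
    # Skip cached copies, as the answer above does
    if "webcache" in target:
        return None
    return target


print(extract_target("/url?q=https://en.wikipedia.org/wiki/Machine_learning&sa=U&ved=abc"))
```

In the loop above, `extract_target(link_href)` could stand in for the two `split` calls; links that are not result redirects simply come back as `None`.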

Comments:

There is no div with class 'r' in the document. Which class do you want to extract? If you just need all the divs, remove the class_ argument from the find_all call.

I want to extract the links of the results that appear. (9 sets of search results appear.)

It looks like the links are loaded asynchronously, so they are not available in the initial response. The only solution is to load the page with a selenium driver and then extract them.

Even with the selenium driver, I tried find_element_by_xpath("//*[@id='rso']/div[3]/div/div[1]/div/div/div[1]/a/div/cite/text()"), but this throws an InvalidSelectorException error.
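
To check the first comment against a given response, one option is to list which class names actually occur on the `div` tags of the HTML that came back. The following is a minimal sketch using only the standard library; `DivClassCounter` and `div_classes` are hypothetical names, and the sample markup and its class names are invented for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Count the class names that appear on <div> start tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        # A div may carry several space-separated classes, or none at all
        class_attr = dict(attrs).get("class") or ""
        self.counts.update(class_attr.split())


def div_classes(html):
    """Return a Counter mapping each div class name to its number of occurrences."""
    parser = DivClassCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Invented sample markup; real class names will differ
sample = '<div class="result box"><div class="result"><a href="https://example.com">x</a></div></div>'
print(div_classes(sample).most_common())
```

With the question's code, calling `div_classes(response.text)` and looking up `'r'` in the result would show whether that class is present in the HTML that requests received.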