Google search using bs4 and Python


python web-scraping


I want to get the address of "" through a Google search from a Python script. Why doesn't my code work?

from bs4 import BeautifulSoup
# from googlesearch import search
import urllib.request
import datetime
article='spotlight 29 casino address'
url1 ='https://www.google.co.in/#q='+article
content1 = urllib.request.urlopen(url1)
soup1 = BeautifulSoup(content1,'lxml')
#print(soup1.prettify())
div1 = soup1.find('div', {'class':'Z0LcW'}) #get the div where it's located
# print (datetime.datetime.now(), 'street address:  ' , div1.text)
print (div1)

If you want to get Google search results, here is a simpler way.

Simple code for it is below:

from selenium import webdriver
import urllib.parse
from bs4 import BeautifulSoup

chromedriver = '/xxx/chromedriver' # replace /xxx with the path where chromedriver is installed
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(chromedriver, chrome_options=chrome_options)

article='spotlight 29 casino address'
driver.get("https://www.google.co.in/#q="+urllib.parse.quote(article))
# driver.page_source  <-- HTML source, you can parse it later.
soup = BeautifulSoup(driver.page_source, 'lxml')
div = soup.find('div',{'class':'Z0LcW'})
print(div.text)
driver.quit()
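
The snippet above passes the query through urllib.parse.quote before appending it to the URL. Here is a minimal sketch of what that step does, using only the standard library. It also shows that everything after '#' is a URL fragment, which a client keeps to itself and does not send to the server. The /search?q= form and the helper name build_search_url are assumptions added for illustration, not part of the answer.

```python
from urllib.parse import quote, urlencode, urlsplit

article = 'spotlight 29 casino address'

# quote() percent-encodes characters that are not safe in a URL (spaces become %20).
fragment_url = 'https://www.google.co.in/#q=' + quote(article)

# Everything after '#' is the fragment. It stays in the client, so a plain
# HTTP request for this URL only asks the server for the path '/'.
parts = urlsplit(fragment_url)


def build_search_url(query, base='https://www.google.co.in/search'):
    """Hypothetical helper: put the query in the query string instead of the fragment."""
    return base + '?' + urlencode({'q': query})


query_url = build_search_url(article)

print(fragment_url)
print(parts.path, parts.fragment)
print(query_url)
```

A browser driven by Selenium runs the page's JavaScript, which can read the fragment; a plain HTTP client has nothing on the page side to do that.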

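Google renders the page with JavaScript, which is why you do not receive the div when using urllib.request.urlopen.
As a solution, you can use the Selenium Python library to emulate a browser. Install it with the "pip install selenium" console command, and code like this will work: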
from bs4 import BeautifulSoup
from selenium import webdriver


article = 'spotlight 29 casino address'
url = 'https://www.google.co.in/#q=' + article
driver = webdriver.Firefox()
driver.get(url)
html = BeautifulSoup(driver.page_source, "lxml")

div = html.find('div', {'class': 'Z0LcW'})
print(div.text)
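
The snippets in this thread call .text on the result of find, which returns None when no matching element is in the page, so .text then raises AttributeError. Below is a small sketch of a guard, run on hand-written sample HTML rather than a real Google response. The helper name extract_address is hypothetical, and the class name Z0LcW is simply the one used in the question; it may not match what Google serves today.

```python
from bs4 import BeautifulSoup

# Hand-written sample markup for illustration only, not a real Google page.
SAMPLE_WITH_ANSWER = (
    '<html><body><div class="Z0LcW">'
    '46-200 Harrison Pl, Coachella, CA 92236'
    '</div></body></html>'
)
SAMPLE_WITHOUT_ANSWER = '<html><body><div class="other">no answer box here</div></body></html>'


def extract_address(html, css_class='Z0LcW'):
    """Hypothetical helper: return the div text, or None if the div is missing."""
    soup = BeautifulSoup(html, 'html.parser')  # built-in parser, so lxml is not required
    div = soup.find('div', {'class': css_class})
    if div is None:
        # find() found nothing; report that instead of failing on div.text
        return None
    return div.get_text(strip=True)


print(extract_address(SAMPLE_WITH_ANSWER))
print(extract_address(SAMPLE_WITHOUT_ANSWER))
```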

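You are getting an empty div because, if you use the requests library (or something similar), the default user agent is python-requests, and Google blocks the request. Setting a user agent lets you fake a real browser visit.

As long as the address is in the HTML code (which it is in this case), you can do this without Selenium by adding a user agent: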
headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
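
To show where such a headers dict goes, here is a sketch that attaches it to a request object without sending anything. It uses the standard library's urllib.request.Request rather than requests. The /search?q= URL form and the name build_request are assumptions for illustration. With the requests library, the same dict would be passed as requests.get(url, headers=headers).

```python
from urllib.parse import urlencode
from urllib.request import Request

headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}


def build_request(query, base="https://www.google.com/search"):
    """Hypothetical helper: build (but do not send) a GET request with a browser-like user agent."""
    url = base + "?" + urlencode({"q": query})
    return Request(url, headers=headers)


req = build_request("spotlight 29 casino address")

# urllib stores header names capitalized, so the key becomes "User-agent".
print(req.full_url)
print(req.get_header("User-agent"))

# Sending it would look like this (needs network access, so it is left commented out):
# from urllib.request import urlopen
# html = urlopen(req).read()
```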

Output reported by the answer's code:
46-200 Harrison Pl, Coachella, CA 92236

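This one shows an error in my case. I'm not getting this. What can I do? driver = webdriver.Chrome(chromedriver)

# chromedriver in your installed path: download it to your computer, then set chromedriver = 'the file path on your computer'.

Oh!! Sorry, my mistake, I hadn't noticed that before. After giving the file path, it worked. Thanks.

@chifu lin Is it possible without opening the browser? Just print the output in the terminal.

Use headless mode. I modified my code, you can refer to it.

@djangmaster I have installed selenium, but it still shows an error at line 76, driver = webdriver.Firefox():
Traceback (most recent call last):
  File "C:\Users\RUMAN\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\common\service.py", in start
    stdin=PIPE)
  File "C:\Users\RUMAN\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "C:\Users\RUMAN\AppData\Local\Programs\Python\Python36-32\lib\subprocess.py", line 997, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH.

Does geckodriver need to be installed for Firefox?

Ah, OK, it looks like Firefox is not installed on your system. You can use any other browser, for example webdriver.Ie() instead of Firefox. You can check this page for more information: The important thing is that the web browser you use must be executable on your operating system.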
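
The WebDriverException in the comments says the geckodriver executable was not found on PATH. A quick way to check this before starting Selenium is sketched below with the standard library's shutil.which. The helper name find_driver is hypothetical, and this only checks PATH; it does not verify that the driver version matches the installed browser.

```python
import shutil


def find_driver(name="geckodriver"):
    """Hypothetical helper: return the full path of the driver executable if it is on PATH, else None."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print("geckodriver is not on PATH; download it and add its folder to PATH")
else:
    print("geckodriver found at", path)
```

The same check works for chromedriver by passing find_driver("chromedriver").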