Google Search from a Python App


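I am trying to run a Google search query from a Python app. Is there a Python interface that would let me do this? If not, does anyone know which Google API would let me do it? Thanks.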

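There is a simple example (peculiarly missing some quotes ;-). Most of what you will find on the web are Python interfaces to the old, discontinued SOAP API. The example I am pointing to uses the newer, supported AJAX API, which is definitely the one you want!

Edit: here is a more complete Python 2.6 example with all the needed quotes etc. ;-)...: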

Here is Alex's answer ported to Python 3:

    #!/usr/bin/python3
    import json
    import urllib.request, urllib.parse

    def showsome(searchfor):
        # Build the query string and fetch the JSON response
        query = urllib.parse.urlencode({'q': searchfor})
        url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s' % query
        search_response = urllib.request.urlopen(url)
        search_results = search_response.read().decode("utf8")
        results = json.loads(search_results)
        data = results['responseData']
        if data is None:
            # On failure responseData is null and responseDetails holds the reason
            print('Search failed: %s' % results.get('responseDetails'))
            return
        print('Total results: %s' % data['cursor']['estimatedResultCount'])
        hits = data['results']
        print('Top %d hits:' % len(hits))
        for h in hits:
            print(' ', h['url'])
        print('For more results, see %s' % data['cursor']['moreResultsUrl'])

    showsome('ermanno olmi')
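Because the endpoint can answer with an error payload instead of results, it may help to keep the parsing separate from the fetching. The sketch below applies the same field lookups as `showsome` to two hard-coded JSON strings: a made-up success payload that only mirrors the fields read above, and an abbreviated form of the error payload reported in the comments on this thread. The name `summarize` and all sample values are illustrative assumptions, not part of the original answer.

```python
import json

def summarize(payload):
    """Return (estimated_count, urls) from a response body, or raise ValueError."""
    results = json.loads(payload)
    data = results.get('responseData')
    if data is None:
        # Error responses carry the reason in responseDetails.
        raise ValueError(results.get('responseDetails', 'unknown error'))
    urls = [hit['url'] for hit in data['results']]
    return data['cursor']['estimatedResultCount'], urls

# Made-up success payload, shaped like the fields read by showsome().
SAMPLE_OK = json.dumps({
    'responseData': {
        'cursor': {'estimatedResultCount': '2',
                   'moreResultsUrl': 'http://example.com/more'},
        'results': [{'url': 'http://example.com/a'},
                    {'url': 'http://example.com/b'}],
    },
    'responseStatus': 200,
})

# Abbreviated form of the error payload reported in the comments.
SAMPLE_ERROR = json.dumps({
    'responseData': None,
    'responseDetails': 'Google Web Search API is no longer available.',
    'responseStatus': 403,
})

if __name__ == '__main__':
    print(summarize(SAMPLE_OK))
    try:
        summarize(SAMPLE_ERROR)
    except ValueError as err:
        print('Search failed:', err)
```

Raising an exception rather than printing keeps the decision about how to report a failure with the caller.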

Here is my approach:

A couple of code examples:

    # Get the first 20 hits for: "Breaking Code" WordPress blog
    from google import search
    for url in search('"Breaking Code" WordPress blog', stop=20):
        print(url)

    # Get the first 20 hits for "Mariposa botnet" in Google Spain
    from google import search
    for url in search('Mariposa botnet', tld='es', lang='es', stop=20):
        print(url)

Note that this code does not use the Google API, and it is still working as of January 2012.

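I am new to Python and was researching how to do this. None of the provided examples worked properly for me. Some get blocked by Google if you make many (or even a few) requests, and some are outdated. Parsing the Google search HTML (with a header added to the request) will work until Google changes the HTML structure again. You can use the same logic to search any other search engine by looking at its HTML (view source).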
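As an illustration of the scraping approach described here, the following is a minimal offline sketch that pulls result links out of a hard-coded HTML fragment with the standard library's `html.parser`. The `/url?q=` anchor pattern is taken from the code in this answer; the fragment itself, the class name `ResultLinkParser`, and the helper `extract_links` are made up for demonstration, and Google's real markup may well differ.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs

class ResultLinkParser(HTMLParser):
    """Collect target URLs from anchors of the form <a href="/url?q=...">."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href') or ''
        if not href.startswith('/url?'):
            return
        # parse_qs decodes percent-escapes and returns a list per key.
        target = parse_qs(urlparse(href).query).get('q', [''])[0]
        if target.startswith('http'):
            self.links.append(target)

# Hypothetical fragment for demonstration only.
SAMPLE_HTML = '''
<div id="res">
  <a href="/url?q=http://stackoverflow.com/questions/1&amp;sa=U&amp;ei=abc">one</a>
  <a href="/url?q=https://example.com/page%3Fx%3D1&amp;sa=U&amp;ei=def">two</a>
  <a href="/preferences">not a result</a>
</div>
'''

def extract_links(html):
    """Return the result URLs found in an HTML string."""
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.links

if __name__ == '__main__':
    for link in extract_links(SAMPLE_HTML):
        print(link)
```

Reading the `q` parameter instead of slicing up to a fixed `&amp;sa=U&amp;ei=` marker avoids depending on the order of the trailing parameters, though it still breaks if the anchor format itself changes.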

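(Edit 1: added a parameter to narrow the Google search to a specific site)

(Edit 2: when I added this answer I was writing a Python script to search for subtitles. I recently uploaded it to Github:)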

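What is the advantage of using Python 3 over Alex's answer?

@Phill, not sure what you mean by "advantage". If your project uses Python 2, use Alex's answer. If it uses Python 3, you can use this answer. Unfortunately, it is not practical to write this function so that the same code works in both versions of Python.

I guess my question is: why use Python 3 rather than Python 2? What are the benefits? I am new to Python, coming from a PHP background. Are things more formal?

@Phill, Python 3 is cleaner and more consistent than Python 2, but it is not fully backward compatible. The changes needed to port code are usually very small, as you can see here, but many third-party libraries and frameworks still do not support Python 3, so a lot of people are still using Python 2.

Is there a way to get more than 4 hits?

I tried this on my local Linux machine, and then Google decided I was a robot, so every search in my browser now asks for verification! I should not have tried this at work. Just a warning for anyone using it.

Add a user agent and a referer to make it look more like a real request!

Unfortunately, the technology this relies on was deprecated in November 2010. The Custom Search API is supposed to replace it, but it requires you to configure a list of URLs to search instead of searching the whole web.

As of today (2014.06.10), this is working...

This does not work for me as of March 2016 on IPython/Python 2.7.6. Google responds with: {"responseData": null, "responseDetails": "The Google Web Search API is no longer available. Please migrate to the Google Custom Search API ()", "responseStatus": 403}

As mentioned above, this is a deprecated API that no longer works. Also, Google uses https for everything, so a plain http:// URL is deprecated in favor of https. The same applies to John La Rooy's answer below.

I am curious why none of these examples worked for you, especially the part about BeautifulSoup not working because the HTML is generated by JavaScript... I just tried mine and it is working:

In my case I could not use BeautifulSoup. I tested it, and it seems Google generates the HTML response with blocks of JavaScript, so I did not find a way to get the links using BS classes. I simply used the "find" function to locate the links in the response.

Perhaps your Google URL points to the newer API that uses JavaScript rather than the older one that uses plain HTML. I believe adding "&btnG=Google+Search" to your query causes it to use the HTML API, or at least that is the only difference I can see.

@MarioVilas thanks for the tip. I will try it with that parameter. Maybe it will be faster?

Hi Mario, I tried your script and it is wonderful. I face only one issue: even when I use .COM as the TLD, I get results from .CO.IN. Could you please help?

Note that this could break at any time, for example if Google changes the way it returns results, because it does not use an official API but scrapes the Google results page.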
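For readers following the migration advice in these comments, the sketch below only builds a request URL for the Custom Search JSON API and does not send it. To the best of my knowledge the endpoint is `https://www.googleapis.com/customsearch/v1` with `key`, `cx`, `q`, `num` and `start` parameters, but check Google's current documentation before relying on it. The key and engine ID are placeholders, and `build_cse_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = 'https://www.googleapis.com/customsearch/v1'

def build_cse_url(query, api_key, engine_id, num=10, start=1):
    """Build a Custom Search JSON API request URL (nothing is sent)."""
    params = {
        'key': api_key,    # placeholder: your own API key
        'cx': engine_id,   # placeholder: your search engine ID
        'q': query,
        'num': num,        # results per request
        'start': start,    # 1-based index of the first result
    }
    return CSE_ENDPOINT + '?' + urlencode(params)

if __name__ == '__main__':
    print(build_cse_url('ermanno olmi', 'YOUR_API_KEY', 'YOUR_ENGINE_ID'))
```

Paging through results would then be a matter of calling the helper again with a larger `start`, subject to whatever quota and limits the service applies.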

    import urllib.parse
    import urllib.request

    def getgoogleurl(search, siteurl=False):
        # Build the search URL, optionally restricted to one site with "site:"
        if not siteurl:
            return 'http://www.google.com/search?q=' + urllib.parse.quote(search)
        else:
            return ('http://www.google.com/search?q=site:' + urllib.parse.quote(siteurl)
                    + '%20' + urllib.parse.quote(search))

    def getgooglelinks(search, siteurl=False):
        # Google returns 403 without a user agent
        headers = {'User-agent': 'Mozilla/11.0'}
        req = urllib.request.Request(getgoogleurl(search, siteurl), None, headers)
        with urllib.request.urlopen(req) as site:
            data = site.read().decode('utf-8', errors='replace')

        # No BeautifulSoup because Google's HTML is generated with JavaScript
        start = data.find('<div id="res">')
        end = data.find('<div id="foot">')
        if data[start:end] == '':
            # Error, no links to find; return an empty list so callers can iterate
            return []

        links = []
        marker = '<a href="/url?q='
        data = data[start:end]
        start = 0
        end = 0
        while start > -1 and end > -1:
            # Get only results of the provided site
            if not siteurl:
                start = data.find(marker)
            else:
                start = data.find(marker + str(siteurl))
            data = data[start + len(marker):]
            end = data.find('&amp;sa=U&amp;ei=')
            if start > -1 and end > -1:
                link = urllib.parse.unquote(data[0:end])
                data = data[end:]
                if link.find('http') == 0:
                    links.append(link)
        return links

    links = getgooglelinks('python', 'http://www.stackoverflow.com/')
    for link in links:
        print(link)
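One comment in this thread suggests sending a referer as well as a user agent. A minimal sketch of building such a request with `urllib.request` follows; it constructs the request object without sending it. The header values and the helper name `build_search_request` are placeholders chosen for illustration, and the `btnG` parameter is included only because a commenter reported that it made a difference, which I have not verified.

```python
import urllib.parse
import urllib.request

def build_search_request(search, user_agent='Mozilla/5.0',
                         referer='https://www.google.com/'):
    """Return an unsent Request for a Google search with browser-like headers."""
    query = urllib.parse.urlencode({'q': search, 'btnG': 'Google Search'})
    url = 'https://www.google.com/search?' + query
    headers = {
        'User-Agent': user_agent,  # placeholder value
        'Referer': referer,        # placeholder value
    }
    return urllib.request.Request(url, headers=headers)

if __name__ == '__main__':
    req = build_search_request('python')
    print(req.full_url)
    print(req.header_items())
```

Passing the result to `urllib.request.urlopen` would send it; whether Google accepts or blocks such a request is outside what this sketch can promise.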