Python script to get 150 Google search results


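I need to get the first 15 pages of Google search results using Python. I tried the answer to this question, but I did not get the expected results. I need 150 search results with their original links. If anyone knows how to do this, please give me a solution. Thanks in advance.

I got 150 search results this way: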

import sys # Used to add the BeautifulSoup folder to the import path
import urllib2 # Used to read the html document

if __name__ == "__main__":
    ### Import Beautiful Soup
    ### Here, I have the BeautifulSoup folder in the level of this Python script
    ### So I need to tell Python where to look.
    sys.path.append("./BeautifulSoup")
    from BeautifulSoup import BeautifulSoup

    ### Create opener with Google-friendly user agent
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]

    ### Open page & generate soup
    ### the "start" variable will be used to iterate through 15 pages.
    for start in range(0,15):
        url = "http://www.google.com/search?q=site:stackoverflow.com&start=" + str(start*10)
        page = opener.open(url)
        soup = BeautifulSoup(page)

        ### Parse and find
        ### Looks like google contains URLs in <cite> tags.
        ### So for each cite tag on each page (10), print its contents (url)
        for cite in soup.findAll('cite'):
            print cite.text
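
The script above is written for Python 2 (`urllib2`, the `print` statement). As a minimal sketch of the same paging idea in Python 3, the helper below only builds the 15 page URLs from the `start` offset and does not fetch anything. The function name `build_search_urls` and the constant `RESULTS_PER_PAGE` are illustrative names, not part of the original answer, and the page size of 10 is an assumption taken from the script's `start*10`.

```python
from urllib.parse import urlencode

# Assumption: 10 results per page, matching the start*10 step in the script above.
RESULTS_PER_PAGE = 10


def build_search_urls(query, pages=15, base="http://www.google.com/search"):
    """Return one search URL per result page, using the `start` offset."""
    urls = []
    for page in range(pages):
        # urlencode percent-escapes the query, so "site:..." becomes "site%3A..."
        params = {"q": query, "start": page * RESULTS_PER_PAGE}
        urls.append(base + "?" + urlencode(params))
    return urls


if __name__ == "__main__":
    for url in build_search_urls("site:stackoverflow.com"):
        print(url)
```

With 15 pages the offsets run from 0 to 140, which is where the 150 results come from.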

You just need to install BeautifulSoup before running the script:

pip install beautifulsoup

The code comes from the link you referenced.

Alternatively, you can use the SERP API repo.

The instructions are fairly simple:

pip install google-search-results

Usage:

from lib.google_search_results import GoogleSearchResults
query = GoogleSearchResults({"q": "coffee"})
html_results = query.get_html()

More advanced usage is covered on the SERP API GitHub page.

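This kind of question is really great. Ganeshgm7, can you add some examples of what you tried to your question? That would be great.

Yes, it worked. Thank you very much. But when I check the results manually, they are slightly different, and I don't know why.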
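
One possible contributor to the difference: the script prints the text of `<cite>` tags, which is the display form of the URL and may be shortened, rather than the link target the question asks for. The sketch below reads the `href` of each anchor instead, using only the standard library. It assumes result links are wrapped as `/url?q=<target>&...`, which is an assumption about Google's markup that may not hold for the pages you receive. `ResultLinkParser`, `extract_result_links`, and `SAMPLE_HTML` are hypothetical names and made-up data for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class ResultLinkParser(HTMLParser):
    """Collect target URLs from anchors of the form /url?q=<target>&..."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parsed = urlparse(href)
        # Assumption: result anchors point at the /url redirect path.
        if parsed.path == "/url":
            target = parse_qs(parsed.query).get("q")
            if target:
                self.links.append(target[0])


def extract_result_links(html):
    """Return the link targets found in one page of result HTML."""
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.links


# Made-up sample markup, not real Google output.
SAMPLE_HTML = (
    '<a href="/url?q=http://stackoverflow.com/questions/1&amp;sa=U">one</a>'
    '<cite>stackoverflow.com/questions/...</cite>'
    '<a href="/search?q=next">next page</a>'
)

if __name__ == "__main__":
    for link in extract_result_links(SAMPLE_HTML):
        print(link)
```

Results can also differ from a manual search for reasons outside the script, such as location or personalization, so an exact match should not be expected.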