How to get the search page content of Krugle and Open Hub

javascript python html

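I want to build a tool that analyzes the results of code search engines such as Krugle or Open Hub. I tried both Java and Python to fetch the HTML page of the search results: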

import urllib2
def write_url(url, file_name, if_show):    
    if (url is None) or (file_name is None):
        return

    req = urllib2.Request(url)
    resp = urllib2.urlopen(req)

    ret = resp.read()

    fp = open(file_name, "w")
    fp.write(ret)
    fp.close()
    if if_show:
        print ret


if __name__ == "__main__":
    url_ = "http://www.krugle.org/document/search/#query=socket"
    file_n = "D:/tmp/test.txt"
    write_url(url_, file_n, True)
    print "Done"

But I did not get the content of the results. The relevant part of the page I got looks like this:

            <div class="content_result_body">
                <div id="hit_list"></div>
                <div class="paging" style="display: none;"></div>
            </div>

When I inspect the search results page in Chrome, the same part looks like this:

            <div class="content_result_body">
                <div id="hit_list">
                    <div class="hit">...</div>
                    <div class="hit">...</div>
                    <div class="hit">...</div>
                </div>
                <div class="paging">...</div>
            </div>

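Here, each ... stands for the content of one Krugle search result. I am not sure why the page returned by the Python code has nothing inside the hit_list div. Perhaps the result content is generated by JavaScript, but I don't know how to obtain it from code.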
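One detail of the URL is consistent with the JavaScript guess: everything after the # is a fragment, and HTTP clients do not send fragments to the server. A minimal sketch (Python 3, standard library only; the helper name server_visible_url is hypothetical) that splits the URL from the question into the part a server receives and the part only the browser sees:

```python
from urllib.parse import urlsplit, urlunsplit


def server_visible_url(url):
    """Return (url_without_fragment, fragment) for the given URL."""
    parts = urlsplit(url)
    # Rebuild the URL with an empty fragment: this is what an HTTP request targets.
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
    return base, parts.fragment


base, fragment = server_visible_url(
    "http://www.krugle.org/document/search/#query=socket"
)
print(base)      # http://www.krugle.org/document/search/
print(fragment)  # query=socket
```

So the request most likely fetches only the empty search page, and the query in the fragment is read by script running in the browser, which would explain the empty hit_list.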

To handle pages that load their content dynamically, you can try Selenium:

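from selenium import webdriver

url = "http://your-url.com"  # placeholder; the URL must include the scheme
br = webdriver.Firefox()     # starts a real Firefox instance
br.get(url)

html = br.page_source        # HTML of the page as the browser currently has it
br.quit()                    # close the browser when done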
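Reading page_source right after get() can still return the empty hit_list if the page's script has not finished yet. A minimal sketch (Python 3) that polls until at least one result element exists before returning the HTML. Assumptions: the selector "#hit_list .hit" is taken from the markup in the question, and the function name wait_for_rendered_html is hypothetical. Selenium's own WebDriverWait with expected_conditions provides the same behaviour and would be the usual choice; the hand-written loop is used here only to keep the example free of extra imports.

```python
import time


def wait_for_rendered_html(driver, css_selector="#hit_list .hit",
                           timeout=10.0, poll=0.5):
    """Return driver.page_source once css_selector matches an element.

    `driver` is a Selenium WebDriver that has already loaded the page.
    Raises TimeoutError if nothing matches within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "css selector" is the locator strategy string behind By.CSS_SELECTOR.
        if driver.find_elements("css selector", css_selector):
            return driver.page_source
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "no element matched %r within %s s" % (css_selector, timeout)
            )
        time.sleep(poll)
```

With the driver from the snippet above, this would be called as html = wait_for_rendered_html(br) after br.get(url) and before the browser is closed.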

Of course, driving Firefox through Selenium also opens a web browser window. If that is inconvenient, I can tell you how to use xvfb or phantomjs.
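Once the rendered HTML is available, a quick check tells whether the results actually made it into the page. A minimal sketch (Python 3, standard library only) that counts the div elements with class "hit" inside the div with id "hit_list". The names HitCounter and count_hits are hypothetical, and the structure it looks for is only what the question's snippets show; the real Krugle markup may differ.

```python
from html.parser import HTMLParser


class HitCounter(HTMLParser):
    """Count <div class="hit"> elements nested inside <div id="hit_list">."""

    def __init__(self):
        super().__init__()
        self.depth_in_list = 0  # div nesting depth inside #hit_list (0 = outside)
        self.hits = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attrs = dict(attrs)
        if self.depth_in_list:
            # Already inside #hit_list: track nesting and count hit divs.
            self.depth_in_list += 1
            if "hit" in (attrs.get("class") or "").split():
                self.hits += 1
        elif attrs.get("id") == "hit_list":
            self.depth_in_list = 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth_in_list:
            self.depth_in_list -= 1


def count_hits(html):
    """Return the number of result entries found in the given HTML string."""
    parser = HitCounter()
    parser.feed(html)
    return parser.hits
```

For the two snippets in the question, count_hits returns 0 for the page fetched with urllib2 and 3 for the page as seen in Chrome, so a result of 0 is a sign that the HTML was captured before the results were rendered.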