Python web crawler for a website whose code changes dynamically

python html python-3.x web-crawler


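I am currently trying to get more involved in programming and Python. For a small project I want to build a web crawler for a website, so I read up on Scrapy and BeautifulSoup. So far so good.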

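Website structure: it is a simple website with drop-down menus to choose from. If I select one of the entries, the URL of the website does not change; only the underlying HTML changes. After selecting a value you get a result table with several columns and rows in the following format: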

<div id="result">
    <table class="table">
        <thead>
            <tr>
                <th>A</th>
                <th>B</th>
                <th>C</th>
                <th>D</th>
                <th>E</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td><b>...</b></td>
                <td>...</td>
                <td>...</td>
                <td>...</td>
                <td>...</td>
            </tr>
            ...
            more follows here...
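Once the HTML of the result block is available, the table can be turned into Python data. The following is a minimal sketch, assuming the table looks like the excerpt above; `SAMPLE_HTML`, its cell values, and the helper name `parse_result_table` are made up for illustration and are not taken from the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mirrors the structure shown above.
# The cell values are invented placeholders.
SAMPLE_HTML = """
<div id="result">
  <table class="table">
    <thead>
      <tr><th>A</th><th>B</th><th>C</th><th>D</th><th>E</th></tr>
    </thead>
    <tbody>
      <tr><td><b>a1</b></td><td>b1</td><td>c1</td><td>d1</td><td>e1</td></tr>
      <tr><td><b>a2</b></td><td>b2</td><td>c2</td><td>d2</td><td>e2</td></tr>
    </tbody>
  </table>
</div>
"""


def parse_result_table(html):
    """Return the rows of the #result table as a list of dicts keyed by header."""
    soup = BeautifulSoup(html, "html.parser")
    result = soup.find("div", id="result")
    if result is None:
        return []  # no result block on this page
    headers = [th.get_text(strip=True) for th in result.find_all("th")]
    rows = []
    for tr in result.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has no <td> cells and is skipped
            rows.append(dict(zip(headers, cells)))
    return rows


rows = parse_result_table(SAMPLE_HTML)
print(rows[0])
```

Keying each row by its header keeps the code independent of the column order, which may help if the site rearranges the table.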

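Every time you click the button, you POST data to the server; you can see the posted form data in the Chrome developer tools (F12).

You can use requests to simulate this POST: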
In [27]: data = {'findpass':'1',
    ...: 'router':'Belkin',
    ...: 'findpassword':'Find Password'}

In [28]: r = requests.post('http://routerpasswords.com/', data=data)
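To check that the simulated request matches what the browser sends, it can help to build the request without sending it and look at the encoded body. This is a small sketch using `requests.Request(...).prepare()`; the field values are the ones from the session above, and nothing is sent over the network.

```python
import requests

# Form fields as shown in the IPython session above.
data = {"findpass": "1", "router": "Belkin", "findpassword": "Find Password"}

# Build the request without sending it, so the encoded body can be inspected
# and compared with the form data shown in the browser's developer tools.
prepared = requests.Request("POST", "http://routerpasswords.com/", data=data).prepare()

print(prepared.method)                   # POST
print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                     # findpass=1&router=Belkin&findpassword=Find+Password
```

Note that the space in `Find Password` is encoded as `+` by requests itself, so the dict should contain the plain value.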


Approach:
First I collect all router names in a list.
Then, for each router, I make a new request with the correct POST parameters (def: get_passwords_via_name).
You need something like this.
Thanks, that comment helped me a lot. I didn't know that! Thanks :)
@ricksanchez Welcome to SO. If you find my answer helpful, you can accept it.
from bs4 import BeautifulSoup
import requests

BASE_URL = "http://routerpasswords.com/"


def get_router_types(url):
    """Return the value of every <option> in the drop-down menu."""
    r = requests.get(url)
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(r.content, "html.parser")
    # Skip options that have no value attribute, if any.
    return [option.get("value") for option in soup.find_all("option") if option.get("value")]


def get_passwords_via_name(router_name):
    """POST the form for one router and return the <div id="result"> element."""
    # requests URL-encodes the values itself, so use a plain space here, not "+".
    data = {"findpass": "1", "router": router_name, "findpassword": "Find Password"}
    c = requests.post(BASE_URL, data=data)
    print(c.status_code)
    soup = BeautifulSoup(c.content, "html.parser")
    return soup.find("div", {"id": "result"})


def main():
    for router_name in get_router_types(BASE_URL):
        print(get_passwords_via_name(router_name))


if __name__ == "__main__":
    main()

 curl 'http://routerpasswords.com/'   --data 'findpass=1&router=ZyXEL&findpassword=Find+Password'
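The curl body above is already URL-encoded, which is why the space in the button label appears as `+`. When the same fields are passed as a dict to a library that does the encoding, the values should be the decoded ones. A short standard-library sketch of that round trip, offered as an illustration rather than as part of the original answer:

```python
from urllib.parse import parse_qsl, urlencode

# Body copied from the curl command above.
curl_body = "findpass=1&router=ZyXEL&findpassword=Find+Password"

# parse_qsl decodes "+" to a space, giving the values to put in a data dict.
data = dict(parse_qsl(curl_body))
print(data)
# {'findpass': '1', 'router': 'ZyXEL', 'findpassword': 'Find Password'}

# Encoding the dict again reproduces the original body.
assert urlencode(data) == curl_body

# A literal "+" in a value is percent-encoded, so it is a different request.
wrong_body = urlencode({"findpassword": "Find+Password"})
print(wrong_body)
# findpassword=Find%2BPassword
```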