Python BeautifulSoup findAll: unable to get div data

python python-3.x

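I am trying to get HTML data from a website, but data_table comes back empty. When I trace through the code and try to fetch the header data, it returns the HTML content.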

    import requests
    from bs4 import BeautifulSoup
    import html.parser
    from html.parser import HTMLParser
    import time
    from random import randint
    import sys
    from IPython.display import clear_output
    import pymysql

    links = ['https://www.ptt.cc/bbs/Gossiping/index'+str(i+1)+'.html' for i in range(10)]
    data_links=[]

    for link in links:
        res = requests.get(link)
        soup = BeautifulSoup(res.text.encode("utf-8"),"html.parser")
        data_table = soup.findAll("div",{"id":"r-ent"})
        print(data_table)

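When you visit the page in a browser, you have to confirm that you are over 18 before you get to the actual content, so the confirmation page is what your request receives. You need to POST to https://www.ptt.cc/ask/over18 with the data yes=yes and from="/bbs/Gossiping/index{the_number}.html". If you print the returned source, you can see the form: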

<form action="/ask/over18" method="post">
    <input type="hidden" name="from" value="/bbs/Gossiping/index1.html">
    <div class="over18-button-container">
        <button class="btn-big" type="submit" name="yes" value="yes">我同意，我已年滿十八歲<br><small>進入</small></button>
    </div>
    <div class="over18-button-container">
        <button class="btn-big" type="submit" name="no" value="no">未滿十八歲或不同意本條款<br><small>離開</small></button>
    </div>
</form>
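
As a rough sketch of how the POST data could be derived from that form instead of being typed by hand, the snippet below parses the form shown above and collects its hidden input plus the "yes" button. The helper name build_over18_payload and the constant OVER18_HTML are made up for this illustration; only the form markup comes from the page.

```python
from bs4 import BeautifulSoup

# The age-confirmation form, copied from the returned source shown above.
OVER18_HTML = """
<form action="/ask/over18" method="post">
    <input type="hidden" name="from" value="/bbs/Gossiping/index1.html">
    <div class="over18-button-container">
        <button class="btn-big" type="submit" name="yes" value="yes">我同意，我已年滿十八歲<br><small>進入</small></button>
    </div>
    <div class="over18-button-container">
        <button class="btn-big" type="submit" name="no" value="no">未滿十八歲或不同意本條款<br><small>離開</small></button>
    </div>
</form>
"""


def build_over18_payload(html):
    """Return (action, payload) for the age-confirmation form, or None if it is absent.

    Hypothetical helper for illustration; it is not part of requests or bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form", action="/ask/over18")
    if form is None:
        return None
    # Copy every named hidden input (here only "from") into the payload.
    payload = {
        field["name"]: field.get("value", "")
        for field in form.find_all("input", type="hidden")
        if field.has_attr("name")
    }
    # Clicking the "yes" button submits its name/value pair as well.
    payload["yes"] = "yes"
    return form["action"], payload


action, payload = build_over18_payload(OVER18_HTML)
print(action, payload)
```

The resulting dictionary has the same two keys, yes and from, that the answer says to send.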

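Also, r-ent is a class on these divs, not an id, so they have to be selected by class: soup.select("div.r-ent") gets all divs with the class r-ent.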
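
To see why filtering on an id returns an empty list when r-ent is really a class, here is a small offline comparison. The HTML fragment is a hypothetical stand-in, not the real page markup; only the class name r-ent is taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two divs carrying the class "r-ent" and no id at all.
SAMPLE_HTML = """
<div class="r-list-container">
  <div class="r-ent"><div class="title">first post</div></div>
  <div class="r-ent"><div class="title">second post</div></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Filtering on id="r-ent" matches nothing, because no element has that id.
by_id = soup.find_all("div", {"id": "r-ent"})

# Filtering on the class works, either with class_ or with a CSS selector.
by_class = soup.find_all("div", class_="r-ent")
by_selector = soup.select("div.r-ent")

titles = [div.get_text(strip=True) for div in by_selector]
print(len(by_id), len(by_class), len(by_selector), titles)
```

Both class-based lookups find the same two divs, while the id-based lookup comes back empty, which matches the symptom described in the question.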

Posting only once through a Session is probably fine, because the cookies are stored, so the code below should work:

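import requests
from bs4 import BeautifulSoup

# Build the URLs for index1.html through index10.html.
links = ['https://www.ptt.cc/bbs/Gossiping/index{}.html'.format(i) for i in range(1,11)]
data_links=[]
data = {"yes":"yes"}
head = {"User-Agent":"Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}
with requests.Session() as s:
    # Confirm the over-18 prompt once; the session keeps the resulting cookie.
    data["from"] = "/bbs/Gossiping/index1.html"
    s.post("https://www.ptt.cc/ask/over18", data=data, headers=head)
    for link in links:
        res = s.get(link, headers=head)
        # Keep the parsed document so it can be queried on the next line.
        soup = BeautifulSoup(res.text,"html.parser")
        # r-ent is a class, not an id, so select by class.
        data_divs= soup.select("div.r-ent")
        print(data_divs)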

Could you paste the HTML structure that you want to get the data from? Also check the value of res that you get back from the GET request.
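
One possible way to check res is to print a few facts about it before parsing. The sketch below assumes the age-confirmation page can be recognised by its form action, as in the form quoted earlier, and that listing entries appear as class="r-ent" in the raw HTML. The function names and the stand-in response objects are hypothetical; a real requests.Response exposes the same status_code, url and text attributes.

```python
from types import SimpleNamespace


def is_age_gate(html):
    """Guess whether the HTML is the over-18 confirmation page (assumption: it contains this form action)."""
    return 'action="/ask/over18"' in html


def summarize_response(res):
    """Collect a few facts worth printing about a requests.Response-like object."""
    return {
        "status": res.status_code,
        "final_url": res.url,
        "age_gate": is_age_gate(res.text),
        "has_r_ent": 'class="r-ent"' in res.text,
    }


# Stand-in objects for illustration only; in real use pass the value returned by requests.get().
gate_res = SimpleNamespace(
    status_code=200,
    url="https://www.ptt.cc/ask/over18",
    text='<form action="/ask/over18" method="post"></form>',
)
board_res = SimpleNamespace(
    status_code=200,
    url="https://www.ptt.cc/bbs/Gossiping/index1.html",
    text='<div class="r-ent"></div>',
)

gate_summary = summarize_response(gate_res)
board_summary = summarize_response(board_res)
print(gate_summary)
print(board_summary)
```

If age_gate is true and has_r_ent is false, the request most likely received the confirmation page rather than the board index.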
