Python BeautifulSoup findAll can't get div data


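I'm trying to get HTML data from a website, but the data table comes back empty. I tried tracing the code: when I try to get the title data, it does return the HTML content.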

    import requests
    from bs4 import BeautifulSoup
    import html.parser
    from html.parser import HTMLParser
    import time
    from random import randint
    import sys
    from IPython.display import clear_output
    import pymysql

    links = ['https://www.ptt.cc/bbs/Gossiping/index'+str(i+1)+'.html' for i in range(10)]
    data_links=[]

    for link in links:
        res = requests.get(link)
        soup = BeautifulSoup(res.text.encode("utf-8"),"html.parser")
        data_table = soup.findAll("div",{"id":"r-ent"})
        print(data_table)

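When you visit the page in a browser, you have to confirm that you are over 18 before you can reach the actual content, so that confirmation page is what you are getting back. You need to POST to https://www.ptt.cc/ask/over18 with the data yes=yes and from="/bbs/Gossiping/index{the_number}.html". You can see the form if you print the returned source: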

    <form action="/ask/over18" method="post">
        <input type="hidden" name="from" value="/bbs/Gossiping/index1.html">
        <div class="over18-button-container">
            <button class="btn-big" type="submit" name="yes" value="yes">我同意,我已年滿十八歲<br><small>進入</small></button>
        </div>
        <div class="over18-button-container">
            <button class="btn-big" type="submit" name="no" value="no">未滿十八歲或不同意本條款<br><small>離開</small></button>
        </div>
    </form>
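Rather than hard-coding yes=yes and from, the payload can be read from the confirmation form itself. This is a minimal sketch using only the standard-library html.parser; the class Over18FormParser and the function build_over18_payload are hypothetical helpers, and it assumes the page has a single form whose fields are a hidden input plus named submit buttons, as in the source printed above. The sample string is a trimmed copy of that form with the button labels shortened.

```python
from html.parser import HTMLParser


class Over18FormParser(HTMLParser):
    """Hypothetical helper: collects the form action, hidden inputs and buttons.

    Assumes a single <form> whose fields are <input type="hidden"> and
    <button name=... value=...> elements, like the over-18 page.
    """

    def __init__(self):
        super().__init__()
        self.action = None
        self.hidden = {}
        self.buttons = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and attrs.get("type") == "hidden":
            self.hidden[attrs.get("name")] = attrs.get("value", "")
        elif tag == "button" and "name" in attrs:
            self.buttons[attrs["name"]] = attrs.get("value", "")


def build_over18_payload(page_html):
    """Return (action, payload) as if the "yes" button had been clicked."""
    parser = Over18FormParser()
    parser.feed(page_html)
    payload = dict(parser.hidden)
    # A browser only submits the button that was clicked, so add "yes" alone.
    payload["yes"] = parser.buttons["yes"]
    return parser.action, payload


sample = '''<form action="/ask/over18" method="post">
    <input type="hidden" name="from" value="/bbs/Gossiping/index1.html">
    <button class="btn-big" type="submit" name="yes" value="yes">ok</button>
    <button class="btn-big" type="submit" name="no" value="no">leave</button>
</form>'''

action, payload = build_over18_payload(sample)
print(action, payload)
# → /ask/over18 {'from': '/bbs/Gossiping/index1.html', 'yes': 'yes'}
```

With a live response you would feed res.text instead of the sample and post payload to "https://www.ptt.cc" + action.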
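Using a session, you probably only need to post once, because the cookies will be stored, so the following code should work. It gets all divs with the class r-ent (r-ent is a class, not an id, so the question's {"id":"r-ent"} lookup would come back empty even on the real listing page):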

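    import requests
    from bs4 import BeautifulSoup

    # Board index pages 1..10 (format() fills in the page number)
    links = ['https://www.ptt.cc/bbs/Gossiping/index{}.html'.format(i) for i in range(1, 11)]
    data_links = []
    # Fields submitted by the "yes" button of the over-18 form
    data = {"yes": "yes"}
    head = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0"}
    with requests.Session() as s:
        # Confirm once; the session keeps the cookie for the later requests
        data["from"] = "/bbs/Gossiping/index1.html"
        s.post("https://www.ptt.cc/ask/over18", data=data, headers=head)
        for link in links:
            res = s.get(link, headers=head)
            soup = BeautifulSoup(res.text, "html.parser")
            # r-ent is a class, so use a CSS class selector
            data_divs = soup.select("div.r-ent")
            print(data_divs)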
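The selector in the code above matches r-ent as a class. To see why an id lookup for the same name finds nothing, here is a small standard-library sketch that counts matching divs both ways; DivCounter is a hypothetical helper, and the sample markup is simplified and made up for illustration, not copied from the live site.

```python
from html.parser import HTMLParser


class DivCounter(HTMLParser):
    """Hypothetical helper: counts <div> tags matching a name by class and by id."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.by_class = 0
        self.by_id = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attrs = dict(attrs)
        # class may hold several space-separated names, e.g. class="r-ent extra"
        if self.name in (attrs.get("class") or "").split():
            self.by_class += 1
        if attrs.get("id") == self.name:
            self.by_id += 1


# Simplified, made-up listing markup using the r-ent class
sample = '''
<div>
  <div class="r-ent"><div class="title"><a href="/bbs/Gossiping/M.1.html">first</a></div></div>
  <div class="r-ent"><div class="title"><a href="/bbs/Gossiping/M.2.html">second</a></div></div>
</div>
'''

counter = DivCounter("r-ent")
counter.feed(sample)
print(counter.by_class, counter.by_id)  # → 2 0
```

In BeautifulSoup terms this is the difference between select("div.r-ent") (or find_all("div", class_="r-ent")) and find_all("div", {"id": "r-ent"}).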

You could paste the HTML structure of the part you want to get data from. Also check the res value you get back from the GET request.
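One way to "check res" is to label what each response actually is before parsing it. This is a rough sketch: describe_response is a hypothetical helper that takes res.status_code, res.url and res.text from a requests response, and its checks are heuristics based on the over-18 form and the r-ent class discussed in this thread, not an official PTT API. The assumption that a blocked request ends up at an /ask/over18 URL should be verified against your own res.url.

```python
from urllib.parse import urlparse


def describe_response(status_code, url, text):
    """Hypothetical helper: give a rough label for a fetched PTT page.

    Heuristics only: the over-18 form posts to /ask/over18, and board
    listings contain divs with class="r-ent".
    """
    if status_code != 200:
        return "HTTP error {}".format(status_code)
    # Either we were sent to the confirmation URL, or the page contains its form
    if urlparse(url).path == "/ask/over18" or 'action="/ask/over18"' in text:
        return "over-18 confirmation page"
    if 'class="r-ent"' in text:
        return "board index page"
    return "unknown page"
```

Inside the loop, print(describe_response(res.status_code, res.url, res.text)) before parsing shows immediately whether the session cookie took effect.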