Python: How does this forum-scraping code work with the specific_messages data structure?

python web-scraping

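I'd like to thank user Pythonista for giving me this very useful code a few months ago; it solved my problem. However, because I know little about HTML and the Beautiful Soup library, I'm still confused about what the code actually does.

I don't understand what role the specific_messages data structure plays in this program.

I'm also confused about how the code stores the various posts. How does it check which user wrote each post?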

import requests, pprint
from bs4 import BeautifulSoup as BS

url = "https://forums.spacebattles.com/threads/the-wizard-of-woah-and-the-impossible-methods-of-necromancy.337233/"
r = requests.get(url)
soup = BS(r.content, "html.parser")

# To find all posts from one specific user (everything after this line handles posts from all users)
specific_messages = soup.findAll('li', {'data-author': 'The Wizard of Woah!'})


#To find every post from every user
posts = {}

message_container = soup.find('ol', {'id':'messageList'})
messages = message_container.findAll('li', recursive=False)
for message in messages:
    author = message['data-author']
    # encode("utf-8") turns the text into bytes; drop it if you only want to print the text in a shell
    content = message.find('div', {'class':'messageContent'}).text.strip().encode("utf-8")
    if author in posts:
        posts[author].append(content)
    else:
        posts[author] = [content]
pprint.pprint(posts)

specific_messages = soup.findAll('li', {'data-author': 'The Wizard of Woah!'})

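soup is the BeautifulSoup object that holds the parsed HTML.

findAll() is a method that finds every element in the HTML matching the arguments passed to it.

li is the tag to look for.

data-author is the HTML attribute that is searched for on each <li> tag.

The Wizard of Woah! is the author's name.

So basically, that line searches for all <li> tags whose data-author attribute equals The Wizard of Woah!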
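To make that concrete, here is a minimal sketch run against a small, made-up HTML snippet instead of the live forum page. The markup and the SomeReader author are assumptions for illustration only; the real page is far larger and may have changed since. It uses find_all, which is the current spelling of findAll in bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real thread page.
html = """
<ol id="messageList">
  <li data-author="The Wizard of Woah!"><div class="messageContent">First post</div></li>
  <li data-author="SomeReader"><div class="messageContent">Nice chapter</div></li>
  <li data-author="The Wizard of Woah!"><div class="messageContent">Second post</div></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag name first, then a dict of attribute values that each tag must match.
specific_messages = soup.find_all("li", {"data-author": "The Wizard of Woah!"})

# Only the <li> tags written by that one author are kept.
print(len(specific_messages))
print([li["data-author"] for li in specific_messages])
```

In this snippet only the first and third <li> match, so specific_messages holds just those two tags.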

findAll returns multiple results, so you need to loop over them to get each one, and each one is appended to a list.

That's all.
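As a rough sketch of that loop, again on made-up markup rather than the real page (the snippet and the SomeReader author are assumptions), this is roughly how the second half of the script groups posts by author. It uses dict.setdefault in place of the if/else from the question, which should behave the same way.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real thread page.
html = """
<ol id="messageList">
  <li data-author="The Wizard of Woah!"><div class="messageContent">First post</div></li>
  <li data-author="SomeReader"><div class="messageContent">Nice chapter</div></li>
  <li data-author="The Wizard of Woah!"><div class="messageContent">Second post</div></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")
message_container = soup.find("ol", {"id": "messageList"})

posts = {}  # maps author name -> list of that author's post texts

# recursive=False restricts the search to direct <li> children of the <ol>.
for message in message_container.find_all("li", recursive=False):
    author = message["data-author"]  # the author is read from the tag's attribute
    content = message.find("div", {"class": "messageContent"}).text.strip()
    # Create an empty list the first time an author is seen, then append.
    posts.setdefault(author, []).append(content)

print(posts)
```

The author of each post is never "checked" against anything; it is simply read from the data-author attribute and used as the dictionary key.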

Thanks, I was just a bit confused about what role that variable plays in the script, since it isn't referenced anywhere else in it.