Python: Why are some tables commented out when scraping data from Basketball Reference?

python web-scraping

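I'm trying to use BeautifulSoup to scrape all of the data from Basketball Reference. Take Michael Jordan's page as an example. The problem is that when I fetch the HTML page and run it through the HTML parser, I can only get one data table, while the other data tables appear to be commented out. I'm very new to Python and hope someone can tell me why some of the data tables seem to be present in the HTML only as comments. Can someone walk me through this?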

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import pandas as pd

MJ_url = 'https://www.basketball-reference.com/players/j/jordami01.html'

# Download the raw HTML of the player page
uClient = uReq(MJ_url)
MJ_html = uClient.read()
uClient.close()

MJ_soup = soup(MJ_html, "html.parser")

# Select the stats tables by their class attribute
MJ_containers = MJ_soup.findAll("table", {"class": "row_summable sortable stats_table"})

Try this. All of the data inside the HTML comments now comes through:

import requests
from bs4 import BeautifulSoup, Comment

res = requests.get("https://www.basketball-reference.com/players/j/jordami01.html",headers={"User-Agent":"Mozilla/5.0"})
soup = BeautifulSoup(res.text, 'lxml')
for comment in soup.find_all(string=lambda text:isinstance(text,Comment)):
    data = BeautifulSoup(comment,"lxml")
    for items in data.select("table.row_summable tr"):
        tds = [item.get_text(strip=True) for item in items.select("th,td")]
        print(tds)
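
The snippet above shows how to get at the hidden tables, but not why they are hidden. The most likely explanation is that the site delivers those tables inside HTML comments and relies on JavaScript in the browser to turn them into live markup. A parser such as BeautifulSoup never runs that script, so it sees the tables only as comment text. The following is a minimal offline sketch of that situation. `SAMPLE_HTML`, its ids, and its values are made up for illustration and are not taken from the real page, and `rows_of` and `extract_tables` are hypothetical helper names.

```python
from bs4 import BeautifulSoup, Comment

# Made-up stand-in for a player page: one table is live markup,
# the other is wrapped in an HTML comment. Values are placeholders.
SAMPLE_HTML = """
<div id="all_per_game">
  <table id="per_game" class="stats_table">
    <tr><th>Season</th><th>PTS</th></tr>
    <tr><th>S1</th><td>10.0</td></tr>
  </table>
</div>
<div id="all_totals">
<!--
  <table id="totals" class="stats_table">
    <tr><th>Season</th><th>PTS</th></tr>
    <tr><th>S1</th><td>820</td></tr>
  </table>
-->
</div>
"""


def rows_of(table):
    """Return the table as a list of rows, each a list of cell strings."""
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]


def extract_tables(html):
    """Return two dicts keyed by table id: live tables and commented-out tables."""
    soup = BeautifulSoup(html, "html.parser")

    # A plain search only sees tables that are real elements in the tree.
    live = {t.get("id"): rows_of(t) for t in soup.find_all("table")}

    # Comments are stored as text nodes, so each one is re-parsed on its own.
    hidden = {}
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        inner = BeautifulSoup(comment, "html.parser")
        for t in inner.find_all("table"):
            hidden[t.get("id")] = rows_of(t)
    return live, hidden


live, hidden = extract_tables(SAMPLE_HTML)
print("live tables:  ", list(live))
print("hidden tables:", list(hidden))
```

If every row of a table has the same number of cells, the row lists can be passed on to pandas, for example `pd.DataFrame(rows[1:], columns=rows[0])`. Real pages may have multi-row headers or footer rows that need extra handling.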

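Is there any particular reason you used the requests package instead of urllib? Also, why use lxml's HTML parser rather than Python's built-in HTML parser?

It makes little difference whether you choose urllib or requests. I used requests because I'm comfortable with it. If you follow this link, you can see the differences between the parsers. Thanks.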