Python 3.x: Accessing commented-out HTML lines with BeautifulSoup

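I am trying to scrape statistics from a specific web page (the URL is assigned to accessurl in the code below).

However, when I look at the HTML source, the table for the "Defensive Game Log" appears to be commented out (it starts with <!-- and ends with -->).

As a result, when I use BeautifulSoup4, the following code only picks up the offensive data, which is not commented out, and skips the defensive data, which is.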

from urllib.request import Request,urlopen
from bs4 import BeautifulSoup
import re

accessurl = 'https://www.sports-reference.com/cfb/schools/oklahoma-state/2016/gamelog/'
req = Request(accessurl)
link = urlopen(req)
soup = BeautifulSoup(link.read(), "lxml")


tables = soup.find_all(['th', 'tr'])
my_table = tables[0]
rows = my_table.findChildren(['tr'])
for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        value = cell.string
        print(value)

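I am curious whether there is any solution, inside or outside BeautifulSoup4, that would let me add all of the defensive values to a list, the same way the offensive data is stored. Thanks.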
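The behavior described in the question can be reproduced without network access. The sketch below is only an illustration: the HTML snippet, the table ids and the cell values are made up rather than taken from the real page, and the built-in html.parser is used instead of lxml so that nothing beyond bs4 is required.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: one live table and one table wrapped in an HTML comment.
SAMPLE_HTML = """
<div>
  <table id="offense"><tr><td>35</td></tr></table>
  <!--
  <table id="defense"><tr><td>21</td></tr></table>
  -->
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Only the live table is part of the parsed tag tree.
visible_ids = [table.get("id") for table in soup.find_all("table")]

# The commented-out markup survives as a single Comment string node.
comment_nodes = soup.find_all(string=lambda text: isinstance(text, Comment))
hidden_in_comment = any("<table" in node for node in comment_nodes)

print(visible_ids)
print(hidden_in_comment)
```

The parser treats everything between the comment markers as text, so tag searches such as find_all('table') never see the second table.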

Note: I have added to the solution given below, which is derived from:

Comment objects will give you what you want:

from urllib.request import Request,urlopen
from bs4 import BeautifulSoup, Comment

accessurl = 'https://www.sports-reference.com/cfb/schools/oklahoma-state/2016/gamelog/'
req = Request(accessurl)
link = urlopen(req)
soup = BeautifulSoup(link, "lxml")

comments=soup.find_all(string=lambda text:isinstance(text,Comment))
for comment in comments:
    comment=BeautifulSoup(str(comment), 'lxml')
    defensive_log = comment.find('table') #search as ordinary tag
    if defensive_log:
        break
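Once the commented table has been re-parsed, its cells can be collected into a list of rows, which is what the question asks for. The following is a minimal sketch and not part of the original answer: the helper name commented_table_rows, the sample markup and its values are assumptions made for illustration, and html.parser is used so the example runs without lxml.

```python
from bs4 import BeautifulSoup, Comment


def commented_table_rows(html, parser="html.parser"):
    """Return cell text, row by row, for the first table found inside an HTML comment."""
    soup = BeautifulSoup(html, parser)
    for node in soup.find_all(string=lambda text: isinstance(text, Comment)):
        # Re-parse the comment body so its markup becomes ordinary tags.
        inner = BeautifulSoup(str(node), parser)
        table = inner.find("table")
        if table is None:
            continue  # this comment does not contain a table
        rows = []
        for tr in table.find_all("tr"):
            cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
            if cells:
                rows.append(cells)
        return rows
    return []  # no commented-out table was found


# Hypothetical stand-in for the real page: the second table is commented out.
SAMPLE_HTML = """
<table id="offense"><tr><td>35</td></tr></table>
<!--
<table id="defense">
  <tr><th>Opponent</th><th>Pts</th></tr>
  <tr><td>Team A</td><td>21</td></tr>
</table>
-->
"""

defense_rows = commented_table_rows(SAMPLE_HTML)
print(defense_rows)
```

For the real page, the same helper could presumably be given the bytes returned by urlopen(req).read(), with "lxml" passed as the parser if it is installed. Note that it stops at the first comment containing a table, just as the loop above does.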

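What do you mean by "commented out"?

@Storm, any feedback? Did my solution work?

Sorry it took so long to get back to you. I have been moving and have finally gotten back to the project. I am running it now to try to incorporate it. It lets me get this into a table. I put the final code in the question above.

Sorry, I don't fully understand what you mean. Do you mean that you had already built a workaround, but are now trying to reach the goal my way?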
