Python BeautifulSoup: counting how many times a text appears between tags

I want to count how many times the text "file it" appears between tags on a web page. Here is my code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("https://www.crummy.com/software/BeautifulSoup/")
bsObj = BeautifulSoup(html, "html.parser")

nameList = bsObj.findAll(text="file it")
print(len(nameList))

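This works for "file it" and "Download", where the result is 1, and for "Hall of Fame", where the result is 2.

But for "the discussion group" the result should be 2, and I get 0 instead.

Why do I get 0 for "the discussion group" or "get the source code"?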

Use \s+ in a regular expression. It matches any run of whitespace, including \n.

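If you look at the page source, there is a line break between "the" and "discussion group". A plain string search compares the whole text node for equality, so a node that contains a newline where you typed a space never matches.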
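To see the effect in isolation, here is a minimal sketch that uses only the standard library. The variable names are made up for illustration, and the string with the newline stands in for what the parser would hand back for that link text.

```python
import re

# Hypothetical stand-in for the text node as it appears in the page source:
# the words are separated by a newline, not a space.
text_in_source = "the\ndiscussion group"
wanted = "the discussion group"

# Exact comparison fails because "\n" is not " ".
print(text_in_source == wanted)

# \s+ accepts any run of whitespace (spaces, tabs, newlines).
pattern = re.compile(r"the\s+discussion\s+group")
print(bool(pattern.search(text_in_source)))
print(bool(pattern.search(wanted)))
```

The first print shows False and the other two show True, which is the same difference you see between the string search and the regex search in Beautiful Soup.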

import re
nameList = bsObj.findAll(text=re.compile(r"the\s+discussion\sgroup"))
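If you need this for several phrases ("get the source code" and so on), it may be easier to build the pattern from the phrase instead of writing each regex by hand. The following is only a sketch: phrase_pattern is a hypothetical helper, not part of Beautiful Soup, and the HTML is a small made-up snippet rather than the real page. It uses string=, which is the newer name for the text= argument in recent Beautiful Soup 4 releases.

```python
import re
from bs4 import BeautifulSoup


def phrase_pattern(phrase):
    """Hypothetical helper: compile a regex that matches the words of
    `phrase` separated by any amount of whitespace."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(r"\s+".join(words))


# Made-up snippet: the first link text is broken across two lines.
html = """<p>You can <a href="#">Download</a> it or ask on
<a href="#">the
discussion group</a>. Later, visit <a href="#">the discussion group</a> again.</p>"""

soup = BeautifulSoup(html, "html.parser")

# Exact string search: only the text node without the newline matches.
exact = soup.find_all(string="the discussion group")

# Regex search: both text nodes match.
tolerant = soup.find_all(string=phrase_pattern("the discussion group"))

print(len(exact))
print(len(tolerant))
```

Note that this counts text nodes that contain the phrase, not the total number of occurrences. A single node that contains the phrase twice would still count once.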