Python BS4: Getting the text inside a tag

python html parsing

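I'm using Beautiful Soup. I have a tag like this:

<li><a href="example"> s.r.o., <small>small</small></a></li>

I only want the text inside the anchor <a> tag, without any text from the <small> tag in the output, i.e. " s.r.o., ".

I tried find('li').text[0], but it does not work.

Is there a command in BS4 that can do this?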

One option is to take the first element from the .contents of the a element:

>>> from bs4 import BeautifulSoup
>>> data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
>>> soup = BeautifulSoup(data, 'html.parser')
>>> print(soup.find('a').contents[0])
 s.r.o., 

Well, there are also all sorts of alternative (or crazy) options:

>>> print(next(soup.find('a').descendants))
 s.r.o., 
>>> print(next(iter(soup.find('a'))))
 s.r.o., 
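The options above all rely on the wanted text being the first child of the anchor. As a minimal sketch of a variant that does not depend on position, the following collects only the text nodes that are direct children of the a element. It assumes a Beautiful Soup release that accepts the string argument of find_all (older releases call it text); the names direct_text and own_text are illustrative only.

```python
from bs4 import BeautifulSoup

data = '<li><a href="example"> s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, "html.parser")
anchor = soup.find("a")

# string=True matches text nodes only; recursive=False restricts the search
# to direct children, so the text nested inside <small> is skipped.
direct_text = anchor.find_all(string=True, recursive=False)

# Join the pieces (there may be text before and after a child tag) and trim.
own_text = "".join(direct_text).strip()
print(own_text)
```

Unlike contents[0], this would also pick up text that follows the small element, which may or may not be what you want.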

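If you want to loop over and print the contents of every anchor tag in an HTML string or web page (for a web page, you need urlopen from urllib to fetch it), this will do it: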

from bs4 import BeautifulSoup
data = '<li><a href="example">s.r.o., <small>small</small></a></li> <li><a href="example">2nd</a></li> <li><a href="example">3rd</a></li>'
soup = BeautifulSoup(data, 'html.parser')
a_tag = soup('a')              # calling the soup is shorthand for soup.find_all('a')
for tag in a_tag:
    print(tag.contents[0])     # .contents[0] is the first child: the text before <small>

a_tag is a list containing all the anchor tags; collecting them in a list lets you edit them as a group (if there is more than one <a> tag).
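The loop above can be wrapped in a function, as a rough sketch. The helper names fetch_html and first_texts are hypothetical and not part of the original answer; fetch_html shows where urlopen would come in for a live page, and first_texts skips anchors that are empty or that start with another tag, where contents[0] would otherwise fail or return a tag instead of text.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup
from bs4.element import NavigableString


def fetch_html(url):
    """Download a page and return its raw bytes (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def first_texts(html):
    """Return the leading text node of every <a> tag that has one."""
    soup = BeautifulSoup(html, "html.parser")
    texts = []
    for tag in soup.find_all("a"):
        # tag.contents may be empty, or its first child may be another tag.
        if tag.contents and isinstance(tag.contents[0], NavigableString):
            texts.append(tag.contents[0].strip())
    return texts


sample = (
    '<li><a href="example">s.r.o., <small>small</small></a></li> '
    '<li><a href="example">2nd</a></li> '
    '<li><a href="example"><b>bold</b></a></li> '
    '<li><a href="example"></a></li>'
)
result = first_texts(sample)
print(result)
```

For a real page, the call would be first_texts(fetch_html(url)) with a URL of your choosing.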

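According to the documentation, a tag's text can be retrieved through its .string attribute. Because .string is None when a tag has more than one child, the small element is removed first: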

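from bs4 import BeautifulSoup

soup = BeautifulSoup('<li><a href="example"> s.r.o., <small>small</small></a></li>', 'html.parser')
res = soup.find('a')
res.small.decompose()   # remove <small> from the tree so <a> has a single text child
print(res.string)
# s.r.o.,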
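Note that decompose() changes the parsed tree itself. If the small element is still needed later, one possible workaround, sketched here under the assumption that your Beautiful Soup version supports copy.copy() on tags, is to run the same steps on a detached copy. The names clone and text are illustrative.

```python
import copy

from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<li><a href="example"> s.r.o., <small>small</small></a></li>',
    "html.parser",
)
res = soup.find("a")

# Work on a detached copy so the original tree keeps its <small> element.
clone = copy.copy(res)
clone.small.decompose()

text = clone.string          # the copy now has a single text child
print(text)
print(res.small)             # the original <a> still contains <small>
```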

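Thanks, but as far as I know, split() without arguments uses whitespace as the separator, which works in this example, but sometimes the text itself contains spaces and commas, so it cannot be used. Or am I wrong?

You're right. I'll take a look when I'm back at my computer.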
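To illustrate the concern in this comment, here is a small sketch with made-up markup (the company name is hypothetical) in which the anchor text contains spaces and commas. Splitting on whitespace breaks the name apart, while stripping the first text node keeps it whole.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the anchor text itself contains spaces and commas.
data = '<li><a href="example"> Foo Bar, s.r.o., <small>small</small></a></li>'
soup = BeautifulSoup(data, "html.parser")
first_node = soup.find("a").contents[0]

pieces = first_node.split()    # whitespace split breaks the name into parts
trimmed = first_node.strip()   # only the surrounding whitespace is removed

print(pieces)
print(trimmed)
```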

Output of the loop over a_tag:

s.r.o., 
2nd
3rd

>>> print(a_tag)
[<a href="example">s.r.o., <small>small</small></a>, <a href="example">2nd</a>, <a href="example">3rd</a>]
