Extracting the text between link tags in Python with BeautifulSoup


I have the following HTML code:

import BeautifulSoup

html = """
<html><head></head>
<body>
<h2 class='title'><a href='http://www.gurletins.com'>My HomePage</a></h2>
<h2 class='title'><a href='http://www.gurletins.com/sections'>Sections</a></h2>
</body>
</html>
"""

soup = BeautifulSoup.BeautifulSoup(html)

print [elm.a.text for elm in soup.findAll('h2', {'class': 'title'})]
# Output: [u'My HomePage', u'Sections']

I need to extract the text (the link description) between the 'a' tags and store it in an array, like this:

a[0] = "My HomePage"

a[1] = "Sections"

I need to do this in Python using BeautifulSoup.

Please help, thanks.

You can do it like this:

import BeautifulSoup

html = """
<html><head></head>
<body>
<h2 class='title'><a href='http://www.gurletins.com'>My HomePage</a></h2>
<h2 class='title'><a href='http://www.gurletins.com/sections'>Sections</a></h2>
</body>
</html>
"""

soup = BeautifulSoup.BeautifulSoup(html)

print [elm.a.text for elm in soup.findAll('h2', {'class': 'title'})]
# Output: [u'My HomePage', u'Sections']
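The snippet above uses the old BeautifulSoup 3 package and Python 2 print syntax. A minimal sketch of the same list comprehension for Python 3, assuming the bs4 package is installed (pip install beautifulsoup4) and using Python's built-in "html.parser":

```python
from bs4 import BeautifulSoup

html = """
<html><head></head>
<body>
<h2 class='title'><a href='http://www.gurletins.com'>My HomePage</a></h2>
<h2 class='title'><a href='http://www.gurletins.com/sections'>Sections</a></h2>
</body>
</html>
"""

# "html.parser" is the parser bundled with Python, so no extra package is needed
soup = BeautifulSoup(html, "html.parser")

# class_ (trailing underscore) filters on the CSS class, because "class" is a Python keyword;
# h2.a is the first <a> inside each heading
a = [h2.a.get_text() for h2 in soup.find_all("h2", class_="title")]
print(a)  # ['My HomePage', 'Sections']
```

a[0] is then "My HomePage" and a[1] is "Sections", as asked in the question.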


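Another option is to collect the text nodes of every a tag (this gives one list of strings per link):

print [a.findAll(text=True) for a in soup.findAll('a')]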
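In bs4, findAll is spelled find_all and the text= argument is called string=. Since find_all(string=True) returns a list of text nodes for each tag, a sketch of the same idea that joins each list into a single string, assuming Python 3 and the bs4 package:

```python
from bs4 import BeautifulSoup

html = """
<h2 class='title'><a href='http://www.gurletins.com'>My HomePage</a></h2>
<h2 class='title'><a href='http://www.gurletins.com/sections'>Sections</a></h2>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all(string=True) returns every text node inside the tag as a list,
# so each <a> element produces its own list
nested = [link.find_all(string=True) for link in soup.find_all("a")]

# Join each inner list to get one plain string per link
a = ["".join(parts) for parts in nested]
print(a)  # ['My HomePage', 'Sections']
```

Joining matters when a link contains nested markup (for example <a>My <b>Home</b>Page</a>), where find_all(string=True) would return several pieces.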

The following code extracts the text (link description) between the 'a' tags and stores it in an array:

>>> from bs4 import BeautifulSoup
>>> data = """<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>
...
... <h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>"""
>>> soup = BeautifulSoup(data, "html.parser")
>>> reqTxt = soup.find_all("h2", {"class":"title"})
>>> a = []
>>> for i in reqTxt:
...     a.append(i.get_text())
...
>>> a
['My HomePage', 'Sections']
>>> a[0]
'My HomePage'
>>> a[1]
'Sections'
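The session above calls get_text() on each h2, which gives the link text here only because each heading contains nothing but its link. To take the text from the a elements themselves, a sketch using bs4's CSS-selector method select(), assuming a recent bs4 release. The third heading is hypothetical and only shows that a heading without a link is skipped:

```python
from bs4 import BeautifulSoup

# The two headings from the question, plus a hypothetical third heading without a link
data = """
<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>
<h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>
<h2 class="title">No link here</h2>
"""

soup = BeautifulSoup(data, "html.parser")

# "h2.title > a" matches <a> elements that are direct children of <h2 class="title">;
# strip=True trims surrounding whitespace from each text value
a = [link.get_text(strip=True) for link in soup.select("h2.title > a")]
print(a)  # ['My HomePage', 'Sections']
```

Calling get_text() on the h2 elements instead would also return "No link here" for the third heading.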


@Mehmet Helvaci: Can you be more specific about what isn't working? Are you getting an error? The code I posted above works correctly for me.