Python: finding specific link text with bs4


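I'm trying to scrape a website and find all the titles in a feed. I'm having trouble getting the text of the <a> tags I need. Here is a sample of the HTML: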

<td class="m" id="b1"><a href="/QSYcfT" id="c1" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=BFNH-6K10Ic', 'QSYcfT', this.id); this.blur(); return false;">TF4 - Oreos</a> <a href="#" onClick="return lkP('1', 'QSYcfT');" id="x1"><font class="bp">(0)</font></a>
<td class="m" id="b2"><a href="/zXHNvp" id="c2" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=0vjcGwZGBYI', 'zXHNvp', this.id); this.blur(); return false;">Awesome Game Boy Facts</a> <a href="#" onClick="return lkP('2', 'zXHNvp');" id="x2"><font class="bp">(0)</font></a>
Here is what I have tried so far:

soup = bs4.BeautifulSoup(html, 'html.parser')
links = soup.find_all('a', {'id': 'c'})
for link in links:
    print(link.text)

But it doesn't find or print anything.

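There is no <a> tag whose id attribute is "c"; the ids in your HTML are "c1" and "c2". A plain string value only matches a tag with exactly that id: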

links = soup.find_all('a',{'id' : 'c1'})
To find every <a> tag whose id starts with "c", pass a regular expression instead:

import re

links = soup.find_all('a', {'id': re.compile('^c')})
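You can pass a compiled regular expression in place of the attribute value:

links = soup.find_all('a', {'id': re.compile(r'^c\d+')})

^ matches the start of the string, and \d+ matches one or more digits.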
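To see how the two patterns from the answers differ, here is a small sketch that uses only the standard `re` module on the id values from the sample HTML. It assumes Beautiful Soup tests a compiled pattern against each attribute value with `search()`, as its documentation describes; the id "comments" is a hypothetical value that does not appear in the question and is there only to show where the patterns diverge.

```python
import re

# The four id values that appear on <a> tags in the sample HTML.
ids = ["c1", "x1", "c2", "x2"]

# A plain string filter is an exact comparison, so 'c' matches nothing.
exact_hits = [i for i in ids if i == "c"]

# '^c' accepts any id that starts with "c".
prefix = re.compile(r"^c")
prefix_hits = [i for i in ids if prefix.search(i)]

# '^c\d+' additionally requires at least one digit right after the "c".
strict = re.compile(r"^c\d+")
strict_hits = [i for i in ids if strict.search(i)]

# Hypothetical id (not in the question) where the two patterns disagree.
loose_on_other = bool(prefix.search("comments"))
strict_on_other = bool(strict.search("comments"))

print(exact_hits, prefix_hits, strict_hits)
print(loose_on_other, strict_on_other)
```

On this sample both patterns select the same two links, so the stricter pattern only matters if the page also has ids that begin with "c" but are not followed by digits.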

Demo:

>>> import re
>>> from bs4 import BeautifulSoup
>>> 
>>> html = """
... <tr>
...     <td class="m" id="b1"><a href="/QSYcfT" id="c1" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=BFNH-6K10Ic', 'QSYcfT', this.id); this.blur(); return false;">TF4 - Oreos</a> <a href="#" onClick="return lkP('1', 'QSYcfT');" id="x1"><font class="bp">(0)</font></a></td>
...     <td class="m" id="b2"><a href="/zXHNvp" id="c2" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=0vjcGwZGBYI', 'zXHNvp', this.id); this.blur(); return false;">Awesome Game Boy Facts</a> <a href="#" onClick="return lkP('2', 'zXHNvp');" id="x2"><font class="bp">(0)</font></a></td>
... </tr>
... """
>>> soup = BeautifulSoup(html, 'html.parser')
>>> links = soup.find_all('a', {'id': re.compile(r'^c\d+')})
>>> for link in links:
...     print(link.text)
... 
TF4 - Oreos
Awesome Game Boy Facts

You can pass a compiled regular expression object in the call to find_all:

import re
import bs4

html = '''
<td class="m" id="b1"><a href="/QSYcfT" id="c1" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=BFNH-6K10Ic', 'QSYcfT', this.id); this.blur(); return false;">TF4 - Oreos</a> <a href="#" onClick="return lkP('1', 'QSYcfT');" id="x1"><font class="bp">(0)</font></a>
<td class="m" id="b2"><a href="/zXHNvp" id="c2" target="_blank" onClick="vPI('https://www.youtube.com/watch?v=0vjcGwZGBYI', 'zXHNvp', this.id); this.blur(); return false;">Awesome Game Boy Facts</a> <a href="#" onClick="return lkP('2', 'zXHNvp');" id="x2"><font class="bp">(0)</font></a>
'''

soup = bs4.BeautifulSoup(html, 'html.parser')
# Select every <a> tag whose id starts with "c"
for link in soup.find_all('a', {'id': re.compile('^c')}):
    # Join all text nodes inside the tag into one string
    print(''.join(link.find_all(string=True)))

Output:

TF4 - Oreos
Awesome Game Boy Facts

If I could, I would accept all of these answers; they all work.