
Python 2.7: How to extract URLs that match a pattern

python-2.7 web-scraping


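I am trying to extract URLs from a web page that match the following pattern:

"-.html"

My current code extracts every link. How can I change it so that it only extracts the URLs that match the pattern? Thanks, everyone!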

import requests
from bs4 import BeautifulSoup

def find_governor_races(html):
    # `html` is actually the URL of the page to scrape
    base_url = 'http://www.realclearpolitics.com/'  # not used yet
    page = requests.get(html).text
    soup = BeautifulSoup(page, 'html.parser')
    links = []
    # Collect the href of every <a> tag that has one
    for a in soup.find_all('a', href=True):
        links.append(a['href'])
    return links

print(find_governor_races('http://www.realclearpolitics.com/epolls/2010/governor/2010_elections_governor_map.html'))
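Because the function above already collects every href, one option is to filter that list afterwards with re.search. This is a minimal sketch, not the asker's code: the pattern GOVERNOR_RACE, the helper filter_links, and the sample hrefs are all hypothetical, modeled loosely on the governor-race URLs the page links to.

```python
import re

# Hypothetical pattern: absolute governor-race poll pages ending in ".html".
# Dots are escaped so they match a literal "."; [^/]+ keeps each segment
# from spanning a "/".
GOVERNOR_RACE = re.compile(
    r"^http://www\.realclearpolitics\.com/epolls/\d+/governor/[^/]+/[^/]+\.html$"
)

def filter_links(links, pattern=GOVERNOR_RACE):
    """Return only the hrefs that match the compiled pattern (hypothetical helper)."""
    return [href for href in links if pattern.search(href)]

# Hypothetical hrefs standing in for the output of find_governor_races()
sample = [
    "http://www.realclearpolitics.com/epolls/2010/governor/ca/california_governor-1113.html",
    "http://www.realclearpolitics.com/epolls/2010/governor/2010_elections_governor_map.html",
    "/privacy.html",
]
print(filter_links(sample))
```

Only the first sample href survives: the map page has no extra path segment after "governor/", and the relative link is not absolute.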

You can pass a compiled regular expression to .find_all() as the value of the href argument:

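import re

# Dots are escaped so they match a literal "."; BeautifulSoup tests each href with pattern.search()
pattern = re.compile(r"http://www\.realclearpolitics\.com/epolls/\d+/governor/.*?/.*?\.html")
# soup is the BeautifulSoup object built in the question's code; keep just the URLs
links = [a["href"] for a in soup.find_all("a", href=pattern)]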
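To check the regex without hitting the network, here is a minimal sketch that runs find_all() with a compiled pattern against a small inline HTML string. The markup and hrefs are hypothetical, and it assumes bs4 is installed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the downloaded page
HTML = """
<a href="http://www.realclearpolitics.com/epolls/2010/governor/ca/california_governor-1113.html">CA</a>
<a href="http://www.realclearpolitics.com/epolls/2010/governor/2010_elections_governor_map.html">Map</a>
<a href="/privacy.html">Privacy</a>
<a name="top">no href</a>
"""

# Same regex as in the answer, with the dots escaped
pattern = re.compile(r"http://www\.realclearpolitics\.com/epolls/\d+/governor/.*?/.*?\.html")

soup = BeautifulSoup(HTML, "html.parser")
# BeautifulSoup calls pattern.search() on each href; <a> tags without an href never match
matches = [a["href"] for a in soup.find_all("a", href=pattern)]
print(matches)
```

Note that .*? can still cross "/" characters, so deeper paths under governor/ would also match; replace it with [^/]+ if each segment must stay within one path component.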