Python web scraper won't scrape one of my links

I can scrape one website easily, but on the other I get an error. I'm not sure whether that's because the site blocks scrapers in some way, or something else.

import re
from urlparse import urljoin

import urllib2
from bs4 import BeautifulSoup

user_input = raw_input("Search for Team = ")

# First site: works
resp = urllib2.urlopen("http://idimsports.eu/football.html")
soup = BeautifulSoup(resp, from_encoding=resp.info().getparam('charset'))

base_url = "http://idimsports.eu"
links = soup.find_all('a', href=re.compile(user_input))
if len(links) == 0:
    print "No Streams Available"
else:
    for link in links:
        print urljoin(base_url, link['href'])

# Second site: fails
resp = urllib2.urlopen("http://cricfree.tv/football-live-stream")
soup = BeautifulSoup(resp, from_encoding=resp.info().getparam('charset'))

links = soup.find_all('a', href=re.compile(user_input))
if len(links) == 0:
    print "No Streams Available"
else:
    for link in links:
        print urljoin(base_url, link['href'])

Set the User-Agent header on the request:

# Some sites reject the default urllib2 User-Agent ("Python-urllib/2.x"),
# so present a browser-like one instead:
headers = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request("http://cricfree.tv/football-live-stream", None, headers)
resp = urllib2.urlopen(req)

Also, in your second loop you are reusing base_url from the first site, which you probably don't want to do.
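
Putting both fixes together, here is a minimal sketch (the find_streams helper and the per-site base URLs are my own naming, and it assumes cricfree.tv serves the page once the header is set):

import re
from urlparse import urljoin

import urllib2
from bs4 import BeautifulSoup

def find_streams(page_url, base_url, team):
    # Browser-like User-Agent so the site doesn't answer with 403
    headers = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(page_url, None, headers)
    resp = urllib2.urlopen(req)
    soup = BeautifulSoup(resp, from_encoding=resp.info().getparam('charset'))

    links = soup.find_all('a', href=re.compile(team))
    if len(links) == 0:
        print "No Streams Available"
    else:
        for link in links:
            # Resolve relative hrefs against the site that served this page
            print urljoin(base_url, link['href'])

user_input = raw_input("Search for Team = ")
find_streams("http://idimsports.eu/football.html", "http://idimsports.eu", user_input)
find_streams("http://cricfree.tv/football-live-stream", "http://cricfree.tv", user_input)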

What error are you getting, and which line throws it? You are probably hitting urllib2.HTTPError: HTTP Error 403: Forbidden (I was).
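
To confirm, you can catch the exception and inspect its status code; a quick check along these lines:

import urllib2

url = "http://cricfree.tv/football-live-stream"
try:
    resp = urllib2.urlopen(url)
except urllib2.HTTPError as e:
    # e.code is the HTTP status code, e.msg the server's reason phrase
    print "HTTP error %d: %s" % (e.code, e.msg)
else:
    print "Fetched %s OK" % url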