Python 2.7: encountering "raise HTTPError(req.get_full_url(), code, msg, hdrs, fp) urllib2.HTTPError: HTTP Error 403: Forbidden"


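The site evidently blocks bots and crawlers, so you need to send a Chrome/Mozilla User-Agent header to make the request look as if it comes from a browser. Try the code below.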

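import urllib2
import BeautifulSoup

# Send a browser-like User-Agent; without it the site answers 403 Forbidden.
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
request = urllib2.Request("https://adexchanger.com/searchresults/?q=digital%20marketing", None, headers)
response = urllib2.urlopen(request)
soup = BeautifulSoup.BeautifulSoup(response)

for a in soup.findAll('a'):
    # Note: a['href'] raises KeyError for <a> tags that have no href attribute.
    if 'digital marketing' in a['href']:
        print a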

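The error message says that I cannot access this site. I am trying to extract the links that contain the phrase "digital marketing" from it. Is there another way around this problem?

The site is reachable from a browser, though; try checking the headers.

This is the output: KeyError: 'href'

The href attribute you are trying to access does not exist on that element. I suggest opening the page in a browser, inspecting the elements, and then scraping with the correct attribute. You are no longer getting the 403 error, right?

Right, I am not getting the 403 error any more. I will inspect the elements properly. Thanks.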
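The KeyError: 'href' reported in the comments comes from indexing a tag with a['href'] when that anchor has no href attribute. A minimal sketch of a more tolerant filter follows. It assumes only that each anchor offers a dict-style get() method, which BeautifulSoup tags do; the helper name links_containing and the sample data are made up for illustration. Plain dicts stand in for parsed tags so that the snippet runs without a network connection.

```python
def links_containing(anchors, needle):
    """Return the href of every anchor whose href contains `needle`.

    `anchors` is any iterable of objects with a dict-style get() method,
    such as the tags returned by soup.findAll('a').
    """
    needle = needle.lower()
    matches = []
    for a in anchors:
        href = a.get('href')  # None instead of KeyError when href is missing
        if href and needle in href.lower():
            matches.append(href)
    return matches


# Hypothetical stand-ins for parsed <a> tags; the second one has no href.
sample_anchors = [
    {'href': 'https://example.com/digital-marketing/report'},
    {'name': 'top'},
    {'href': 'https://example.com/about'},
]

found = links_containing(sample_anchors, 'digital-marketing')
```

With a real page, soup.findAll('a') would be passed as the first argument. A literal space rarely survives inside a URL, so a needle such as 'digital-marketing' or 'digital%20marketing' may match where 'digital marketing' does not; which form this site uses has to be confirmed by inspecting the page.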
>>> headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
>>> req = urllib2.Request('https://adexchanger.com/searchresults/?q=digital%20marketing', None, headers)
>>> urllib2.urlopen(req)
<addinfourl at 140245639765816 whose fp = <socket._fileobject object at 0x7f8d7b865250>>
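The interactive session above can be wrapped into small helpers. The following is a sketch rather than a definitive recipe: build_request, fetch and USER_AGENT are names invented here, and the import fallback is an addition so that the same code also runs on Python 3, where urllib2 was split into urllib.request and urllib.error.

```python
try:  # Python 2.7, as used in this thread
    import urllib2 as request_lib
    from urllib2 import HTTPError
except ImportError:  # Python 3: the same classes live in urllib.request / urllib.error
    import urllib.request as request_lib
    from urllib.error import HTTPError

USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/39.0.2171.95 Safari/537.36')


def build_request(url):
    """Create a Request that carries a browser-like User-Agent header."""
    return request_lib.Request(url, None, {'User-Agent': USER_AGENT})


def fetch(url):
    """Return the response body, or None if the server answers with an HTTP error."""
    try:
        response = request_lib.urlopen(build_request(url))
    except HTTPError as err:
        # err.code holds the status, e.g. 403 when the server refuses the request
        print('Request failed with HTTP status %d' % err.code)
        return None
    try:
        return response.read()
    finally:
        response.close()


# Building the request does not touch the network; only fetch() does.
req = build_request('https://adexchanger.com/searchresults/?q=digital%20marketing')
```

Calling fetch(req.get_full_url()) would perform the actual download; a None result means the server still rejected the request. A browser-like User-Agent is no guarantee of access, since a site may block scripted clients by other means.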