Python Tor: The request could not be satisfied / Request blocked


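I am trying to make a request to the link below through Tor, but it returns an error. The request works fine without Tor, but I still need it to go through Tor, or through a random IP.

Am I doing this correctly, or is there a better solution?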

import requests

link = 'https://www.totallylegal.com/searchjobs/'
torport = 9050  # SOCKS port of the local Tor daemon
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
}
# socks5h (rather than socks5) lets the proxy resolve hostnames, so DNS goes through Tor too
proxies = {
    'http': "socks5h://localhost:{}".format(torport),
    'https': "socks5h://localhost:{}".format(torport)
}

print(requests.get(link, headers=headers, proxies=proxies).content)
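
Before blaming the target site, it can help to confirm that traffic really leaves through Tor. The following is a minimal sketch, not part of the original question: the helper names `tor_proxies` and `exit_ip` are hypothetical, and `https://httpbin.org/ip` is only an assumed "what is my IP" endpoint that any similar service could replace. SOCKS support in requests needs the optional extra (`pip install requests[socks]`).

```python
def tor_proxies(port=9050, host="localhost"):
    """Build a requests-style proxies dict for a local Tor SOCKS port."""
    # socks5h asks the proxy to resolve hostnames, so DNS lookups use Tor as well
    url = "socks5h://{}:{}".format(host, port)
    return {"http": url, "https": url}


def exit_ip(proxies, timeout=30):
    """Return the public IP a remote service sees (needs network and a running Tor)."""
    # Imported here so tor_proxies() stays usable even without requests installed
    import requests

    # Assumed endpoint: httpbin.org/ip answers with JSON such as {"origin": "<ip>"}
    resp = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=timeout)
    resp.raise_for_status()
    return resp.json()["origin"]
```

Calling `exit_ip(tor_proxies())` and comparing the result with `exit_ip(None)` should show two different addresses if the proxy is in effect.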
Here is the error that is returned:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>403 ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.

<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: iXaDPfPtyHg0TGTFJvYuAnV86unJIpBITxdBJ2w_i_bo-ToR510p2w==
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>
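
Because the question's code prints the body without looking at the status code, a block page like this one is easy to mistake for real content. A small heuristic check, sketched here under the assumption that the block page always carries the same two phrases as the sample above (the function name `is_cloudfront_block` is hypothetical):

```python
def is_cloudfront_block(status_code, body):
    """Heuristic: True if a response looks like the CloudFront 403 page shown above."""
    if status_code != 403:
        return False
    text = body.lower()
    # Both phrases appear in the sample error page; other block pages may differ
    return "generated by cloudfront" in text and "request blocked" in text
```

With requests, this could be called as `is_cloudfront_block(r.status_code, r.text)` before any parsing is attempted.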

The page appears to block requests by IP address, so we can get around this by going through another website (such as the W3 validator), which shows us the source code.

We are still using Tor, but we let another site fetch the page for us (and its IP is not blocked).

Let me know if you need more information. I'll be standing by.
Thank you very much, Andrej! This is certainly another option.
from bs4 import BeautifulSoup
import requests

proxies = {
    'http': 'http://<YOUR PROXY ADDRESS>:<YOUR PROXY PORT>',
    'https': 'http://<YOUR PROXY ADDRESS>:<YOUR PROXY PORT>',
}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36',
    'accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
}

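# Ask the W3 validator to fetch the page and display its source
r = requests.get('https://validator.w3.org/nu/?showsource=yes&doc=https%3A%2F%2Fwww.totallylegal.com%2Fsearchjobs%2F', proxies=proxies, headers=headers)
soup = BeautifulSoup(r.text, 'lxml')

# Rebuild the original source: <code> elements with class "lf" mark line breaks,
# all other <code> elements hold a fragment of source text
source_code = ''
for code in soup.select('ol.source > li > code'):
    if 'class' in code.attrs and 'lf' in code.attrs['class']:
        source_code += '\n'
    else:
        source_code += code.text

# Parse the recovered source as the real page
soup2 = BeautifulSoup(source_code, 'lxml')

# Print each job title from the listing
for li in soup2.select('li.lister__item h3'):
    print(li.text)
    print('-' * 80)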

Output:

Corporate Partner
--------------------------------------------------------------------------------
Personal Injury Paralegal
--------------------------------------------------------------------------------
Healthcare Regulatory Lawyer - London
--------------------------------------------------------------------------------
Company Secretary and Corporate Governance
--------------------------------------------------------------------------------
Junior FCPA/Compliance Associate, Beijing - 14612/TTL
--------------------------------------------------------------------------------
International Project Manager, Shanghai - 14611/TTL
--------------------------------------------------------------------------------
Corporate Associate (4+ PQE) Beijing - 14610/TTL
--------------------------------------------------------------------------------
Corporate Associate (5+ PQE) Shanghai - 14609/TTL
--------------------------------------------------------------------------------
Corporate or Commercial Counsel -Pharma- Surrey
--------------------------------------------------------------------------------
Corporate/Public M&A PSL, 5+ PQE
--------------------------------------------------------------------------------
Solicitor
--------------------------------------------------------------------------------
In-house Legal Counsel - Excellent opportunity to go In-House!
--------------------------------------------------------------------------------
Real Estate Partner
--------------------------------------------------------------------------------
Child Brain Injury Solicitor
--------------------------------------------------------------------------------
Corporate/Commercial In-House Lawyer, 1+
--------------------------------------------------------------------------------
In-house Regulatory Counsel, Banking/Payments, 5+
--------------------------------------------------------------------------------
In-house Property Finance/Banking Lawyer, 1-3
--------------------------------------------------------------------------------
Hybrid Legal & Compliance Data Protection Manager
--------------------------------------------------------------------------------
Hedge Fund Legal Counsel 3-5 years PQE
--------------------------------------------------------------------------------
Corporate PSL
--------------------------------------------------------------------------------
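
The validator URL in the code above is hard-coded with the target already percent-encoded. To reuse the same trick for other pages, the URL could be built as in this sketch; the function name `validator_source_url` is hypothetical, and the `showsource` and `doc` query parameters are simply the ones that appear in the answer's URL.

```python
from urllib.parse import urlencode

VALIDATOR = "https://validator.w3.org/nu/"


def validator_source_url(target_url):
    """Return the validator URL that fetches target_url and shows its source."""
    # urlencode percent-encodes ':' and '/' in the target, as in the answer's URL
    query = urlencode({"showsource": "yes", "doc": target_url})
    return "{}?{}".format(VALIDATOR, query)
```

For `https://www.totallylegal.com/searchjobs/` this reproduces the URL passed to `requests.get` in the answer.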