Python scraper cannot get the correct page through a proxy


python proxy


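I am extracting data from a website. I am in Russia, and when I open the URL from my regular IP address, the page shows an error and no data. When I go through a UK proxy, it works.

That is why I have to use a proxy while scraping, but I ran into a strange problem. When I open the page in a browser, it works (the page contains the data). When I request it from my script, the page comes back in a different form.

For some reason my parser does not receive the pages the way I see them in the browser.

For example, the HTML at the point where the difference begins:

Through the browser (as needed):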

Edited1:

An interesting detail, and I think this is where the problem lies: when I use the same proxy in Mozilla, I can see only 20 pages, but in Chrome I can see 40.

Edited2:

The problem is solved. It seems I have to register and log in to see the full information.

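The page is rendered with JavaScript, so use a headless browser to fetch the rendered page. Alternatively, use the underlying AJAX API to get the JSON/XML response.

@alan, unfortunately I have not understood what you mean. Could you tell me how to integrate it into my code?

Read these:

Alan, you don't seem to understand what I asked. The problem is not in JS.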

HTML as seen in the browser:

<div id="pagination">Page:
<a class="instl confirm-nav previous" rel="nofollow" href="?q=data+scientist&amp;l=london&amp;co=GB&amp;start=900">« Previous</a>
<a class="instl confirm-nav" rel="nofollow" href="?q=data+scientist&amp;l=london&amp;co=GB&amp;start=850">18</a>
<a class="instl confirm-nav" rel="nofollow" href="?q=data+scientist&amp;l=london&amp;co=GB&amp;start=900">19</a>
<span class="current_page">20</span>
<a class="instl confirm-nav" rel="nofollow" href="?q=data+scientist&amp;l=london&amp;co=GB&amp;start=1000">21</a>
<a class="instl confirm-nav" rel="nofollow" href="?q=data+scientist&amp;l=london&amp;co=GB&amp;start=1050">22</a>
<a class="instl confirm-nav" rel="nofollow" href="?q=data+scientist&amp;l=london&amp;co=GB&amp;start=1100">23</a>
<a class="instl confirm-nav" rel="nofollow" href="?q=data+scientist&amp;l=london&amp;co=GB&amp;start=1150">24</a>
<a class="instl confirm-nav next" rel="nofollow" href="?q=data+scientist&amp;l=london&amp;co=GB&amp;start=1000">Next »</a>
</div>
<div id="footer" class=""><p id="footer_nav" class="footer_nav">

HTML as returned to the script:

</div>
<div id="pagination">Page:
<a class="instl confirm-nav previous" href="?q=data+scientist&amp;l=london&amp;co=GB&amp;start=900" rel="nofollow">< Previous</a>
<a class="instl confirm-nav" href="?q=data+scientist&amp;l=london&amp;co=GB&amp;start=850" rel="nofollow">18</a>
<a class="instl confirm-nav" href="?q=data+scientist&amp;l=london&amp;co=GB&amp;start=900" rel="nofollow">19</a>
<span class="current_page">20</span>
</div>
<div class="" id="footer"><p class="footer_nav" id="footer_nav">
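The browser and script fragments differ only in the pagination block: the script's copy stops at page 20 and has no "next" link. The sketch below is one way to detect that truncation automatically. It uses the standard-library `html.parser` rather than BeautifulSoup so that it runs without extra packages. The names `PaginationParser` and `summarize_pagination` are made up for this illustration and are not part of the original script.

```python
from html.parser import HTMLParser


class PaginationParser(HTMLParser):
    """Collect page numbers and the presence of a 'next' link in div#pagination."""

    def __init__(self):
        super().__init__()
        self.depth = 0         # nesting depth inside div#pagination (0 = outside)
        self.pages = []        # page numbers from links and the current-page span
        self.has_next = False  # True if a link with class "next" was seen
        self._capture = False  # True while inside a numbered link or the current page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1
            elif attrs.get("id") == "pagination":
                self.depth = 1
            return
        if not self.depth:
            return
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "next" in classes:
            self.has_next = True
        elif tag == "a" and "previous" not in classes:
            self._capture = True
        elif tag == "span" and "current_page" in classes:
            self._capture = True

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
        elif tag in ("a", "span"):
            self._capture = False

    def handle_data(self, data):
        text = data.strip()
        if self._capture and text.isdigit():
            self.pages.append(int(text))


def summarize_pagination(html):
    """Return (list of page numbers, whether a 'next' link exists)."""
    parser = PaginationParser()
    parser.feed(html)
    parser.close()
    return parser.pages, parser.has_next
```

Calling `summarize_pagination(req.text)` on each response and checking the second value gives a quick signal that the site served the restricted version of the page, which is what happened here before logging in.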

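from bs4 import BeautifulSoup
import requests

# UK proxy: the site returns an error page without data for the regular (Russian) IP
proxy = {"http": "http://134.213.145.228:8080"}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
page_url = 'http://www.indeed.com/resumes/data-scientist/in-london?co=GB&start=950'

# Fetch the page through the proxy with a browser-like User-Agent
req = requests.get(page_url, proxies=proxy, headers=headers, timeout=30)
req.raise_for_status()  # stop early on an HTTP error status
req.encoding = 'utf-8'

# Parse the HTML and collect the profile links
main = BeautifulSoup(req.text, 'html.parser')
profile_urls_tag = main.find_all('a', class_="app_link")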
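Since the fix turned out to be registering and logging in, the script needs to carry the login cookies across requests. Below is a minimal sketch of that idea using `requests.Session`. The helper names are hypothetical, and so are the login URL and form field names passed to `log_in`: the real site's login form is not shown in the question, and it may need extra fields such as a CSRF token.

```python
import requests


def make_session(proxy_url, user_agent):
    """Return a Session that routes requests through proxy_url (hypothetical helper)."""
    session = requests.Session()
    # Use the same proxy for both schemes so redirects to https also go through it
    session.proxies.update({"http": proxy_url, "https": proxy_url})
    session.headers.update({"User-Agent": user_agent})
    return session


def log_in(session, login_url, form_fields, timeout=30):
    """POST the login form; cookies set by the site are kept on the session.

    login_url and the keys of form_fields are assumptions and must be taken
    from the site's actual login form.
    """
    response = session.post(login_url, data=form_fields, timeout=timeout)
    response.raise_for_status()
    return response


def fetch_page(session, page_url, timeout=30):
    """GET a page with the session's proxy, headers, and cookies."""
    response = session.get(page_url, timeout=timeout)
    response.raise_for_status()
    return response.text
```

With this shape, the script would call `make_session` once, call `log_in` once, and then pass each `fetch_page` result to `BeautifulSoup(..., 'html.parser')` as before.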