etrade scraping with Python requests doesn't work with a cross-domain URL

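I'm trying to pull some basic stock information from etrade (I know they have an API, but I want to figure this out first). I can log in with the requests module using the following: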

import requests
from bs4 import BeautifulSoup

symbol = 'A'
# etradeUsername and etradePassword are assumed to be defined earlier
payload = {'USER': etradeUsername, 'PASSWORD': etradePassword, 'countrylangselect': 'us_english', 'TARGET': '/e/t/pfm/portfolioview'}
with requests.Session() as c:
    # log in; the session keeps whatever cookies the login response sets
    c.post('https://us.etrade.com/login.fcc', data=payload)
    r = c.get('https://us.etrade.com/e/t/pfm/portfolioview')
    #r = c.get('https://www.etrade.wallst.com/v1/stocks/snapshot/snapshot.asp?symbol=' + symbol + '&rsO=new')

    # name the parser explicitly so bs4 does not have to guess one
    etradeMarkup = BeautifulSoup(r.text, 'html.parser')
    #print(r.headers)
    # write the markup as text; the file object handles the UTF-8 encoding
    with open("etrade.html", "w", encoding="utf-8") as file1:
        file1.write("<html><head><meta charset='UTF-8'></head><body>" + etradeMarkup.prettify() + "</body></html>")

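The file write is only there so I can see what the scraper got back.

I can see the portfolio page just fine, so I know the login is working. The commented-out line is my target page. After logging in with my browser I can view the www.etrade.wallst.com... page, but the scraper just gets redirected to the etrade.com login page.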
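One way to confirm where the scraper actually ends up is to look at the redirect chain on the response. This is only a minimal sketch: `redirect_chain` and `landed_on_host` are hypothetical helper names, and it assumes the object passed in is a requests `Response` (which exposes `history` and `url`).

```python
from urllib.parse import urlsplit


def redirect_chain(response):
    """Return every URL visited for this request, ending with the final one."""
    # response.history holds the intermediate redirect responses, oldest first
    return [hop.url for hop in response.history] + [response.url]


def landed_on_host(response, expected_host):
    """Return True if the final URL (after redirects) is on expected_host."""
    return urlsplit(response.url).hostname == expected_host
```

Calling `print(redirect_chain(r))` and then `landed_on_host(r, 'www.etrade.wallst.com')` on the response for the target page should show whether the request was bounced back to the login page.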

I think there is a session handoff or a cookie being passed between the domains that my browser knows how to handle but my code doesn't.
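To check that guess, it can help to list which cookies the session is holding for each domain. A rough sketch, assuming the jar is something like `c.cookies` from a requests session, whose entries carry `domain` and `name` attributes; `cookies_by_domain` is a made-up helper name.

```python
from collections import defaultdict


def cookies_by_domain(cookie_jar):
    """Group cookie names by the domain each cookie is scoped to."""
    grouped = defaultdict(list)
    # iterating a cookie jar yields cookie objects with .domain and .name
    for cookie in cookie_jar:
        grouped[cookie.domain].append(cookie.name)
    return dict(grouped)
```

Printing `cookies_by_domain(c.cookies)` after each request shows which domains have set cookies so far, and whether anything is scoped to the target domain at all.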

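My Python and HTTP knowledge has hit a dead end here, and I'm hoping someone can point me in the right direction on how to get past this programmatically.

Any help you can offer is greatly appreciated.

(New to Python and scraping, so please be patient :)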

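I found that a different page was the one setting the required cookies. I had assumed I was being pushed to the etrade login page because the target needed cookies from the logged-in part of etrade, but I was wrong. This page doesn't need the etrade login at all, only a visit to another page to pick up the cookies. By adding one extra GET request, I was able to get the data needed to view my target page without my program being forced back to the login page.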

with requests.Session() as c:

    # adding this line was the key: it picks up the cookies the target page needs
    c.get('https://us.etrade.com/e/t/invest/markets?ploc=c-MainNav')

    r = c.get('https://www.etrade.wallst.com/v1/stocks/snapshot/snapshot.asp?symbol=' + symbol + '&rsO=new')

    # name the parser explicitly so bs4 does not have to guess one
    etradeMarkup = BeautifulSoup(r.text, 'html.parser')
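The same two-step pattern can be wrapped in a small helper. This is a sketch under assumptions, not a definitive implementation: `fetch_after_priming` is a hypothetical name, and `session` is assumed to behave like a `requests.Session` (a `get` method accepting `params`).

```python
def fetch_after_priming(session, priming_url, target_url, params=None):
    """GET priming_url first so its cookies are stored, then GET target_url."""
    # the first response is discarded; only the cookies it sets matter
    session.get(priming_url)
    # with a requests session, params is encoded into the query string
    return session.get(target_url, params=params)
```

With a real session, the call might look like `fetch_after_priming(c, 'https://us.etrade.com/e/t/invest/markets?ploc=c-MainNav', 'https://www.etrade.wallst.com/v1/stocks/snapshot/snapshot.asp', params={'symbol': symbol, 'rsO': 'new'})`. Passing `params` instead of concatenating strings also takes care of URL-encoding the symbol.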

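There's a reasonable chance it doesn't like that you're identifying yourself as a bot. See whether spoofing the User-Agent header does anything.

@roippi I tried a couple of different headers by adding the line c.headers.update({'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36', 'Referer': ''}), but nothing changed. Then I tried an IE10 user agent: c.headers.update({'User-Agent': 'Mozilla/5.0 (compatible; MSIE 10.6; Windows NT 6.1; Trident/5.0; InfoPath.2; SLCC1; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET CLR 2.0.50727) 3gpp-gba UNTRUSTED/1.0'}), which didn't help either. Any other suggestions?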