Twitter: unable to access a logged-in page using urllib2
I'm trying to access a protected page on Twitter (for example, my own likes list) using urllib2 in Python, but this code always sends me back to the login page. Any idea why?
(I know I could use the Twitter API or something like it, but I want a general understanding of how this is done.)
Thanks,
Roy
The code:
import urllib
import urllib2
import cookielib
from bs4 import BeautifulSoup

url = "https://twitter.com/login"
protectedUrl = "https://twitter.com/username/likes"
USER = "myTwitterUser"
PASS = "myTwitterPassword"
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-Agent', 'Mozilla/5.0'), ("Referer", "https://twitter.com")]
hdr = {'User-Agent': 'Mozilla/5.0', "Referer":"https://twitter.com"}
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req)
html = page.read()
s = BeautifulSoup(html, "lxml")
AUTH_TOKEN = s.find(attrs={"name": "authenticity_token"})["value"]
login_details = {"session[username_or_email]": USER,
"session[password]": PASS,
"remember_me": 1,
"return_to_ssl": "true",
"scribe_log": "",
"redirect_after_login": "/",
"authenticity_token": AUTH_TOKEN
}
login_data = urllib.urlencode(login_details)
opener.open(url, login_data)
resp = opener.open(protectedUrl)
print resp.read()
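The symptom described here (every request ends up back on the login page) can be checked in code instead of by reading the HTML: urllib2 follows redirects, and the response's geturl() reports the URL it finally landed on. Below is a minimal Python 3 sketch; the helper name ended_on_login_page is hypothetical, and the assumption that the login page lives at the path /login comes from the URL used in the question, not from any documented Twitter behaviour. (In Python 2 the same function is available as urlparse.urlsplit.)

```python
from urllib.parse import urlsplit

# Assumption: the login page lives at "/login", as in the question's URL.
LOGIN_PATHS = ("/login",)


def ended_on_login_page(final_url, login_paths=LOGIN_PATHS):
    """Hypothetical helper: True if a response's final URL is a login page.

    final_url is resp.geturl() for urllib2 / urllib.request, or r.url for
    requests; both report the URL after any redirects were followed.
    """
    # Ignore query string and trailing slash; an empty path counts as "/".
    path = urlsplit(final_url).path.rstrip("/") or "/"
    return path in login_paths


print(ended_on_login_page("https://twitter.com/login?redirect_after_login=%2Fusername%2Flikes"))  # True
print(ended_on_login_page("https://twitter.com/username/likes"))  # False
```

In the code above, calling this on resp.geturl() right after opener.open(protectedUrl) should return True for as long as the login is failing.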
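You need to post to the correct URL, which is "https://twitter.com/sessions". You also have to use the opener when you make the initial request that fetches the authenticity_token, i.e. page = opener.open(req) instead of page = urllib2.urlopen(req), so that we get the cookies we need.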
If we run the code using one of my Twitter accounts, which has no likes:
In [72]: login_details = {"session[username_or_email]": USER,
....: "session[password]": PASS,
....: "remember_me": 1,
....: "redirect_after_login": "/",
....: "authenticity_token": AUTH_TOKEN
....: }
In [73]: # encode form data
In [74]: login_data = urllib.urlencode(login_details)
In [75]: r = opener.open("https://twitter.com/sessions", login_data)
In [76]: # get likes now we have logged in
In [77]: resp = opener.open(likes.format(USER))
In [78]: soup = BeautifulSoup(resp.read(),"lxml")
In [79]: print(soup.select_one("p.empty-text"))
<p class="empty-text">
You haven't liked any Tweets yet.
</p>
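For Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar, the two fixes in this answer (one cookie-carrying opener for every request, and the login POST going to /sessions) can be sketched roughly as below. This is a sketch, not the answer's own code: it swaps BeautifulSoup for the standard-library html.parser only to avoid a third-party dependency, the helper names (TokenFinder, extract_token, make_opener, build_login_body, login) are hypothetical, and the form field names are the ones used in the session above.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser

LOGIN_PAGE = "https://twitter.com/login"
SESSIONS_URL = "https://twitter.com/sessions"  # the POST target named in the answer


class TokenFinder(HTMLParser):
    """Records the value of the first <input name="authenticity_token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        fields = dict(attrs)
        if self.token is None and tag == "input" and fields.get("name") == "authenticity_token":
            self.token = fields.get("value")


def extract_token(html):
    finder = TokenFinder()
    finder.feed(html)
    return finder.token


def make_opener():
    # A single CookieJar behind the opener: every request made through this
    # opener sends and stores cookies in the same jar.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0"), ("Referer", "https://twitter.com")]
    return opener, jar


def build_login_body(user, password, token):
    # Field names copied from the answer; POST bodies must be bytes in Python 3.
    fields = {
        "session[username_or_email]": user,
        "session[password]": password,
        "remember_me": 1,
        "redirect_after_login": "/",
        "authenticity_token": token,
    }
    return urllib.parse.urlencode(fields).encode("ascii")


def login(opener, user, password):
    # Fix 1: fetch the token through the opener (not urlopen) so the
    # cookies set alongside it end up in the jar.
    html = opener.open(LOGIN_PAGE).read().decode("utf-8", "replace")
    token = extract_token(html)
    # Fix 2: POST the credentials to /sessions, not to /login.
    return opener.open(SESSIONS_URL, build_login_body(user, password, token))
```

Usage would be opener, jar = make_opener(), then login(opener, USER, PASS), then opener.open("https://twitter.com/{}/likes".format(USER)), so that all three requests share one cookie jar. Whether Twitter still accepts this form-based login is outside what the sketch can show.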
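In my experience with sites like this, you need to send a full set of HTTP headers, including:
- Accept
- Accept-Encoding
- Accept-Language
- Referer
- Upgrade-Insecure-Requests
- User-Agent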
Leave out only the Cookie header.
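A sketch of that header set using Python 3's urllib.request. The values are placeholders in the style of a desktop browser request (assumptions, not values Twitter is known to require), and browser_request is a hypothetical helper that drops any Cookie header so that cookies come only from the opener's cookie jar:

```python
import urllib.request

# Illustrative browser-like values (assumptions): copy the real ones from your
# browser's developer tools for the page you are scraping.
BROWSER_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    # urllib.request does not decompress gzip for you, so ask for an
    # uncompressed body here instead of copying the browser's "gzip, deflate".
    "Accept-Encoding": "identity",
    "Accept-Language": "en-US,en;q=0.5",
    "Referer": "https://twitter.com/",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
}


def browser_request(url, headers=BROWSER_HEADERS, data=None):
    """Build a Request carrying the full header set, minus any Cookie header.

    Cookies are left to the opener's HTTPCookieProcessor, so a Cookie header
    copied from the browser is dropped here. Passing data makes it a POST.
    """
    kept = {name: value for name, value in headers.items() if name.lower() != "cookie"}
    return urllib.request.Request(url, data=data, headers=kept)
```

Note that urllib.request.Request normalises header names with str.capitalize(), so afterwards they are queried as, for example, req.has_header("User-agent").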
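You also need to create a session and handle cookies, since Twitter presumably works like Facebook in this respect. Personally I prefer "requests", because it makes creating a session and using cookies easy.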
You can do it like this:
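import requests
from time import sleep

USER = "myTwitterUser"
PASS = "myTwitterPassword"
# Placeholder header names/values: fill in the full header set listed above
hd = {'h11': 'h12', 'h21': 'h22', 'h31': 'h32'}
# Placeholder field names: use the login form's real field names
# (e.g. session[username_or_email], session[password], authenticity_token)
usrdata = {'user': USER, 'pass': PASS}
sess = requests.Session()
req = sess.get('http://www.twitter.com')  # start the session and collect its cookies
sleep(1)
req = sess.post('https://twitter.com/sessions', data=usrdata, headers=hd)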
Hope this helps.
The login flow from the first answer (fetch the authenticity_token, then POST to /sessions), written with requests:

import requests
from bs4 import BeautifulSoup

USER = "username"
PASS = "pass"
url = "https://twitter.com"
post = "https://twitter.com/sessions"
likes = "https://twitter.com/{}/likes"

data = {"session[username_or_email]": USER,
        "session[password]": PASS,
        "scribe_log": "",
        "redirect_after_login": "/",
        "remember_me": "1"}

with requests.Session() as s:
    # GET the front page first: the session stores its cookies, and the
    # login form on it carries the authenticity_token
    r = s.get(url)
    soup = BeautifulSoup(r.content, "lxml")
    AUTH_TOKEN = soup.select_one("input[name=authenticity_token]")["value"]
    data["authenticity_token"] = AUTH_TOKEN
    # log in; the session keeps the cookies set by this response
    r = s.post(post, data=data)
    # fetch the protected page with the same session
    print(s.get(likes.format(USER)).content)
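Both requests-based versions above rely on requests.Session for the two things these answers say matter: cookies set by one response are sent with the next request, and headers can be set once for every request. The sketch below shows this offline with Session.prepare_request, which builds the outgoing request without sending it. make_session is a hypothetical helper, and the cookie name and value are made up, standing in for whatever the login response would actually set.

```python
import requests


def make_session(headers):
    """Hypothetical helper: a Session whose headers and cookie jar are
    applied to every request it prepares or sends."""
    session = requests.Session()
    session.headers.update(headers)
    return session


if __name__ == "__main__":
    s = make_session({"Accept-Language": "en-US,en;q=0.5", "Upgrade-Insecure-Requests": "1"})
    # Stand-in for a cookie the login response would set (made-up name/value).
    s.cookies.set("demo_cookie", "demo_value", domain="twitter.com", path="/")
    # prepare_request merges the session's headers and cookies into the
    # outgoing request without sending anything over the network.
    prepped = s.prepare_request(requests.Request("GET", "https://twitter.com/username/likes"))
    print(prepped.headers["Accept-Language"])  # en-US,en;q=0.5
    print(prepped.headers["Cookie"])  # demo_cookie=demo_value
```

Setting headers on the session, rather than passing headers= to a single post() call, means the initial GET that collects the token and cookies goes out with the same headers as the login POST.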