Twitter: unable to access a page behind the login using urllib2

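I am trying to access a protected page on Twitter (for example, my own likes list) with urllib2 in Python, but this code always sends me back to the login page. Any idea why?

(I know I could use the Twitter API or something similar, but I would like to understand how this is done in general.)

Thanks, Roy


Code: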

import cookielib
import urllib
import urllib2
from bs4 import BeautifulSoup

url = "https://twitter.com/login"
protectedUrl = "https://twitter.com/username/likes"

USER = "myTwitterUser"
PASS = "myTwitterPassword"

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-Agent', 'Mozilla/5.0'), ("Referer", "https://twitter.com")]

hdr = {'User-Agent': 'Mozilla/5.0', "Referer":"https://twitter.com"}
req = urllib2.Request(url, headers=hdr)
page = urllib2.urlopen(req)

html = page.read()
s = BeautifulSoup(html, "lxml")
AUTH_TOKEN = s.find(attrs={"name": "authenticity_token"})["value"]

login_details = {"session[username_or_email]": USER,
              "session[password]": PASS,
              "remember_me": 1,
              "return_to_ssl": "true",
              "scribe_log": "",
              "redirect_after_login": "/",
              "authenticity_token": AUTH_TOKEN
                 }

login_data = urllib.urlencode(login_details)
opener.open(url, login_data)
resp = opener.open(protectedUrl)
print resp.read()

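You need to post to the correct URL, which is "https://twitter.com/sessions", and you also have to use the opener when you make the initial request that fetches the authenticity_token, i.e. page = opener.open(req) instead of page = urllib2.urlopen(req), so that we get the cookies we need.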
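
To illustrate why the first request has to go through the same opener, here is a minimal offline sketch. It assumes Python 3, where cookielib and urllib2 became http.cookiejar and urllib.request. FakeResponse, the example.com URLs, and the cookie value are made up for the demonstration and are not part of Twitter's login flow. The jar that saw the login page's Set-Cookie header attaches the cookie to the later POST, while a jar that never saw that response attaches nothing.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for an HTTP response carrying one Set-Cookie header."""

    def __init__(self, set_cookie):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie

    def info(self):
        # CookieJar.extract_cookies() reads the headers through info().
        return self._headers


def cookie_headers_for_post():
    """Return the Cookie header sent with and without a shared jar."""
    login_url = "https://example.com/login"    # assumed URL, for illustration
    post_url = "https://example.com/sessions"  # assumed URL, for illustration

    # Jar that "saw" the login page, as happens with opener.open(req).
    shared_jar = http.cookiejar.CookieJar()
    login_req = urllib.request.Request(login_url)
    shared_jar.extract_cookies(FakeResponse("sess=abc123; Path=/"), login_req)
    post_with_cookie = urllib.request.Request(post_url, data=b"x=1")
    shared_jar.add_cookie_header(post_with_cookie)

    # Jar that never saw the login page, as happens when the first
    # request is made with a plain urlopen() instead of the opener.
    empty_jar = http.cookiejar.CookieJar()
    post_without_cookie = urllib.request.Request(post_url, data=b"x=1")
    empty_jar.add_cookie_header(post_without_cookie)

    return (post_with_cookie.get_header("Cookie"),
            post_without_cookie.get_header("Cookie"))
```

Calling cookie_headers_for_post() returns a pair: the header sent by the shared jar and the header sent by the empty one. The second entry is None, which mirrors a login POST arriving without the session cookie.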

If we run the code with one of my Twitter accounts that has no likes:

In [72]: login_details = {"session[username_or_email]": USER,
   ....:                  "session[password]": PASS,
   ....:                  "remember_me": 1,
   ....:                  "redirect_after_login": "/",
   ....:                  "authenticity_token": AUTH_TOKEN
   ....:                  }

In [73]: # encode form data

In [74]: login_data = urllib.urlencode(login_details)

In [75]: r = opener.open("https://twitter.com/sessions", login_data)

In [76]: # get likes now we have logged in

In [77]: resp = opener.open(likes.format(USER))

In [78]: soup = BeautifulSoup(resp.read(),"lxml")

In [79]: print(soup.select_one("p.empty-text"))
<p class="empty-text">
        You haven't liked any Tweets yet.

    </p>
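
The token lookup and form encoding used in the session above can also be sketched without BeautifulSoup. This is a rough Python 3 version using only the standard library. The helper names extract_token and build_login_body are made up for illustration, and the form field names are copied from the code above, not checked against the live site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class TokenFinder(HTMLParser):
    """Remembers the value of the first <input name="authenticity_token">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if (tag == "input" and self.token is None
                and attrs.get("name") == "authenticity_token"):
            self.token = attrs.get("value")


def extract_token(html):
    """Return the authenticity_token value found in the login page HTML."""
    finder = TokenFinder()
    finder.feed(html)
    if finder.token is None:
        raise ValueError("no authenticity_token input found")
    return finder.token


def build_login_body(user, password, token):
    """URL-encode the login form; Python 3 openers expect bytes for POST data."""
    fields = [
        ("session[username_or_email]", user),
        ("session[password]", password),
        ("remember_me", "1"),
        ("redirect_after_login", "/"),
        ("authenticity_token", token),
    ]
    return urlencode(fields).encode("ascii")
```

The bytes returned by build_login_body would be passed as the second argument to opener.open("https://twitter.com/sessions", ...), in place of login_data in the session above.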

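In my experience with sites like this, you need to send a full set of HTTP headers, including:

  • Accept
  • Accept-Encoding
  • Accept-Language
  • Referer
  • Upgrade-Insecure-Requests
  • User-Agent
Leave only the cookie out of the headers.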
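
As a rough sketch of the header set listed above, the following builds a urllib.request.Request (Python 3) that carries those six headers and no Cookie header. The header values are illustrative assumptions, not values known to be required by Twitter. "identity" is chosen for Accept-Encoding because urllib does not decompress gzip responses on its own.

```python
import urllib.request


def browser_like_headers(referer="https://twitter.com/"):
    """Return the headers from the list above; all values are assumptions."""
    return {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Encoding": "identity",
        "Accept-Language": "en-US,en;q=0.5",
        "Referer": referer,
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0",
    }


def make_request(url):
    """Build a request with the browser-like headers and no Cookie header.

    Cookies are left to the cookie jar attached to the opener or session.
    """
    return urllib.request.Request(url, headers=browser_like_headers())
```

A request built this way can be passed to opener.open(), so the cookie processor adds the Cookie header itself.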

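You also need to create a session and handle cookies, since Twitter must work much like Facebook in this respect. I personally prefer "requests", because it makes it easy to create a session and work with cookies.

You can do something like this: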

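import requests
from time import sleep

USER = "username"
PASS = "pass"

# The header and form entries below look like placeholders: substitute the
# real header set and the field names the login form expects.
hd = {'h11': 'h12',  'h21': 'h22', 'h31': 'h32'}
usrdata = {'user': USER, 'pass': PASS}

sess = requests.Session()
req = sess.get('http://www.twitter.com')  # start the session and pick up cookies
sleep(1)
req = sess.post('https://twitter.com/sessions', data=usrdata, headers=hd)

Hope this helps.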

import requests
from bs4 import BeautifulSoup

USER = "username"
PASS = "pass"
post = "https://twitter.com/sessions"
likes = "https://twitter.com/{}/likes"
url = "https://twitter.com"

data = {"session[username_or_email]": USER,
        "session[password]": PASS,
        "scribe_log": "",
        "redirect_after_login": "/",
        "remember_me": "1"}

with requests.Session() as s:
    # Fetch the page through the session so its cookies are kept.
    r = s.get(url)
    soup = BeautifulSoup(r.content, "lxml")
    AUTH_TOKEN = soup.select_one("input[name=authenticity_token]")["value"]
    data["authenticity_token"] = AUTH_TOKEN
    # Log in, then request the likes page with the same session.
    r = s.post(post, data=data)
    print(s.get(likes.format(USER)).content)