Python: Trying to scrape HTML from a website that requires login, but not getting any data
Tags: python, html, python-requests, lxml, scraperwiki


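However, when I run the Python script, I can't seem to fetch any data. I get an HTTP status code of 200, and status.ok returns a true value. Any help would be great. This is the response in my terminal: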

[]

200

True

import requests
from lxml import html

USERNAME = "username@email.com"
PASSWORD = "legitpassword"

LOGIN_URL = "https://bitbucket.org/account/signin/?next=/"
URL = "https://bitbucket.org/dashboard/overview"

def main():
    session_requests = requests.session()

    # Get login csrf token
    result = session_requests.get(LOGIN_URL)
    tree = html.fromstring(result.text)
    authenticity_token = list(set(tree.xpath("//input[@name='csrfmiddlewaretoken']/@value")))[0]

    # Create payload
    payload = {
        "username": USERNAME,
        "password": PASSWORD,
        "csrfmiddlewaretoken": authenticity_token
    }

    # Perform login
    result = session_requests.post(LOGIN_URL, data=payload, headers=dict(referer=LOGIN_URL))

    # Scrape url
    result = session_requests.get(URL, headers=dict(referer=URL))
    tree = html.fromstring(result.content)
    bucket_elems = tree.findall(".//span[@class='repo-name']")
    bucket_names = [bucket_elem.text_content().replace("\n", "").strip() for bucket_elem in bucket_elems]

    print(bucket_names)
    print(result.status_code)
    print(result.ok)

if __name__ == '__main__':
    main()

Your XPath is wrong: there is no span with the class repo-name. You can get the repo names from the anchor tags instead:

bucket_elems = tree.xpath("//a[@class='execute repo-list--repo-name']")
bucket_names = [bucket_elem.text_content().strip() for bucket_elem in bucket_elems]
The HTML has evidently changed since the tutorial was written.
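
This also explains why you see an empty list alongside a 200 status: the request itself succeeds, but the selector matches nothing on the page. Here is a minimal sketch of that difference. The markup in SAMPLE_HTML is a hypothetical, simplified stand-in for the dashboard (the real Bitbucket HTML may differ), and it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without third-party packages. The same two path expressions can be passed to findall on an lxml tree. The repo_names helper is an illustrative name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified dashboard markup; the real Bitbucket HTML may differ.
SAMPLE_HTML = """
<div>
  <a class="execute repo-list--repo-name" href="/user/repo-one">repo-one</a>
  <a class="execute repo-list--repo-name" href="/user/repo-two">
    repo-two
  </a>
</div>
"""


def repo_names(root, path):
    """Return the stripped text of every element matched by `path`."""
    return ["".join(el.itertext()).strip() for el in root.findall(path)]


root = ET.fromstring(SAMPLE_HTML)

# Selector from the question: no <span class="repo-name"> exists in the markup.
old_names = repo_names(root, ".//span[@class='repo-name']")

# Selector from the answer: matches the anchors whose class attribute is exactly this string.
new_names = repo_names(root, ".//a[@class='execute repo-list--repo-name']")

print(old_names)
print(new_names)
```

Note that @class='...' compares the whole attribute value, so the selector stops matching again if the site adds or reorders class names. If you get an empty list, it is worth inspecting the page source you actually received before assuming the login failed.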