Receiving the body content of an HTTP response in Python
I am trying to get the content of the body, but whenever I call sock.recv I always get 0 bytes back. I already have the header, and that part works fine, although I receive it byte by byte. My problem now is this: I have the content length, the length of the header, and the header itself. Now I want to separate out the body (Task 3d).
PS: I know the code below cannot work as it stands, but I have not found another solution yet.
# -*- coding: utf-8 -*-
"""
task3.simple_web_browser
XX-YYY-ZZZ
<Your name>
"""
from socket import gethostbyname, socket, timeout, AF_INET, SOCK_STREAM
from sys import argv
HTTP_HEADER_DELIMITER = b'\r\n\r\n'
CONTENT_LENGTH_FIELD = b'Content-Length:'
HTTP_PORT = 80
ONE_BYTE_LENGTH = 1
def create_http_request(host, path, method='GET'):
    '''
    Create a sequence of bytes representing an HTTP/1.1 request of the given method.
    :param host: the string contains the hostname of the remote server
    :param path: the string contains the path to the document to retrieve
    :param method: the string contains the HTTP request method (e.g., 'GET', 'HEAD', etc...)
    :return: a bytes object contains the HTTP request to send to the remote server
    e.g.,) An HTTP/1.1 GET request to http://compass.unisg.ch/
    host: compass.unisg.ch
    path: /
    return: b'GET / HTTP/1.1\nHost: compass.unisg.ch\r\n\r\n'
    '''
    ### Task 3(a) ###
    # Hint 1: see RFC7230-7231 for the HTTP/1.1 syntax and semantics specification
    # https://tools.ietf.org/html/rfc7230
    # https://tools.ietf.org/html/rfc7231
    # Hint 2: use str.encode() to create an encoded version of the string as a bytes object
    # https://docs.python.org/3/library/stdtypes.html#str.encode
    r = '{} {} HTTP/1.1\nHost: {}\r\n\r\n'.format(method, path, host)
    response = r.encode()
    return response
    ### Task 3(a) END ###
def get_content_length(header):
    '''
    Get the integer value from the Content-Length HTTP header field if it
    is found in the given sequence of bytes. Otherwise returns 0.
    :param header: the bytes object contains the HTTP header
    :return: an integer value of the Content-Length, 0 if not found
    '''
    ### Task 3(c) ###
    # Hint: use CONTENT_LENGTH_FIELD to find the value
    # Note that the Content-Length field may not be always at the end of the header.
    for line in header.split(b'\r\n'):
        if CONTENT_LENGTH_FIELD in line:
            return int(line[len(CONTENT_LENGTH_FIELD):])
    return 0
    ### Task 3(c) END ###
def receive_body(sock, content_length):
    '''
    Receive the body content in the HTTP response
    :param sock: the TCP socket connected to the remote server
    :param content_length: the size of the content to receive
    :return: a bytes object contains the remaining content (body) in the HTTP response
    '''
    ### Task 3(d) ###
    body = bytes()
    data = bytes()
    while True:
        data = sock.recv(content_length)
        if len(data)<=0:
            break
        else:
            body += data
    return body
    ### Task 3(d) END ###
def receive_http_response_header(sock):
    '''
    Receive the HTTP response header from the TCP socket.
    :param sock: the TCP socket connected to the remote server
    :return: a bytes object that is the HTTP response header received
    '''
    ### Task 3(b) ###
    # Hint 1: use HTTP_HEADER_DELIMITER to determine the end of the HTTP header
    # Hint 2: use sock.recv(ONE_BYTE_LENGTH) to receive the chunk byte-by-byte
    header = bytes()
    chunk = bytes()
    try:
        while HTTP_HEADER_DELIMITER not in chunk:
            chunk = sock.recv(ONE_BYTE_LENGTH)
            if not chunk:
                break
            else:
                header += chunk
    except socket.timeout:
        pass
    return header
    ### Task 3(b) END ###
def main():
    # Change the host and path below to test other web sites!
    host = 'example.com'
    path = '/index.html'
    print(f"# Retrieve data from http://{host}{path}")
    # Get the IP address of the host
    ip_address = gethostbyname(host)
    print(f"> Remote server {host} resolved as {ip_address}")
    # Establish the TCP connection to the host
    sock = socket(AF_INET, SOCK_STREAM)
    sock.connect((ip_address, HTTP_PORT))
    print(f"> TCP Connection to {ip_address}:{HTTP_PORT} established")
    # Uncomment this comment block after Task 3(a)
    # Send an HTTP GET request
    http_get_request = create_http_request(host, path)
    print('\n# HTTP GET request ({} bytes)'.format(len(http_get_request)))
    print(http_get_request)
    sock.sendall(http_get_request)
    # Comment block for Task 3(a) END
    # Uncomment this comment block after Task 3(b)
    # Receive the HTTP response header
    header = receive_http_response_header(sock)
    print(type(header))
    print('\n# HTTP Response Header ({} bytes)'.format(len(header)))
    print(header)
    # Comment block for Task 3(b) END
    # Uncomment this comment block after Task 3(c)
    content_length = get_content_length(header)
    print('\n# Content-Length')
    print(f"{content_length} bytes")
    # Comment block for Task 3(c) END
    # Uncomment this comment block after Task 3(d)
    body = receive_body(sock, content_length)
    print('\n# Body ({} bytes)'.format(len(body)))
    print(body)
    # Comment block for Task 3(d) END

if __name__ == '__main__':
    main()
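"I have the content length, the length of the header, and the header"
You don't. In receive_http_response_header, the check for HTTP_HEADER_DELIMITER only ever looks at the most recent byte (chunk instead of header), which means you will never match the end of the header: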
    while HTTP_HEADER_DELIMITER not in chunk:
        chunk = sock.recv(ONE_BYTE_LENGTH)
        if not chunk:
            break
        else:
            header += chunk
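You then assume that you have read the complete header, while in fact you have read the complete response. This means that the next recv, issued when you try to read the response body, just returns 0 bytes because there is no more data: the body is already included in what you consider to be the HTTP header.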
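As a minimal sketch of the fix described above, the loop can test the accumulated header rather than the latest one-byte chunk. The constants are copied from the question; the use of endswith and the removal of the try/except are choices made for this illustration, not part of the original code.

    HTTP_HEADER_DELIMITER = b'\r\n\r\n'
    ONE_BYTE_LENGTH = 1

    def receive_http_response_header(sock):
        """Read one byte at a time until the header ends with the delimiter."""
        header = bytes()
        # Check the whole accumulated header, not the single latest byte.
        while not header.endswith(HTTP_HEADER_DELIMITER):
            chunk = sock.recv(ONE_BYTE_LENGTH)
            if not chunk:
                # The peer closed the connection before the header was complete.
                break
            header += chunk
        return header

Because only one byte is read per call, the loop stops exactly at the end of the header and the body stays in the socket for a later recv.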
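Apart from that, receive_body is wrong too, because you make a mistake similar to the one in receive_http_response_header: the goal is not to call recv for content_length bytes over and over until no more bytes are available, as you currently do. The goal is to return once len(body) matches content_length, and to keep reading the remaining data while the body has not yet been read completely.

Comments:
If this is an HTTP request, why can't you use the Python requests package? It is very simple to use, e.g. requests.get("http://example.com").content
In this exercise you are not allowed to use the requests library...
@Chris Thanks for the hint. I had never used Stack Overflow before.
"I have the content length, the length of the header, and the header" - do you? In receive_http_response_header, your check for HTTP_HEADER_DELIMITER only ever looks at the latest byte (chunk instead of header), which means you will never match the end of the header.
@SteffenUllrich Yes, I had already found that error. Thanks anyway.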
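The receive_body fix described in the answer could be sketched roughly as follows: keep reading until len(body) reaches content_length, and ask recv only for the bytes still missing. This is an illustration under the assumption that the server sends a Content-Length header and exactly that many body bytes; it does not handle chunked transfer encoding.

    def receive_body(sock, content_length):
        """Read exactly content_length bytes, or fewer if the peer closes early."""
        body = bytes()
        while len(body) < content_length:
            # recv may return fewer bytes than requested, so request only the rest.
            data = sock.recv(content_length - len(body))
            if not data:
                # Connection closed before the full body arrived.
                break
            body += data
        return body

Stopping on the byte count rather than on an empty recv matters for HTTP/1.1, where the server usually keeps the connection open after the response, so waiting for an empty recv would block.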