Error when crawling the web with Python


When I try to run the code below, the following error is returned. I'd appreciate it if someone could point out where I went wrong. Thanks very much.

Traceback (most recent call last):
  File "web_crawler.py", line 26, in <module>
    links = get_all_links(page)
  File "web_crawler.py", line 14, in get_all_links
    url, endpos = get_next_target(page)
  File "web_crawler.py", line 2, in get_next_target
    start_link = page.find("<a href=")
TypeError: a bytes-like object is required, not 'str'

def get_next_target(page):
    start_link = page.find("<a href=")
    if start_link == -1:
        return None, 0
    start_quote = page.find('"',start_link)
    end_quote = page.find('"',start_quote+1)
    url = page[start_quote+1:end_quote]
    print(url)
    return url, end_quote

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

import requests
url='https://en.wikipedia.org/wiki/Moon'
r = requests.get(url)
page = r.content
links = get_all_links(page)
`response.content` is the raw content of the response: it has not been decoded, it is just raw bytes.

What you want is the `response.text` attribute, which contains the decoded content as a string.
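A minimal sketch of the failure and the fix, using a made-up local bytes value so no network request is needed:

```python
# response.content gives you bytes like this (the HTML here is invented
# purely for illustration):
page_bytes = b'<html><a href="https://example.com">link</a></html>'

# Calling .find() with a str needle on a bytes object raises TypeError,
# which is exactly the error in the traceback above.
try:
    page_bytes.find("<a href=")
except TypeError:
    print("bytes.find() rejects a str needle")

# Decoding the bytes to str first (which is what response.text does for
# you, using the detected encoding) makes the str-based find() calls work.
page_text = page_bytes.decode("utf-8")
print(page_text.find("<a href="))  # → 6
```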


(You may also want to use an HTML parsing library rather than your current `page.find` approach.)
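As one sketch of that suggestion, the standard library's `html.parser` can collect `href` attributes without any manual string searching (a dedicated library such as Beautiful Soup would work similarly; the HTML fed in here is a made-up example):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

parser = LinkCollector()
parser.feed('<p><a href="/wiki/Moon">Moon</a> and <a href="/wiki/Earth">Earth</a></p>')
print(parser.links)  # → ['/wiki/Moon', '/wiki/Earth']
```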

For more about the difference between `r.content` and `r.text` (where `r` is the Response object returned by `requests.get`), see the requests documentation.
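Putting it together: your two functions work unchanged once they receive a str instead of bytes. Here is a self-contained sketch run against a small local HTML string (with requests you would simply pass `r.text` instead of `r.content`):

```python
def get_next_target(page):
    # page is a str here, so str needles to .find() are fine.
    start_link = page.find("<a href=")
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    url = page[start_quote + 1:end_quote]
    return url, end_quote

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]  # continue scanning after this link
        else:
            break
    return links

# Made-up HTML standing in for r.text:
html = '<a href="/wiki/Moon">Moon</a> and <a href="/wiki/Earth">Earth</a>'
print(get_all_links(html))  # → ['/wiki/Moon', '/wiki/Earth']
```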