Python TypeError when encoding and replacing "?" with a space

Tags: python, web-scraping, beautifulsoup, python-3.5

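Below is a function in which I am trying to use the BeautifulSoup Python library to get the article text from the <li> tags, encode it, and call replace("?", " ") on the result.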

This is the error I get:

    _________________________
    TypeError                                 Traceback (most recent call last)
    <ipython-input-35-cafa01352f7e> in <module>()
          1 doxyDonkeyPosts = []
          2 for link in links:
    ----> 3     doxyDonkeyPosts+=getDoxyDonkeyText(link)
    
    <ipython-input-34-d5693b21e538> in getDoxyDonkeyText(testUrl)
          6     posts =[]
          7     for div in mydivs:
    ----> 8         posts+=map(lambda p:p.text.encode('ascii', errors='replace').replace("?"," "), div.findAll("li"))
          9     return posts
    
    <ipython-input-34-d5693b21e538> in <lambda>(p)
          6     posts =[]
          7     for div in mydivs:
    ----> 8         posts+=map(lambda p:p.text.encode('ascii', errors='replace').replace("?"," "), div.findAll("li"))
          9     return posts
    
    TypeError: a bytes-like object is required, not 'str'
    _____________
    
Any explanation of the cause of the error and how to fix it would be much appreciated. Thanks in advance.

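str.encode() returns a bytes object, so calling replace() on it with str arguments raises this TypeError. You need to pass bytes arguments to replace() instead, for example:

    .replace(b"?", b" ")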
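
To make the point above concrete, here is a minimal sketch that reproduces the error outside of BeautifulSoup and shows both ways around it. The sample string and the variable names are made up for illustration; they are not taken from the question.

```python
# Hypothetical sample text containing one non-ASCII character (a curly apostrophe).
text = "What\u2019s new? Plenty"

# encode() returns bytes; the unencodable apostrophe becomes b"?".
encoded = text.encode("ascii", errors="replace")

# Passing str arguments to bytes.replace() raises the TypeError from the question.
try:
    encoded.replace("?", " ")
except TypeError as exc:
    error_message = str(exc)

# Fix 1: stay in bytes and pass bytes arguments.
as_bytes = encoded.replace(b"?", b" ")

# Fix 2: decode back to str first, then use str arguments.
as_text = encoded.decode("ascii").replace("?", " ")

print(error_message)
print(as_bytes)
print(as_text)
```

Note that both fixes also turn the genuine question mark in the sample into a space, because after encoding it can no longer be told apart from the replacement character.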

Here is a simplified version:

    import urllib.request  # urllib.request must be imported explicitly
    from bs4 import BeautifulSoup

    def getDoxyDonkeyText(testUrl):
        request = urllib.request.urlopen(testUrl)
        soup = BeautifulSoup(request, 'html.parser')
        mydivs = soup.find_all("div", {"class": 'post-body'})
        posts = []
        for div in mydivs:
            for li in div.find_all("li"):
                # bytes result: replace() needs bytes arguments
                posts.append(
                    li.text.encode('ascii', errors='replace').replace(b"?", b" ")
                )
                # if you want a string instead, decode before replacing
                # (keep only one of these two append calls)
                posts.append(
                    li.text.encode('ascii', errors='replace').decode().replace("?", " ")
                )
        return posts


    articleURL = "http://doxydonkey.blogspot.in"
    doxyDonkeyPosts = getDoxyDonkeyText(articleURL)
    print(doxyDonkeyPosts)
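
One side effect of the encode-then-replace approach is that question marks that were really in the text are blanked out along with the non-ASCII characters. If the goal is only to replace non-ASCII characters with spaces (an assumption, since the question does not say so), a sketch like the following avoids the bytes round trip entirely. The helper name non_ascii_to_space and the sample string are hypothetical.

```python
def non_ascii_to_space(text):
    """Replace each non-ASCII character with a space; ASCII '?' is left alone."""
    return "".join(ch if ord(ch) < 128 else " " for ch in text)


# Hypothetical sample: a non-breaking space and an ellipsis character.
sample = "Is it 5\u00a0GHz? Yes\u2026"
cleaned = non_ascii_to_space(sample)
print(cleaned)
```

Inside getDoxyDonkeyText this would be applied as non_ascii_to_space(li.text), which yields a str directly.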
    

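Comments:

AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

@josealex: Try it now.

Flawless! Thanks, man.

If this solved your problem, you can accept the answer.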