Python TypeError: encoding and replacing "?" with " "
Below is a function in which I am trying to use the BeautifulSoup Python library to get the articles from the tags, encode them, and call replace("?", " ") on the result.
This is the error I get:
_________________________
TypeError Traceback (most recent call last)
<ipython-input-35-cafa01352f7e> in <module>()
1 doxyDonkeyPosts = []
2 for link in links:
----> 3 doxyDonkeyPosts+=getDoxyDonkeyText(link)
<ipython-input-34-d5693b21e538> in getDoxyDonkeyText(testUrl)
6 posts =[]
7 for div in mydivs:
----> 8 posts+=map(lambda p:p.text.encode('ascii', errors='replace').replace("?"," "), div.findAll("li"))
9 return posts
<ipython-input-34-d5693b21e538> in <lambda>(p)
6 posts =[]
7 for div in mydivs:
----> 8 posts+=map(lambda p:p.text.encode('ascii', errors='replace').replace("?"," "), div.findAll("li"))
9 return posts
TypeError: a bytes-like object is required, not 'str'
_____________
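I would appreciate an explanation of what causes this error and how to fix it.
Thanks in advance.

In Python 3, str.encode() returns a bytes object, so calling replace() on it with str arguments raises this error. You need to pass bytes arguments instead, for example replace(b"?", b" ").
Here is a simplified version: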
import urllib.request  # a bare "import urllib" does not reliably expose urllib.request
from bs4 import BeautifulSoup

def getDoxyDonkeyText(testUrl):
    request = urllib.request.urlopen(testUrl)
    soup = BeautifulSoup(request, 'html.parser')
    mydivs = soup.findAll("div", {"class": 'post-body'})
    posts = []
    for div in mydivs:
        for li in div.find_all("li"):
            # bytes result: replace() must be given bytes arguments
            posts.append(
                li.text.encode('ascii', errors='replace').replace(b"?", b" ")
            )
            # if you want a string, decode first (keep only one of the two appends)
            posts.append(
                li.text.encode('ascii', errors='replace').decode().replace("?", " ")
            )
    return posts

articleURL = "http://doxydonkey.blogspot.in"
doxyDonkeyPosts = getDoxyDonkeyText(articleURL)
print(doxyDonkeyPosts)
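
To see the difference without fetching the page, here is a minimal sketch that reproduces the error and shows both fixes on a plain string. The sample text is made up for illustration and stands in for p.text; nothing here depends on BeautifulSoup.

```python
# Stand-in for p.text; the sample string is hypothetical.
text = "caf\u00e9 r\u00e9sum\u00e9"

# str.encode() returns bytes; non-ASCII characters become b"?".
encoded = text.encode("ascii", errors="replace")

# Reproduce the error from the question: str arguments on a bytes object.
try:
    encoded.replace("?", " ")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Fix 1: stay in bytes and pass bytes arguments.
as_bytes = encoded.replace(b"?", b" ")

# Fix 2: decode back to str first, then use str arguments.
as_str = encoded.decode("ascii").replace("?", " ")

print(error_message)
print(as_bytes)
print(as_str)
```

Note that either fix also turns any genuine question marks in the text into spaces, because after encoding they are indistinguishable from the replacement character.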
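AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
@josealex: Try it now.
Flawless! Thanks, man.
If this solved your problem, you can accept the answer.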