TypeError: expected string or buffer in Python
I have a simple piece of code:
#!/usr/bin/python
from bs4 import BeautifulSoup
import requests
import tldextract

def scrape(url):
    main_domain = tldextract.extract(url)
    r = requests.get(url)
    data = r.text
    soup = BeautifulSoup(data)
    list = []
    for href in soup.find_all('a'):
        link_domain = tldextract.extract(href.get('href'))
        print link_domain
        print
I get this error:
Traceback (most recent call last):
  File "cloud.py", line 20, in <module>
    scrape("--- url here -- ")
  File "cloud.py", line 14, in scrape
    link_domain = tldextract.extract(href.get('href'))
  File "/usr/lib/python2.6/site-packages/tldextract/tldextract.py", line 196, in extract
    return TLD_EXTRACTOR(url)
  File "/usr/lib/python2.6/site-packages/tldextract/tldextract.py", line 127, in __call__
    netloc = SCHEME_RE.sub("", url) \
TypeError: expected string or buffer
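How do I fix it?
Some of your a tags do not have an href attribute, so .get('href') returns None, and tldextract.extract() then fails when it runs its regex substitution on None. Pass a default, as in href.get('href', ''), to return an empty string in that case, or test for the attribute first: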
href = href.get('href')
if not href:
    continue
link_domain = tldextract.extract(href)
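
For reference, the "test the attribute first" pattern can be tried in isolation with a minimal sketch like the one below. It is written for Python 3 and uses only the standard library, so it makes two assumptions that are not in the original code: plain dicts stand in for the a tags (they expose the same dict-style .get() as the tags in the loop above), and urlparse stands in for tldextract.extract. The helper name link_hosts is hypothetical.

```python
from urllib.parse import urlparse


def link_hosts(anchors):
    """Return the host part of every anchor that has a usable href.

    `anchors` is any iterable of objects with a dict-style .get(),
    such as the tags yielded by soup.find_all('a').
    """
    hosts = []
    for anchor in anchors:
        href = anchor.get('href')
        if not href:
            # Missing (None) or empty href: nothing to parse, so skip it.
            continue
        # Assumption: urlparse is a stand-in for tldextract.extract here;
        # it returns the raw host, not the registered domain.
        hosts.append(urlparse(href).netloc)
    return hosts


# Plain dicts stand in for parsed <a> tags in this demo.
sample = [
    {'href': 'http://www.example.com/page'},
    {'name': 'top'},             # no href attribute at all
    {'href': ''},                # empty href
    {'href': '/relative/path'},  # relative link, so no host
]
print(link_hosts(sample))  # ['www.example.com', '']
```

Skipping with continue keeps None away from the parser entirely, whereas the href.get('href', '') form still passes an empty string through, which is only appropriate if the downstream code copes with empty results.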