Python: Why does parsing with BeautifulSoup throw an error?

I have this HTML source: http://pastebin.com/itMYaimq. I am running the BeautifulSoup code below to parse the HTML:

def check_img(self, feed):
    return 1 if feed.find_all('img', attrs={'data-blzsrc': True, 'src': lambda x: 'data' not in x}) else 0
Here, feed is the HTML source.

On execution, this throws:

[2015-01-08 10:19:16,415: WARNING/Worker-2] Traceback (most recent call last):
[2015-01-08 10:19:16,415: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/data_processors/rule_processor.py", line 58, in do_akamai_analysis
[2015-01-08 10:19:16,416: WARNING/Worker-2] resp, self.analysis.url, self.analysis.id)
[2015-01-08 10:19:16,416: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/rules.py", line 794, in akamai_rule_analysis
[2015-01-08 10:19:16,416: WARNING/Worker-2] result[RULES.FEO_CHECKS] = check_feo_optimizations(analysis_id, url)
[2015-01-08 10:19:16,417: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/rules.py", line 1320, in check_feo_optimizations
[2015-01-08 10:19:16,417: WARNING/Worker-2] return FEO_processor.FEOProcessor().process_feo_debug_output(analysis_id, url)
[2015-01-08 10:19:16,417: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/data_processors/FEO_processor.py", line 38, in process_feo_debug_output
[2015-01-08 10:19:16,417: WARNING/Worker-2] self.result[name] = (False, True)[getattr(self,func)(feed)]
[2015-01-08 10:19:16,418: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/data_processors/FEO_processor.py", line 64, in check_img
[2015-01-08 10:19:16,418: WARNING/Worker-2] return 1 if feed.find_all('img', attrs={'data-blzsrc': True, 'src': lambda x: 'data' not in x}) else 0
[2015-01-08 10:19:16,418: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 1180, in find_all
[2015-01-08 10:19:16,419: WARNING/Worker-2] return self._find_all(name, attrs, text, limit, generator, **kwargs)
[2015-01-08 10:19:16,419: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 505, in _find_all
[2015-01-08 10:19:16,419: WARNING/Worker-2] found = strainer.search(i)
[2015-01-08 10:19:16,420: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 1540, in search
[2015-01-08 10:19:16,420: WARNING/Worker-2] found = self.search_tag(markup)
[2015-01-08 10:19:16,420: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 1512, in search_tag
[2015-01-08 10:19:16,421: WARNING/Worker-2] if not self._matches(attr_value, match_against):
[2015-01-08 10:19:16,421: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 1578, in _matches
[2015-01-08 10:19:16,421: WARNING/Worker-2] return match_against(markup)
[2015-01-08 10:19:16,421: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/data_processors/FEO_processor.py", line 64, in <lambda>
[2015-01-08 10:19:16,422: WARNING/Worker-2] return 1 if feed.find_all('img', attrs={'data-blzsrc': True, 'src': lambda x: 'data' not in x}) else 0
[2015-01-08 10:19:16,422: WARNING/Worker-2] TypeError: argument of type 'NoneType' is not itterable

I printed feed to check its value. It printed the HTML source, so it is not None. So why am I getting the error "argument of type 'NoneType' is not iterable"?

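Your src lambda is being called with None, because BeautifulSoup passes None to the filter for any img tag that has no src attribute: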

>>> x = None
>>> 'data' not in x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: argument of type 'NoneType' is not iterable
Simply test for that first:

lambda x: x and 'data' not in x
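To make the guard easier to see in isolation, here is a minimal sketch of the same idea written as a named function and run against plain values, without BeautifulSoup. The name has_non_data_src and the sample values are hypothetical, chosen only for illustration; it checks for None explicitly rather than relying on truthiness.

```python
def has_non_data_src(value):
    """Return True only when a src value exists and does not contain 'data'."""
    # The filter receives None when the tag has no src attribute (as the
    # traceback shows), so rule that out before using the `in` operator.
    return value is not None and 'data' not in value


# Hypothetical attribute values, as an attribute filter might receive them.
samples = [None, 'data:image/gif;base64,R0lGOD', 'http://example.com/a.png']
results = [has_non_data_src(v) for v in samples]
print(results)
```

A function like this could be passed in place of the lambda, for example as the value for 'src' in the attrs dictionary.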
Your test can also be simplified; there is no need to find all matches, just the first one:

blzsrc_image = feed.find('img', attrs={'data-blzsrc': True, 'src': lambda x: x and 'data' not in x})
return 1 if blzsrc_image else 0
If a boolean is acceptable (instead of 1 or 0), you can use:

blzsrc_image = feed.find('img', attrs={'data-blzsrc': True, 'src': lambda x: x and 'data' not in x})
return blzsrc_image is not None

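If feed were None, you would get AttributeError: 'NoneType' object has no attribute 'find_all'. Are you sure there is anything in x? 'data' not in None will give "argument of type 'NoneType' is not iterable".

x should hold the src attribute of the img tag. You can see the HTML here.

Did you type this traceback by hand? You have a typo in "iterable"…

The img tag on line 1885 has no src attribute. And that is only the first of many such cases.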