I have this HTML source code: http://pastebin.com/itMYaimq. I'm running the following BeautifulSoup code to parse the HTML:
def check_img(self, feed):
    return 1 if feed.find_all('img', attrs={'data-blzsrc': True, 'src': lambda x: 'data' not in x}) else 0

Here feed is the HTML source.
When it runs, it throws:
[2015-01-08 10:19:16,415: WARNING/Worker-2] Traceback (most recent call last):
[2015-01-08 10:19:16,415: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/data_processors/rule_processor.py", line 58, in do_akamai_analysis
[2015-01-08 10:19:16,416: WARNING/Worker-2] resp, self.analysis.url, self.analysis.id)
[2015-01-08 10:19:16,416: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/rules.py", line 794, in akamai_rule_analysis
[2015-01-08 10:19:16,416: WARNING/Worker-2] result[RULES.FEO_CHECKS] = check_feo_optimizations(analysis_id, url)
[2015-01-08 10:19:16,417: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/rules.py", line 1320, in check_feo_optimizations
[2015-01-08 10:19:16,417: WARNING/Worker-2] return FEO_processor.FEOProcessor().process_feo_debug_output(analysis_id, url)
[2015-01-08 10:19:16,417: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/data_processors/FEO_processor.py", line 38, in process_feo_debug_output
[2015-01-08 10:19:16,417: WARNING/Worker-2] self.result[name] = (False, True)[getattr(self,func)(feed)]
[2015-01-08 10:19:16,418: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/data_processors/FEO_processor.py", line 64, in check_img
[2015-01-08 10:19:16,418: WARNING/Worker-2] return 1 if feed.find_all('img', attrs={'data-blzsrc': True, 'src': lambda x: 'data' not in x}) else 0
[2015-01-08 10:19:16,418: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 1180, in find_all
[2015-01-08 10:19:16,419: WARNING/Worker-2] return self._find_all(name, attrs, text, limit, generator, **kwargs)
[2015-01-08 10:19:16,419: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 505, in _find_all
[2015-01-08 10:19:16,419: WARNING/Worker-2] found = strainer.search(i)
[2015-01-08 10:19:16,420: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 1540, in search
[2015-01-08 10:19:16,420: WARNING/Worker-2] found = self.search_tag(markup)
[2015-01-08 10:19:16,420: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 1512, in search_tag
[2015-01-08 10:19:16,421: WARNING/Worker-2] if not self._matches(attr_value, match_against):
[2015-01-08 10:19:16,421: WARNING/Worker-2] File "/Library/Python/2.7/site-packages/bs4/element.py", line 1578, in _matches
[2015-01-08 10:19:16,421: WARNING/Worker-2] return match_against(markup)
[2015-01-08 10:19:16,421: WARNING/Worker-2] File "/Users/rokumar/SiteAnalysisGit/Src/hct/hct/data_processors/FEO_processor.py", line 64, in <lambda>
[2015-01-08 10:19:16,422: WARNING/Worker-2] return 1 if feed.find_all('img', attrs={'data-blzsrc': True, 'src': lambda x: 'data' not in x}) else 0
[2015-01-08 10:19:16,422: WARNING/Worker-2] TypeError: argument of type 'NoneType' is not iterable

I printed feed to check its value. It printed the HTML source, so it is not None. So why am I getting the error "argument of type 'NoneType' is not iterable"?
Posted on 2015-01-08 21:51:06
Your src lambda is being tested against None:
>>> x = None
>>> 'data' not in x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
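TypeError: argument of type 'NoneType' is not iterable

This happens when BeautifulSoup tests an <img> tag that has no src attribute. In that case your filter function is called with None. Your input source has 8 such tags: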
>>> import requests
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(requests.get('http://pastebin.com/raw.php?i=itMYaimq').content)
>>> len(soup.find_all('img', src=False))
8

Simply test for that first:
lambda x: x and 'data' not in x

Your test can also be simplified. There is no need to find all matches; finding the first one is enough:
blzsrc_image = feed.find('img', attrs={'data-blzsrc': True, 'src': lambda x: x and 'data' not in x})
return 1 if blzsrc_image else 0

If a boolean is acceptable (instead of 1 or 0), you can use:
blzsrc_image = feed.find('img', attrs={'data-blzsrc': True, 'src': lambda x: x and 'data' not in x})
return blzsrc_image is not None

https://stackoverflow.com/questions/27841473
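The sketch below shows the None guard without installing bs4 or fetching the pastebin page. It assumes Python 3 and uses the standard library's html.parser.HTMLParser as a stand-in for BeautifulSoup; this is a swapped-in parser, not how bs4 works internally. The names src_has_no_data, ImgCollector and has_blzsrc_image are hypothetical. A missing src attribute shows up as None, just as bs4 passes None to an attribute filter. any() stops at the first match, mirroring find() rather than find_all().

```python
from html.parser import HTMLParser


def src_has_no_data(src):
    """Filter for an <img> src value (hypothetical helper name).

    A tag without a src attribute yields None here, just as BeautifulSoup
    passes None to an attribute filter, so test for a value before using `in`.
    Truthiness matches the answer's `x and 'data' not in x`.
    """
    return bool(src) and 'data' not in src


class ImgCollector(HTMLParser):
    """Stdlib stand-in for soup.find_all('img'): records each <img>'s attributes."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; valueless attributes have value None
        if tag == 'img':
            self.images.append(dict(attrs))


def has_blzsrc_image(html):
    """True if some <img> has data-blzsrc and a src that does not contain 'data'."""
    collector = ImgCollector()
    collector.feed(html)
    collector.close()
    # any() stops at the first matching tag, like find() rather than find_all()
    return any(
        'data-blzsrc' in attrs and src_has_no_data(attrs.get('src'))
        for attrs in collector.images
    )


if __name__ == '__main__':
    # The second <img> has no src attribute: it is skipped instead of raising TypeError
    print(has_blzsrc_image('<img data-blzsrc="a.png" src="/img/a.png"><img data-blzsrc="b.png">'))  # True
    print(has_blzsrc_image('<img data-blzsrc="b.png"><img src="data:image/gif;base64,R0lG">'))  # False
```

With BeautifulSoup itself, the same function can be passed as the attribute filter: feed.find('img', attrs={'data-blzsrc': True, 'src': src_has_no_data}). Note that 'data' not in src also rejects ordinary URLs such as /img/metadata.png. If the intent is only to skip data: URIs (an assumption), not src.startswith('data:') would be a stricter test.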