Python: Testing whether an attribute is present in a tag in BeautifulSoup


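I would like to get all the <script> tags in a document and then process each one based on the presence (or absence) of certain attributes.

For example, for each <script> tag, if the attribute for is present, do something; otherwise, if the attribute bar is present, do something else.

Here is what I am currently doing: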
outputDoc = BeautifulSoup(''.join(output))
scriptTags = outputDoc.findAll('script', attrs = {'for' : True})

But this way I filter the <script> tags by the for attribute, and I lose the others (those without the for attribute).

If I understand correctly, you just want all the script tags, and then want to check for certain attributes on each of them?
scriptTags = outputDoc.findAll('script')
for script in scriptTags:
    if script.has_attr('some_attribute'):
        do_something()        
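
To cover the case in the question, where each script tag is handled differently depending on which attribute it carries, the same check can be chained with elif. This is a minimal sketch, assuming made-up sample HTML and a hypothetical classify_scripts helper, neither of which comes from the original post:

```python
from bs4 import BeautifulSoup

# Made-up sample markup, for illustration only.
HTML = """
<script for="window" event="onload">a()</script>
<script bar="1">b()</script>
<script>c()</script>
"""


def classify_scripts(html):
    """Label each <script> tag by the first matching attribute (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    labels = []
    for script in soup.find_all("script"):
        if script.has_attr("for"):
            labels.append("for")
        elif script.has_attr("bar"):
            labels.append("bar")
        else:
            labels.append("neither")
    return labels


labels = classify_scripts(HTML)
print(labels)
```

Because find_all('script') is called only once, no tag is lost: the tags without either attribute end up in the final else branch.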

For future reference: has_key has been deprecated in BeautifulSoup 4. You now need to use has_attr:
scriptTags = outputDoc.find_all('script')
for script in scriptTags:
    if script.has_attr('some_attribute'):
        do_something()

If you only need to get the tag(s) that have a given attribute, you can use a lambda:
soup = bs4.BeautifulSoup(YOUR_CONTENT)


Tags with an attribute

or

A specific tag with an attribute


and so on.

Thought it might be useful.
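
The snippets for the cases listed in this answer are not shown here, so the following is only a sketch of what such lambda filters might look like. The sample HTML and the src attribute are assumptions chosen for illustration; find_all and find do accept a function that receives each tag and returns True or False.

```python
from bs4 import BeautifulSoup

# Made-up sample markup, for illustration only.
HTML = """
<img src="a.png">
<script src="app.js"></script>
<script>inline()</script>
<a name="top">top</a>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Any tag that carries a src attribute.
with_src = soup.find_all(lambda tag: tag.has_attr("src"))

# The same test written against the attrs dictionary.
with_src_dict = soup.find_all(lambda tag: "src" in tag.attrs)

# A specific tag (script) that carries a src attribute.
script_with_src = soup.find(
    lambda tag: tag.name == "script" and tag.has_attr("src")
)

names = [tag.name for tag in with_src]
names_dict = [tag.name for tag in with_src_dict]
print(names, script_with_src["src"])
```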
You don't need any lambda to filter by attribute; you can simply pass some_attribute=True to find or find_all.

script_tags = soup.find_all('script', some_attribute=True)

# or

script_tags = soup.find_all('script', {"some-data-attribute": True})
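
One detail matters for the for attribute from the question: for is a reserved word in Python, so it cannot be passed as a keyword argument and has to go through the attrs dictionary, as do names containing a hyphen. A small sketch, assuming made-up sample HTML:

```python
from bs4 import BeautifulSoup

# Made-up sample markup, for illustration only.
HTML = """
<script for="window">a()</script>
<script data-role="loader">b()</script>
<script src="app.js"></script>
"""

soup = BeautifulSoup(HTML, "html.parser")

# src is a valid Python identifier, so the keyword form works.
by_keyword = soup.find_all("script", src=True)

# "for" is a reserved word and "data-role" contains a hyphen, so both
# are passed through the attrs dictionary instead.
by_for = soup.find_all("script", attrs={"for": True})
by_data = soup.find_all("script", attrs={"data-role": True})

counts = (len(by_keyword), len(by_for), len(by_data))
print(counts)
```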

Here are more examples of other approaches:
soup = bs4.BeautifulSoup(html)

# Find all with a specific attribute

tags = soup.find_all(src=True)
tags = soup.select("[src]")

# Find all meta with either name or http-equiv attribute.

soup.select("meta[name],meta[http-equiv]")

# Find all tags with either a name or a src attribute.

soup.select("[name], [src]")

# find first/any script with a src attribute.

tag = soup.find('script', src=True)
tag = soup.select_one("script[src]")

# Find all tags with a name attribute beginning with foo
# or a src beginning with /path (values are quoted because
# /path is not a valid unquoted CSS identifier).
soup.select('[name^="foo"], [src^="/path"]')

# Find all tags with a name attribute that contains foo
# or a src that contains whatever.
soup.select('[name*="foo"], [src*="whatever"]')

# Find all tags with a name attribute that ends with foo
# or a src that ends with whatever.
soup.select('[name$="foo"], [src$="whatever"]')
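
To see what these selectors return in practice, here is a small self-contained run. It is a sketch assuming made-up sample HTML and a Beautiful Soup version that ships CSS selector support through select and select_one:

```python
from bs4 import BeautifulSoup

# Made-up sample markup, for illustration only.
HTML = """
<meta name="description" content="x">
<meta http-equiv="refresh" content="5">
<meta property="og:title" content="t">
<script src="/path/app.js"></script>
<script src="https://cdn.example/lib.min.js"></script>
<a name="foobar">anchor</a>
"""

soup = BeautifulSoup(HTML, "html.parser")

# meta tags that have a name or an http-equiv attribute.
metas = soup.select("meta[name], meta[http-equiv]")

# name starting with foo, or src starting with /path.
prefix = soup.select('[name^="foo"], [src^="/path"]')

# src ending with .min.js.
suffix = soup.select('[src$=".min.js"]')

# First script that has a src attribute.
first_script = soup.select_one("script[src]")

print(len(metas), [t.name for t in prefix], len(suffix), first_script["src"])
```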

You can also use regular expressions with find or find_all:
import re
# starting with
soup.find_all("script", src=re.compile("^whatever"))
# contains
soup.find_all("script", src=re.compile("whatever"))
# ends with 
soup.find_all("script", src=re.compile("whatever$"))
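
Beautiful Soup applies a compiled pattern with search semantics, which is why the "starts with" and "ends with" cases need the ^ and $ anchors. A short sketch with made-up sample HTML showing the three cases side by side:

```python
import re

from bs4 import BeautifulSoup

# Made-up sample markup, for illustration only.
HTML = """
<script src="/static/app.js"></script>
<script src="https://cdn.example/vendor.js"></script>
<script src="/static/app.css.map"></script>
<script>inline()</script>
"""

soup = BeautifulSoup(HTML, "html.parser")

# src starts with /static/
starts = soup.find_all("script", src=re.compile(r"^/static/"))

# src ends with .js
ends = soup.find_all("script", src=re.compile(r"\.js$"))

# src contains cdn anywhere
contains = soup.find_all("script", src=re.compile(r"cdn"))

print(len(starts), len(ends), len(contains))
```

The inline script has no src attribute at all, so none of the patterns match it.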

Using the pprint module, you can inspect the contents of an element:
from pprint import pprint

pprint(vars(element))

Using this on a bs4 element will print something similar to this:
{'attrs': {u'class': [u'pie-productname', u'size-3', u'name', u'global-name']},
 'can_be_empty_element': False,
 'contents': [u'\n\t\t\t\tNESNA\n\t'],
 'hidden': False,
 'name': u'span',
 'namespace': None,
 'next_element': u'\n\t\t\t\tNESNA\n\t',
 'next_sibling': u'\n',
 'parent': <h1 class="pie-compoundheader" itemprop="name">\n<span class="pie-description">Bedside table</span>\n<span class="pie-productname size-3 name global-name">\n\t\t\t\tNESNA\n\t</span>\n</h1>,
 'parser_class': <class 'bs4.BeautifulSoup'>,
 'prefix': None,
 'previous_element': u'\n',
 'previous_sibling': u'\n'}

You can filter elements using this approach:
for script in soup.find_all('script'):
    if script.attrs.get('for'):
        # ... Has a non-empty 'for' attr
        pass
    elif "myClass" in script.attrs.get('class', []):
        # ... Has class "myClass"
        pass
    else:
        # ... Do something else
        pass
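
One caveat with attrs.get: it tests the attribute's value, not its presence, so an attribute that exists with an empty value is treated as missing. The sketch below, using made-up sample HTML, contrasts the two checks and also shows how to tell whether a tag has any attributes at all:

```python
from bs4 import BeautifulSoup

# Made-up sample markup, for illustration only.
HTML = """
<script for="">a()</script>
<script class="myClass other">b()</script>
<script>c()</script>
"""

soup = BeautifulSoup(HTML, "html.parser")
empty_for, with_class, bare = soup.find_all("script")

# The attribute exists, but its value is the empty string.
present = "for" in empty_for.attrs         # presence check
truthy = bool(empty_for.attrs.get("for"))  # value check

# class is multi-valued, so bs4 stores it as a list of strings.
classes = with_class.attrs.get("class", [])

# A tag without any attributes has an empty attrs dictionary.
has_any = bool(bare.attrs)

print(present, truthy, classes, has_any)
```

If presence is what matters, has_attr or the in test on attrs is the safer choice.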

You can check whether a certain attribute is present:
scriptTags = outputDoc.findAll('script', some_attribute=True)
for script in scriptTags:
    do_something()
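
"But if ... in doesn't work"? What does that mean? A syntax error? What do you mean by "doesn't work"? Please be very specific about what goes wrong.

Do you want to test for the presence of an attribute in any tag, in all tags, or handle each occurrence of the tag separately?

I can't do something like: if 'some_attribute' in script. That is what I'm after, and I would like to avoid calling findAll again and again...

To check for available attributes, you have to use the Python dict methods, e.g.: script.has_key('some_attribute')

How do I check whether the tag has any attributes? While tag.has_key('some_attribute') works fine, tag.keys() raises an exception ('NoneType' object is not callable).

Please update this post, has_key is deprecated. Use has_attr instead.

Sadly, it didn't work for me. Maybe this way, soup_response.find('err').string is not None, can also work for other attributes...

Elegant solution!

I agree that this should be the accepted answer. I simplified the main example to make it stand out more.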

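To read a single attribute from an element, for example its class list, use attrs.get with a default:
class_list = element.attrs.get('class', [])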
