Python BeautifulSoup: searching for attribute values

python python-3.x

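I am trying to search an HTML document for specific attribute values, e.g. the itemprop value in <h2 itemprop="prio1"> or the id value in <span id="prio2">.

What I am looking for is something like the following, which would match the value of any attribute:

soup.find_all(*=re.compile('prio.*'))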

I don't know if this is the best way, but this works:

>>> soup.find_all(lambda element: any(re.search('prio.*', attr) for attr in element.attrs.values()))
[<h2 itemprop="prio1">  TEXT PRIO 1 </h2>, <span id="prio2"> TEXT PRIO 2 </span>]

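First, your regex is wrong. If you only want to find strings that start with prio, you would prefix the pattern with ^, because as written your regex matches prio anywhere in the string. If you are going to search every attribute, you should just use str.startswith, as in the soup.find_all lambdas listed at the end of this page.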
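To make the difference concrete, here is a minimal sketch using only the standard library. The sample strings are made up for illustration and are not from the question.

```python
import re

# Hypothetical attribute values, chosen only to show the three behaviours.
values = ["prio1", "low-prio", "PRIO2"]

# Unanchored search: matches "prio" anywhere in the string.
unanchored = [v for v in values if re.search("prio.*", v)]

# Anchored pattern: matches only when the string starts with "prio".
anchored = [v for v in values if re.search("^prio", v)]

# Plain prefix test, no regex needed.
prefix = [v for v in values if v.startswith("prio")]

print(unanchored)  # ['prio1', 'low-prio']
print(anchored)    # ['prio1']
print(prefix)      # ['prio1']
```

For a fixed prefix, str.startswith and the anchored pattern select the same strings, so the regex is only needed when the prefix itself is a pattern.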

But if you want a more efficient solution, you might want to look at lxml, which lets you use wildcards in XPath expressions:

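from lxml import html

# h is the HTML string used in the BeautifulSoup example (listed further down).
xml = html.fromstring(h)

# Select every element that has an attribute value starting with "prio".
# The predicate on @* tests each attribute; starts-with(@*, 'prio') would
# only test the first attribute of each element.
tags = xml.xpath("//*[@*[starts-with(., 'prio')]]")
print(tags)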

Or, to check only id and itemprop:

tags = xml.xpath("//*[starts-with(@id,'prio') or starts-with(@itemprop, 'prio')]")
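A note on the wildcard form: in XPath 1.0, passing the node-set @* directly to starts-with() converts it to the string value of its first node only, so an element whose matching attribute is not listed first would be skipped. Putting the test in a predicate on @* checks every attribute. The sketch below compares the two forms. The markup and variable names are made up for illustration, and it assumes lxml keeps attributes in source order.

```python
from lxml import html

# Hypothetical markup: the span carries a non-matching attribute first.
doc = """<div>
  <h2 itemprop="prio1">TEXT PRIO 1</h2>
  <span class="x" id="prio2">TEXT PRIO 2</span>
</div>"""

tree = html.fromstring(doc)

# Node-set passed directly: only the first attribute of each element is tested.
first_only = tree.xpath("//*[starts-with(@*, 'prio')]")

# Predicate on @*: every attribute of each element is tested.
any_attr = tree.xpath("//*[@*[starts-with(., 'prio')]]")

print([el.tag for el in first_only])
print([el.tag for el in any_attr])
```

With this input the first query is expected to miss the span, because its first attribute is class="x", while the second query finds both elements.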

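Is it possible that this lambda syntax only works with BeautifulSoup 3 and not with bs4? I get the error: AttributeError: 'list' object has no attribute 'startswith'

@info: I am using BeautifulSoup4 with Python 3.5, and this code runs fine here. What error are you getting?

Error:

Traceback (most recent call last):
File "BeautifulSoup_test.py", line 40, in soup.find_all(lambda element: any(attr.startswith('prio') for attr in element.attrs.values()))
File "/usr/local/lib/python3.5/site-packages/bs4/element.py", line 1259, in ...
File "BeautifulSoup_test.py", line 40, in soup.find_all(lambda element: any(attr.startswith('prio') for attr in element.attrs.values()))
AttributeError: 'list' object has no attribute 'startswith'

btw: I am also using bs4 and Python 3.5.1.

This seems to work for the simple example document, but not for real documents... I will try to find out why and write it here.

Are you looking in all attributes, or just itemprop and id?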
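One likely explanation for the AttributeError reported in the comments: Beautiful Soup 4 returns multi-valued attributes such as class as a list of strings, and real pages almost always contain class attributes, while the simple example document does not. The thread does not confirm this as the cause, so treat the following as a sketch under that assumption. The helper name has_prio_value and the sample markup are hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document; the class attributes trigger the list case.
doc = """<html><body>
  <h2 itemprop="prio1" class="title big"> TEXT PRIO 1 </h2>
  <span id="prio2"> TEXT PRIO 2 </span>
  <p class="prio-note other"> TEXT 3 </p>
</body></html>"""

soup = BeautifulSoup(doc, "html.parser")


def has_prio_value(tag, prefix="prio"):
    """Return True if any attribute value of tag starts with prefix."""
    for value in tag.attrs.values():
        # Multi-valued attributes such as class come back as a list of strings.
        candidates = value if isinstance(value, list) else [value]
        if any(isinstance(c, str) and c.startswith(prefix) for c in candidates):
            return True
    return False


tags = soup.find_all(has_prio_value)
print([t.name for t in tags])  # ['h2', 'span', 'p']
```

Passing a function to find_all works the same way as the lambdas in the answers; the only change is that list values are unpacked before the prefix test.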

soup.find_all(lambda element: any(attr.startswith('prio') for attr in element.attrs.values()))

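from bs4 import BeautifulSoup

h = """<html>
  <h2 itemprop="prio1">  TEXT PRIO 1 </h2>
  <span id="prio2"> TEXT PRIO 2 </span>
</html>"""

soup = BeautifulSoup(h, "lxml")

# Match every tag that has any attribute value starting with "prio".
# This assumes all attribute values are strings.
tags = soup.find_all(lambda t: any(a.startswith("prio") for a in t.attrs.values()))

# Or check only the id and itemprop attributes.
tags = soup.find_all(lambda t: t.get("id","").startswith("prio") or t.get("itemprop","").startswith("prio"))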
