Python 3.x: Scraping with Python

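I have two a tags with different href values, and I only want one of them. I would like to know whether BeautifulSoup can select only the tag whose href starts with a specific word. Thanks in advance.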

<a href="https://facebook.com/"></a>

and the other one:

<a href="https://Instagram.com/"></a>

There are several ways to do this. Here are the three most common: a CSS selector, a regular expression, and a lambda:

data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://instagram.com/">TAG 2</a>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

# 1st option - CSS selector
print(soup.select_one('a[href^="https://instagram"]'))

# 2nd option - using regexp
import re
print(soup.find('a', {'href': re.compile(r'^https://instagram')}))

# 3rd option - using lambda
print(soup.find(lambda tag: 'href' in tag.attrs and tag['href'].startswith('https://instagram')))

Prints:

<a href="https://instagram.com/">TAG 2</a>
<a href="https://instagram.com/">TAG 2</a>
<a href="https://instagram.com/">TAG 2</a>
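
One detail worth checking: the question spells the host as Instagram.com, while the snippets above match a lowercase prefix, and attribute values are compared case-sensitively by default. The following is a minimal sketch of a case-insensitive variant. It assumes the i flag for attribute selectors as supported by Soup Sieve (the selector engine used by recent BeautifulSoup 4 releases), and the sample markup is made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup: note the capital "I" in the second href.
data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://Instagram.com/">TAG 2</a>
'''

soup = BeautifulSoup(data, 'html.parser')

# Case-sensitive prefix match: finds nothing because of the capital "I".
strict = soup.select_one('a[href^="https://instagram"]')

# CSS attribute selector with the case-insensitive "i" flag.
by_css = soup.select_one('a[href^="https://instagram" i]')

# Regular expression compiled with re.IGNORECASE.
by_regex = soup.find('a', href=re.compile(r'^https://instagram', re.IGNORECASE))

print(strict)
print(by_css)
print(by_regex)
```

If the selector flag is not available in an older installation, the regular-expression form should still work, since it relies only on the standard re module.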

Edit: To select multiple links that start with a given string, do the following:

data = '''
<a href="https://facebook.com/">TAG 1</a>
<a href="https://instagram.com/A">TAG 2</a>
<a href="https://facebook.com/">TAG 3</a>
<a href="https://instagram.com/B">TAG 4</a>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

for link in soup.select('a[href^="https://instagram"]'):
    print(link)

Prints:

<a href="https://instagram.com/A">TAG 2</a>
<a href="https://instagram.com/B">TAG 4</a>

The ^= attribute selector used here is described in the CSS selectors reference.
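
For orientation, here is a small sketch of the three substring-style attribute selectors from the CSS specification: ^= (prefix), *= (substring), and $= (suffix). The markup and URLs are hypothetical and exist only to show the difference between the operators.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for this illustration.
data = '''
<a href="https://instagram.com/A">TAG 1</a>
<a href="https://example.com/video/1">TAG 2</a>
<a href="https://example.com/report.pdf">TAG 3</a>
'''

soup = BeautifulSoup(data, 'html.parser')

# href starts with the given string
starts = [a['href'] for a in soup.select('a[href^="https://instagram"]')]

# href contains the given string anywhere
contains = [a['href'] for a in soup.select('a[href*="/video"]')]

# href ends with the given string
ends = [a['href'] for a in soup.select('a[href$=".pdf"]')]

print(starts)
print(contains)
print(ends)
```

Quoting the attribute value is the safer habit, because values containing characters such as / or . are not valid unquoted.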

Here is a short example:

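from bs4 import BeautifulSoup
import re

# Assumption: the original value of html was lost, so the two tags from the question are used here.
html = '''
<a href="https://facebook.com/"></a>
<a href="https://Instagram.com/"></a>
'''
page = BeautifulSoup(html, 'html.parser')

# Iterate over the "a" elements and keep the one whose href starts with https://Insta
for i in page.find_all('a'):
    # Default to '' so that anchors without an href do not raise an error
    if i.get('href', '').startswith('https://Insta'):
        instagram = i

# One-line version using a regular expression:
facebook = [i for i in page.find_all('a') if re.match('^https://face', i.get('href', ''))][0]

print(facebook)
print(instagram)

This prints the matching facebook.com tag followed by the matching Instagram.com tag.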

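Comment from the asker: with soup = BeautifulSoup(r.content, 'html.parser') and a loop of the form for link in soup.select(...), how do I put option 1 in there? And how can I filter only on a word inside the href, such as /video, or only on the parameter that changes inside the URL?

@JacksuelSoaresBraga I don't quite understand. You can use soup.select('a[href*="/video"]') to select the links whose href attribute contains /video.

@JacksuelSoaresBraga To close the question, you need to accept an answer: click the check mark next to the score.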
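
The comment does not say exactly what "the parameter that changes inside the URL" refers to. One possible reading is a query-string parameter, and under that assumption a rough sketch could combine the substring selector with urllib.parse from the standard library. The URLs and the parameter name id below are hypothetical.

```python
from urllib.parse import urlparse, parse_qs
from bs4 import BeautifulSoup

# Hypothetical markup; the "id" query parameter is an assumed example name.
data = '''
<a href="https://example.com/video?id=10">TAG 1</a>
<a href="https://example.com/video?id=20">TAG 2</a>
<a href="https://example.com/photo?id=30">TAG 3</a>
'''

soup = BeautifulSoup(data, 'html.parser')

video_ids = []
# Keep only links whose href contains /video ...
for link in soup.select('a[href*="/video"]'):
    # ... then read the changing parameter from the query string.
    query = parse_qs(urlparse(link['href']).query)
    video_ids.extend(query.get('id', []))

print(video_ids)
```

parse_qs returns a list of values per parameter name, which is why the result is collected with extend rather than append.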