Python 3.x: Separating the href from the anchor text with BeautifulSoup

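I am using Python 3 and BeautifulSoup 4 to separate an href from the link text itself. For example:

<a href="yoursite.com" class="sample-class">LINK</a>

I want to (1) extract and print yoursite.com, and then (2) get the link text.

It would be great if someone could help.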

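Locate the a element by (for example) its class name; use dictionary-style attribute access to read the href; to get the link text, call get_text():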

a = soup.find("a", class_="sample-class")  # or soup.select_one("a.sample-class")
print(a["href"])
print(a.get_text())
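
For a self-contained version of the snippet above, here is a minimal sketch that parses the example markup from the question and guards against a missing element or attribute. The helper name `split_link` and the inline `html` string are assumptions made for illustration; they are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Example markup from the question, with the class attribute quoted properly.
html = '<a href="yoursite.com" class="sample-class">LINK</a>'


def split_link(markup, class_name):
    """Return (href, text) for the first <a> with the given class, or None.

    Hypothetical helper, for illustration only.
    """
    soup = BeautifulSoup(markup, "html.parser")
    a = soup.find("a", class_=class_name)
    if a is None:
        # find() returns None when nothing matches.
        return None
    # .get() returns None instead of raising KeyError when href is absent.
    return a.get("href"), a.get_text()


result = split_link(html, "sample-class")
print(result)
```

Using `a.get("href")` rather than `a["href"]` is a design choice for pages where some links may lack an href; the bracket form is fine when the attribute is known to exist.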

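A tag may have any number of attributes. The tag <b class="boldest"> has an attribute "class" whose value is "boldest". You can access a tag's attributes by treating the tag like a dictionary.

A string corresponds to a bit of text within a tag. Beautiful Soup uses the NavigableString class to contain these bits of text.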

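You can find this in the Beautiful Soup documentation.

They tried this:

soup = BeautifulSoup(response.content, "html.parser")
link = soup.find_all(a, {classname})

but it does not work: print(link["href"]) and print(a.get_text()) fail with "list indices must be integers or slices, not str" and "'ResultSet' object has no attribute 'get_text'".

You did not put quotes around the strings in the soup.find_all() line. Look at how alecxe did it above. Also, soup.find_all() returns a list of everything it finds, not just a single entry, so you may need to iterate: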

for entry in link:
    print(entry['href'])
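
To make the difference between a single element and a list of results concrete, here is a small sketch. The two-link markup and the variable names `links` and `pairs` are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two matching links, for illustration.
html = """
<a href="yoursite.com" class="sample-class">LINK</a>
<a href="example.org" class="sample-class">OTHER</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Note the quotes around "a": a bare a would be an undefined name.
links = soup.find_all("a", class_="sample-class")

# find_all() returns a list-like ResultSet, so loop over it
# (or index it with an integer) to reach the individual tags.
pairs = [(entry["href"], entry.get_text()) for entry in links]
for href, text in pairs:
    print(href, text)

# Indexing the ResultSet itself with a string raises TypeError,
# which is the error quoted in the comment above.
try:
    links["href"]
except TypeError as exc:
    print("TypeError:", exc)
```

If only the first match is needed, `soup.find(...)` returns a single tag (or None), and the dictionary-style access works on it directly.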

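The documentation's examples for attribute access and for strings:

tag['class']
# ['boldest']  (class is multi-valued, so Beautiful Soup 4 returns a list)

tag.string
# 'Extremely bold'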
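
The documentation excerpts can be tried out with a short runnable sketch. The markup string below is reconstructed from the excerpt's attribute value and text, so treat it as an assumption rather than the documentation's exact example.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Assumed markup matching the excerpt: class "boldest", text "Extremely bold".
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', "html.parser")
tag = soup.b

# class is a multi-valued attribute, so Beautiful Soup 4 returns a list.
print(tag["class"])

# All attributes are also available as a dictionary through .attrs.
print(tag.attrs)

# .string is a NavigableString, which is a subclass of str.
text = tag.string
print(isinstance(text, NavigableString), isinstance(text, str))

# Convert to a plain str if the text should not keep a reference to the parse tree.
print(str(text))
```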