Using get_text() on only one HTML class (Python, BeautifulSoup)


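I am trying to get only the text inside one HTML class. I tried to do it with BeautifulSoup, but I always get either the same error message or every item inside the tag.

My code.py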

from bs4 import BeautifulSoup
import requests

url = "https://www.auchandirect.pl/auchan-warszawa/pl/pepsi-cola-max-niskokaloryczny-napoj-gazowany-o-smaku-cola/p-98502176"
# Fetch the page once with requests and parse the response body
r = requests.get(url, headers={'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'}, timeout=15)
soup = BeautifulSoup(r.text, 'lxml')

products_links = soup.findAll("a", {'class' : 'current-page'})

print(products_links)

In the result, I only need this: "Max niskokaloryczny napój gazowany o smaku cola"

My result is:

<a class="current-page" href="/auchan-warszawa/pl/pepsi-cola-max-niskokaloryczny-napoj-gazowany-o-smaku-cola/p-98502176"><span>Max niskokaloryczny napój gazowany o smaku cola</span></a>

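How do I correctly extract the text from "current-page"? Why doesn't the function return the text inside the tag? What is the difference between selecting the class with findAll("a", class_="current-page") and with findAll("a", {'class': 'current-page'})? Both give the same result.

Any help would be appreciated.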

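findAll returns a list of the items that match the tag you defined. If there are several similar tags, it returns all of the matching ones, so printing the result shows the tags themselves (their full markup), not their text.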
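A minimal sketch of that point, run against a small inline HTML string instead of the live page. The markup is an assumption: the first link copies the question's output, the second link is invented, and the built-in "html.parser" is used so lxml is not required. find_all is the current name of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first link mirrors the question's output, the second is invented
html = """
<nav>
  <a class="current-page" href="/p-1"><span>Max niskokaloryczny napój gazowany o smaku cola</span></a>
  <a class="other" href="/p-2"><span>Something else</span></a>
</nav>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all always returns a list-like ResultSet, even when only one tag matches
links = soup.find_all("a", class_="current-page")
print(type(links).__name__, len(links))

# Printing `links` would show Tag markup; read the text of each element instead
texts = [a.get_text(strip=True) for a in links]
print(texts)

# find() returns only the first match (or None), so no [0] indexing is needed
first = soup.find("a", class_="current-page")
print(first.get_text(strip=True) if first is not None else None)
```

If you only ever expect one matching tag, find() avoids the list entirely; with find_all you index or loop.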

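There should be no difference between using findAll("a", class_="current-page") and passing a dict of attributes such as {'class': 'current-page'}. Both filter on the same class attribute: because class is a reserved word in Python, the keyword form has to be spelled class_, and the attrs dict is the older way of writing the same filter. I may be wrong, but I believe both exist because some of these methods were carried over from earlier versions.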
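To check the equivalence directly, here is a small sketch comparing the keyword form, the attrs dict, and (as a third option not mentioned in the thread) a CSS selector via select(). The one-line markup is a hypothetical copy of the question's output.

```python
from bs4 import BeautifulSoup

# Hypothetical markup copied from the question's output
html = '<a class="current-page" href="/p"><span>Max niskokaloryczny napój gazowany o smaku cola</span></a>'
soup = BeautifulSoup(html, "html.parser")

# `class` is a reserved word in Python, so the keyword argument is spelled class_
by_keyword = soup.find_all("a", class_="current-page")

# The second positional argument of find_all is the attrs dict: same filter
by_dict = soup.find_all("a", {"class": "current-page"})

# A CSS selector expresses the same condition
by_css = soup.select("a.current-page")

print(by_keyword == by_dict == by_css)
```

All three return the same single tag here, so the choice is mostly a matter of readability.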

You can extract the text from the returned list by selecting an element and reading its text attribute, like this:

products_links = soup.findAll("a", {'class' : 'current-page'}, text = True)
print(products_links[0].text)
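A sketch of the same extraction wrapped in a small helper. The function name current_page_title is hypothetical, and the sample markup (with extra whitespace added) is an assumption; in the question's code you would pass r.text instead. It uses find() so there is no list to index, and get_text(strip=True), which is what .text calls, minus the surrounding whitespace.

```python
from typing import Optional

from bs4 import BeautifulSoup


def current_page_title(html: str) -> Optional[str]:
    """Hypothetical helper: text of the first <a class="current-page">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a", class_="current-page")  # first match only, or None
    if link is None:
        return None
    # .text is the same as get_text(); strip=True trims whitespace around each text piece
    return link.get_text(strip=True)


# Hypothetical sample shaped like the question's output, with extra whitespace
sample = (
    '<a class="current-page" href="/p">\n'
    "  <span>Max niskokaloryczny napój gazowany o smaku cola</span>\n"
    "</a>"
)
print(current_page_title(sample))
```

Returning None instead of raising keeps the caller in control when the page layout changes and the link is missing.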

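Thank you very much for your answer. Maybe it is because my English is poor, but where in the documentation can I find this information? How did you know to use text=True and products_links[0].text? Could you explain the reasoning?

No problem, I hope it helps. You should be able to find the information here: … **kwargs). I don't think text=True is needed; what matters is knowing that findAll returns a list of objects matching your arguments.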
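A sketch of what text=True actually changes, using string=True (the newer name for the same argument). The markup is hypothetical: the first link mirrors the question, the second is invented to have several children. As I understand the documented behavior, combining a tag filter with string=True keeps only tags whose .string is not None, so for the question's markup it makes no difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: first link mirrors the question, second is invented
html = (
    '<a class="current-page" href="/p"><span>Max niskokaloryczny napój gazowany o smaku cola</span></a>'
    '<a class="current-page" href="/q"><span>Max</span> <b>cola</b></a>'
)
soup = BeautifulSoup(html, "html.parser")

# No string filter: every <a class="current-page"> is returned
plain = soup.find_all("a", class_="current-page")

# string=True keeps only tags whose .string is not None; the second link has
# three children (<span>, " ", <b>), so its .string is None and it is dropped
filtered = soup.find_all("a", class_="current-page", string=True)

print(len(plain), len(filtered))
print(plain[0].get_text() == filtered[0].get_text())
```

For the single link in the question both calls return the same tag, which is why the filter is unnecessary there.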
