Python 3.x: why does 'soup.select_one' return a list?

I ran the code below:

import re
import requests
from bs4 import BeautifulSoup
session = requests.Session()

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
url = 'https://www.collinsdictionary.com/dictionary/english-french/graduate'
r = session.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')

content1 = soup.select_one('.cB.cB-def.dictionary.biling').contents
temp = re.findall('data-src-mp3="(.*?)"', content1)

and got this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-feb1029c98d3> in <module>
     10 
     11 content1 = soup.select_one('.cB.cB-def.dictionary.biling').contents
---> 12 temp = re.findall('data-src-mp3="(.*?)"', content1)

C:\Anaconda3\lib\re.py in findall(pattern, string, flags)
    239 
    240     Empty matches are included in the result."""
--> 241     return _compile(pattern, flags).findall(string)
    242 
    243 def finditer(pattern, string, flags=0):

TypeError: expected string or bytes-like object

Could someone explain this problem in detail?

.select_one does not return a list; it returns a Tag (just as it promises).

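It is .contents that returns a list. That list holds the child tags and strings contained in the content1 tag, and re.findall cannot search a list, hence the TypeError.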
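A minimal, offline sketch of the return types involved. The HTML fragment below is hypothetical (it only reuses the class names from the question, not the real Collins markup), so no network request is needed:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment reusing the question's class names; not the real Collins page.
html = """
<div class="cB cB-def dictionary biling">
  <span class="hwd">graduate</span>
  <a class="audio" data-src-mp3="https://example.com/a.mp3">play</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one returns the first matching Tag (or None if nothing matches).
tag = soup.select_one(".cB.cB-def.dictionary.biling")
print(type(tag).__name__)                    # Tag

# .contents is a plain list of the tag's direct children (tags and strings).
print(isinstance(tag.contents, list))        # True

# select (without _one) is the method that returns a list of all matches.
print(isinstance(soup.select(".cB"), list))  # True

# No match: select_one returns None instead of an empty list.
missing = soup.select_one(".does-not-exist")
print(missing)                               # None
```

Note that when the selector matches nothing, select_one returns None, so calling .contents on the result would raise AttributeError rather than the TypeError from the question.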

If you want the HTML as a string, use str(content1), where content1 is the Tag itself (without .contents).
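The following sketch, again on a hypothetical HTML fragment with made-up example.com URLs, first reproduces the TypeError by passing the .contents list to re.findall, then applies the str() fix:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical fragment; URLs are placeholders, not real Collins audio files.
html = (
    '<div class="cB cB-def dictionary biling">'
    '<a data-src-mp3="https://example.com/a.mp3">a</a>'
    '<a data-src-mp3="https://example.com/b.mp3">b</a>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
tag = soup.select_one(".cB.cB-def.dictionary.biling")

pattern = r'data-src-mp3="(.*?)"'

# Passing the list from .contents reproduces the error from the question.
raised = False
try:
    re.findall(pattern, tag.contents)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# str(tag) serialises the Tag back to an HTML string, which re.findall accepts.
urls = re.findall(pattern, str(tag))
print(urls)  # ['https://example.com/a.mp3', 'https://example.com/b.mp3']
```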

Output:

['https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0071410.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/fr_bachelier.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/63854.mp3']

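However, I'm a bit puzzled by your choice to use a regular expression here. You are already using a proper HTML parser. In general, you should avoid parsing HTML with regular expressions: HTML is not a regular language, so regex-based parsing will not always work as expected.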
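If the goal is to drop the regex entirely, one option (a sketch, not the only approach) is to let the parser locate the attribute with a CSS attribute selector. The HTML fragment and URLs are hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment; URLs are placeholders, not real Collins audio files.
html = (
    '<div class="cB cB-def dictionary biling">'
    '<a data-src-mp3="https://example.com/a.mp3">a</a>'
    '<span>no audio here</span>'
    '<a data-src-mp3="https://example.com/b.mp3">b</a>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")
tag = soup.select_one(".cB.cB-def.dictionary.biling")

# [data-src-mp3] matches every descendant that carries that attribute,
# so only the attribute values are read; no regex over raw HTML text.
urls = [el["data-src-mp3"] for el in tag.select("[data-src-mp3]")]
print(urls)  # ['https://example.com/a.mp3', 'https://example.com/b.mp3']
```

The list comprehension still iterates, but only over the elements the selector matched. Unlike the regex, it picks up real data-src-mp3 attributes only, never text that merely looks like one. tag.find_all(attrs={"data-src-mp3": True}) is an equivalent spelling without CSS.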

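Just out of curiosity: you are already using an HTML parser, so why are you trying to parse the result with regex? Regex and HTML are not friends.

@DeepSpace You're very curious. I wanted to ask how to use BeautifulSoup (without a loop) to get the same result as with re, but I ran into this error while preparing the code example. That's why I had to ask this question first. Please have a look at this!
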
content1 = soup.select_one('.cB.cB-def.dictionary.biling')
print(type(content1))
# <class 'bs4.element.Tag'>
print(type(content1.contents))
# <class 'list'>
print(re.findall('data-src-mp3="(.*?)"', str(content1)))
# ['https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0071410.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/fr_bachelier.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/63854.mp3']