Python 3.x: Why does 'soup.select_one' return a list?
I ran the code below:
import requests
session = requests.Session()
from bs4 import BeautifulSoup
import re
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
url = 'https://www.collinsdictionary.com/dictionary/english-french/graduate'
r = session.get(url, headers = headers)
soup = BeautifulSoup(r.content, 'html.parser')
content1 = soup.select_one('.cB.cB-def.dictionary.biling').contents
temp = re.findall('data-src-mp3="(.*?)"', content1)
and then got this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-feb1029c98d3> in <module>
10
11 content1 = soup.select_one('.cB.cB-def.dictionary.biling').contents
---> 12 temp = re.findall('data-src-mp3="(.*?)"', content1)
C:\Anaconda3\lib\re.py in findall(pattern, string, flags)
239
240 Empty matches are included in the result."""
--> 241 return _compile(pattern, flags).findall(string)
242
243 def finditer(pattern, string, flags=0):
TypeError: expected string or bytes-like object
Could you please explain this issue in detail?
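select_one does not return a list; it returns a single Tag (as it promises).
In your code, content1 is a list because .contents is a list of the tags and strings contained in that tag, and re.findall only accepts a string or bytes-like object.
If you want the HTML as a string, assign the tag itself to content1 (without .contents) and use str(content1). Passing that string to re.findall gives:
Output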
['https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0071410.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/fr_bachelier.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/63854.mp3']
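To make the difference between the Tag and its .contents concrete, here is a minimal offline sketch. The HTML string and the example.com URLs are made-up stand-ins for the real dictionary page, so none of this reflects the site's actual markup. It also uses decode_contents(), which returns only the inner HTML as a string, as a possible alternative to str() when the outer element is not needed.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption, not the site's HTML).
html = (
    '<div class="cB cB-def dictionary biling">'
    '<a data-src-mp3="https://example.com/a.mp3">a</a> and '
    '<a data-src-mp3="https://example.com/b.mp3">b</a>'
    '</div>'
)
soup = BeautifulSoup(html, 'html.parser')

tag = soup.select_one('.cB.cB-def.dictionary.biling')  # a single Tag (or None)
children = tag.contents                                # a plain Python list of child nodes
child_types = [type(child).__name__ for child in children]

# re.findall needs str or bytes, so passing the list raises TypeError.
try:
    re.findall('data-src-mp3="(.*?)"', children)
    list_error = None
except TypeError as exc:
    list_error = exc

# Serialising the Tag gives a string that the regex can search.
urls = re.findall('data-src-mp3="(.*?)"', str(tag))

# Inner HTML only, without the surrounding <div>.
inner_html = tag.decode_contents()

print(type(tag).__name__, type(children).__name__, child_types)
print(urls)
```

The same TypeError as in the question appears whenever the list from .contents is passed to re.findall; converting the Tag with str() (or decode_contents()) is what makes the regex call valid.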
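However, I am a bit confused by your choice to use regex when you are already using a proper HTML parser. In general, avoid parsing HTML with regular expressions: HTML is not a regular language, so a regex may not always work as expected.
Just out of curiosity: you are already using an HTML parser, so why try to parse the result with regex? Regex and HTML are not friends.
@DeepSpace Your curiosity is well placed. I wanted to ask how to use BeautifulSoup (without a loop) to get the same result as re, but I ran into this error while preparing the code example. That is why I had to ask this question first. Please take a look at this!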
content1 = soup.select_one('.cB.cB-def.dictionary.biling')
print(type(content1))
# <class 'bs4.element.Tag'>
print(type(content1.contents))
# <class 'list'>
print(re.findall('data-src-mp3="(.*?)"', str(content1)))
['https://www.collinsdictionary.com/sounds/hwd_sounds/EN-GB-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0037420.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/FR-W0071410.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/fr_bachelier.mp3', 'https://www.collinsdictionary.com/sounds/hwd_sounds/63854.mp3']
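Regarding the comments about avoiding regex: one possible way to get the same URLs with BeautifulSoup alone is a CSS attribute selector. This is only a sketch under assumptions: the HTML string and the example.com URLs are invented for illustration, and the real page may structure its audio links differently. Note that the list comprehension is still an iteration, just a compact one.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption, not the site's HTML).
html = (
    '<div class="cB cB-def dictionary biling">'
    '<a data-src-mp3="https://example.com/a.mp3">a</a> and '
    '<a data-src-mp3="https://example.com/b.mp3">b</a>'
    '<span>no audio here</span>'
    '</div>'
)
soup = BeautifulSoup(html, 'html.parser')

container = soup.select_one('.cB.cB-def.dictionary.biling')

# select_one returns None when nothing matches, so guard before searching inside it.
if container is None:
    mp3_urls = []
else:
    # '[data-src-mp3]' matches every descendant that carries that attribute.
    mp3_urls = [el['data-src-mp3'] for el in container.select('[data-src-mp3]')]

print(mp3_urls)
```

Reading the attribute from parsed elements avoids depending on how the HTML happens to be serialised (quote style, attribute order), which is the main fragility of the regex version.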