Python: trying to get all links from a string that end with '"'


python regex


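I need to get all the links to an artist's albums from a Genius page. All the links can be found in the value of the data-preload_data attribute of the fifth preload-content tag. That value is stored as a str in the variable string.
I tried to extract all links that start with 'https://genius.com/albums/' and end with '"', but it doesn't work. When I don't use the $ sign for the ending, I get the right number of links, but without the ending part I need.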

    import urllib.request, urllib.parse, urllib.error
    from bs4 import BeautifulSoup
    import ssl
    from urllib.request import Request, urlopen
    import re

    name = input('Rapper - ')
    url = 'https://genius.com/artists/'+name+''
    hdr = {'User-Agent': 'Mozilla/5.0'}
    req = Request(url,headers=hdr)
    html = urlopen(req)
    soup = BeautifulSoup(html, 'html.parser')
    hrefs = soup.find_all("preload-content")
    string = hrefs[5]['data-preload_data']
    result = re.findall('(https://genius.com/albums/'+name+'.,$)', string)
    print(result)
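You can use

    re.findall(r'(https://genius\.com/albums/' + re.escape(name) + r'/[^"\'\s]*?)",', string)

Details

- (https://genius\.com/albums/' + re.escape(name) + r'/[^"\'\s]*?) - Group 1:
    - https://genius\.com/albums/' + re.escape(name) + r'/ - a literal substring
    - [^"\'\s]*? - any zero or more characters other than ", ' and whitespace, as few as possible (because *? is a lazy quantifier)
- ", - a literal string

Note that any special characters in name must be escaped for the regex syntax to stay correct, which is why re.escape(name) is used. Because re.findall returns only the contents of the capturing group when the pattern has exactly one group, the trailing ", is required for a match but is not part of the returned links.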
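To make the pattern above concrete, here is a minimal sketch that runs it on a small made-up string. The helper name find_album_links, the artist name, and the sample text are all assumptions for illustration; the real data-preload_data value on Genius may be laid out differently.

```python
import re


def find_album_links(name, text):
    """Return album links for `name` that are directly followed by '",'.

    Hypothetical helper: it only wraps the re.findall call from the answer.
    """
    pattern = (
        r'(https://genius\.com/albums/'
        + re.escape(name)          # escape any regex metacharacters in the name
        + r'/[^"\'\s]*?)",'        # lazy run of non-quote, non-space chars, then '",'
    )
    # With a single capturing group, findall returns only the group contents,
    # so the trailing '",' is required for a match but not included in the result.
    return re.findall(pattern, text)


# Made-up sample that imitates a JSON-like attribute value (assumption).
sample = (
    '{"albums":[{"url":"https://genius.com/albums/Kendrick-lamar/Damn","id":1},'
    '{"url":"https://genius.com/albums/Kendrick-lamar/Untitled-unmastered","id":2},'
    '{"url":"https://genius.com/albums/Drake/Views","id":3}]}'
)

links = find_album_links("Kendrick-lamar", sample)
print(links)
```

The link for the other artist is skipped because the literal prefix includes the escaped name.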
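Still no match.

Can you provide the starting URL where all the links can be found in the value of the data-preload_data attribute of the fifth preload-content tag? I think what you want is re.findall(r'https://genius\.com/albums/\S*,', string) or re.findall(r'''https://genius.com/albums/[^"'\s]*,''', text).

After checking the HTML, I think re.findall(r'(https://genius\.com/albums/'+name+'/[^"\'\s]*?)",', string) should work. Please check.

@WiktorStribiżew Thank you very much for your help.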
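For readers comparing the patterns suggested in the comments, the sketch below contrasts the greedy \S*, variant with the lazy, quote-delimited one on a made-up string that contains no whitespace. The sample text and the artist name are assumptions; the point is only to show how the two quantifiers behave.

```python
import re

# Made-up, whitespace-free sample (assumption), similar to compact JSON.
text = (
    '{"albums":[{"url":"https://genius.com/albums/Kendrick-lamar/Damn","id":1},'
    '{"url":"https://genius.com/albums/Kendrick-lamar/Untitled-unmastered","id":2}]}'
)
name = "Kendrick-lamar"

# Greedy: \S* runs to the end of the non-whitespace text and then backtracks
# to the LAST comma, so everything from the first link onward becomes one match.
greedy = re.findall(r'https://genius\.com/albums/\S*,', text)

# Lazy and quote-delimited: stops at the first '",' after each link.
lazy = re.findall(
    r'(https://genius\.com/albums/' + re.escape(name) + r'/[^"\'\s]*?)",',
    text,
)

print(len(greedy))  # a single, over-long match
print(lazy)         # one entry per album link
```

When the attribute value has no spaces between entries, a greedy \S* cannot separate one link from the next, which is one plausible reason the earlier suggestions returned nothing useful.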