
Python, URL parsing

python regex django parsing url


I have a URL, for example:

I need to parse it, and I'm using:

But I want to remove the www and parse it further. I want something like this:

#path_without_www -- nicepage.com
#list_of_path -- list_of_path[0] -> "nicecat", list_of_path[1] -> "something"
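One way to get both values with only the standard library is sketched below. It assumes Python 3 (`urllib.parse`) and uses the example URL that appears in the answers; the helper name `split_url` is hypothetical, not part of any library.

```python
from urllib.parse import urlparse


def split_url(url):
    """Hypothetical helper: return (host without a leading 'www.', list of path segments)."""
    parts = urlparse(url)
    host = parts.netloc
    # Drop "www." only when the host really starts with that exact prefix
    if host.startswith("www."):
        host = host[len("www."):]
    # "/nicecat/something" -> ["nicecat", "something"]; empty segments are skipped
    list_of_path = [segment for segment in parts.path.split("/") if segment]
    return host, list_of_path


path_without_www, list_of_path = split_url("http://www.nicepage.com/nicecat/something")
print(path_without_www)  # nicepage.com
print(list_of_path)      # ['nicecat', 'something']
```

Note that `netloc` keeps any port or user info; `parts.hostname` would drop those (and lowercase the host) if that is preferable.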

How about this:

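import re
from urllib.parse import urlparse  # Python 3 (on Python 2: from urlparse import urlparse)

url = urlparse('http://www.nicepage.com/nicecat/something')
# Remove a literal "www." from the start of the network location; the dot must be escaped
url = url._replace(netloc=re.sub(r'^www\.', '', url.netloc))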

The regex strips "www." from the beginning of the netloc. From there, you can parse it further as needed.
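Because `.` in a regular expression matches any character, the dot has to be escaped for the pattern to remove only a literal `www.`. The sketch below (Python 3; the function names and the host `wwwnicepage.com` are made up for illustration) contrasts an unescaped pattern such as `^(www.)(.*)` with an escaped one.

```python
import re


def strip_www_loose(host):
    # Unescaped pattern: the "." after "www" matches ANY character
    return re.sub(r'^(www.)(.*)', r'\2', host)


def strip_www_strict(host):
    # Escaped dot: only a literal "www." at the very start is removed
    return re.sub(r'^www\.', '', host)


for host in ["www.nicepage.com", "wwwnicepage.com"]:
    print(host, "->", strip_www_loose(host), "vs", strip_www_strict(host))
# www.nicepage.com -> nicepage.com vs nicepage.com
# wwwnicepage.com -> icepage.com vs wwwnicepage.com
```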

The following removes any leading www and splits the remaining parts for further processing:

# removeprefix (Python 3.9+) removes only an exact leading "www.", unlike lstrip("www.")
print(url.netloc.removeprefix("www.").split("."))

Output:

['nicepage', 'com']

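'http://www.nicepage.com/nicecat/something'.split('/')[2:]

Splitting, slicing, and related string handling seem to do everything you want. Is there a reason you don't want to use them? What is your actual question?

There's no need for a regex. The last line can be replaced with url._replace(netloc=url.netloc.lstrip('www.'))

Yes, using lstrip removes the dependency on regex, thanks.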
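The `lstrip` suggestion has a catch: `str.lstrip` treats its argument as a set of characters, not as a prefix, so it also removes leading `w` characters that belong to the domain name. The sketch below (Python 3.9+ for `str.removeprefix`; `www.wikipedia.org` is just an illustrative host) compares the two, and also shows what the plain `split('/')[2:]` approach from the comments returns.

```python
host = "www.wikipedia.org"  # illustrative host whose name itself starts with "w"

# lstrip removes every leading character found in the set {"w", "."}
print(host.lstrip("www."))        # ikipedia.org
# removeprefix removes the exact string "www." once, if present
print(host.removeprefix("www."))  # wikipedia.org

# The split/slice approach from the comments, without urlparse:
parts = "http://www.nicepage.com/nicecat/something".split("/")[2:]
print(parts)  # ['www.nicepage.com', 'nicecat', 'something']
```

The slice keeps the host with its "www." and does not separate a query string or fragment, which is why `urlparse` is usually the safer starting point.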
