Generating combinations of URL endpoints in Python

python python-2.7 url

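I have a URL like this:

http://example.com/foo/bar/baz/file.php

I have an endpoint named

/potato

I want to generate the following URLs from these:

http://example.com/foo/potato
http://example.com/foo/bar/potato
http://example.com/foo/bar/baz/potato

My attempts so far involve splitting on slashes, which ignores cases such as the endpoint itself starting with a /.

What is the cleanest way to do this?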

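You can use a list comprehension:

import re
s = 'http://example.com/foo/bar/baz/file.php'
# Split only on slashes that sit between word characters, so the "//" after the scheme is left alone.
*path, _ = re.split(r'(?<=\w)/(?=\w)', s)  # the last piece (the file name) is discarded
results = [f'{"/".join(path[:2+i])}/potato' for i in range(len(path)-1)]

Edit: Python 2.7 solution:

import re
s = 'http://example.com/foo/bar/baz/file.php'
path = re.split(r'(?<=\w)/(?=\w)', s)[:-1]
# Starting the slices at path[:1] also yields the root URL http://example.com/potato.
result = ['{}/potato'.format("/".join(path[:1+i])) for i in range(len(path))]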
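To see what the lookaround pattern does on its own, here is a small sketch that only performs the split. The variable name parts is just for illustration. The slashes in "http://" are not preceded by a word character, so they are never treated as separators.

```python
import re

s = 'http://example.com/foo/bar/baz/file.php'

# (?<=\w) requires a word character before the slash,
# (?=\w) requires one after it; neither consumes any text.
parts = re.split(r'(?<=\w)/(?=\w)', s)
print(parts)
# ['http://example.com', 'foo', 'bar', 'baz', 'file.php']
```

The first element keeps the scheme and host together, which is why joining prefixes of this list gives complete URLs.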

Another way to parse the URL reliably and accurately is to use urllib.parse:

import urllib.parse
s = 'http://example.com/foo/bar/baz/file.php'
d = urllib.parse.urlsplit(s)
_, *path, _ = d.path.split('/')  # drop the leading empty string and the file name
result = [f'{d.scheme}://{d.netloc}/{"/".join(path[:i])}/potato' for i in range(1, len(path)+1)]

In Python 2.7, use urlparse:

import urlparse
s = 'http://example.com/foo/bar/baz/file.php'
d = urlparse.urlparse(s)
path = d.path.split('/')[1:-1]
# Start at 1: an empty prefix would produce a double slash after the host.
result = ['{}://{}/{}/potato'.format(d.scheme, d.netloc, "/".join(path[:i])) for i in range(1, len(path)+1)]

Output:

['http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
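As a variation on the urllib.parse approach, the sketch below rebuilds each URL from the parsed parts instead of formatting the string by hand. The helper name prefix_urls is hypothetical, and the code assumes the last path segment is always a file name that should be dropped. It also discards any query string or fragment, which the original snippets do not address.

```python
from urllib.parse import urlsplit


def prefix_urls(url, endpoint):
    """Hypothetical helper: one URL per directory prefix of url, each ending in endpoint."""
    parts = urlsplit(url)
    # Non-empty path segments, minus the last one (assumed to be a file name).
    dirs = [seg for seg in parts.path.split('/') if seg][:-1]
    name = endpoint.strip('/')  # tolerate '/potato', 'potato' and 'potato/'
    return [
        parts._replace(path='/' + '/'.join(dirs[:i] + [name]),
                       query='', fragment='').geturl()
        for i in range(1, len(dirs) + 1)
    ]


print(prefix_urls('http://example.com/foo/bar/baz/file.php', '/potato'))
```

Because SplitResult is a named tuple, _replace returns a modified copy and geturl() reassembles it, so the scheme and host are never pieced together manually.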

Edit 2: Timings:

The source for the timings can be found here.

As the graph shows, urlparse is slower than re in most cases.
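The timing code itself is not reproduced here, so the following is only a rough sketch of how such a comparison could be set up with timeit on Python 3. The function names with_re and with_urlsplit and the repetition count are my own choices, not the setup used for the graph. Note also that, as far as I know, urlsplit caches recent results, so timing a single fixed URL may flatter it.

```python
import re
import timeit
from urllib.parse import urlsplit

URL = 'http://example.com/foo/bar/baz/file.php'
SPLIT = re.compile(r'(?<=\w)/(?=\w)')  # compiled once, outside the timed code


def with_re(url=URL):
    path = SPLIT.split(url)[:-1]  # drop the file name
    return ['{}/potato'.format('/'.join(path[:i])) for i in range(2, len(path) + 1)]


def with_urlsplit(url=URL):
    d = urlsplit(url)
    path = d.path.split('/')[1:-1]  # drop the leading '' and the file name
    return ['{}://{}/{}/potato'.format(d.scheme, d.netloc, '/'.join(path[:i]))
            for i in range(1, len(path) + 1)]


if __name__ == '__main__':
    # Both variants must agree before their timings are worth comparing.
    assert with_re() == with_urlsplit()
    for fn in (with_re, with_urlsplit):
        print(fn.__name__, timeit.timeit(fn, number=10000))
```

The absolute numbers depend on the machine and Python version, so treat any result as a relative indication only.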

Edit 3: Generic solution:

import re
def generate_url_combos(s, endpoint):
    # Strip a trailing file name (with any trailing slashes), then split on the path separators.
    path = re.split(r'(?<=\w)/(?=\w)', re.sub(r'(?<=\w)/\w+\.\w+$|(?<=\w)/\w+\.\w+/+$', '', s).strip('/'))
    # Normalize the endpoint by removing a leading slash and any trailing slashes.
    return ['{}/{}'.format("/".join(path[:1+i]), re.sub(r'^/|/+$', '', endpoint)) for i in range(len(path))]

tests = [('http://example.com/foo/bar/baz/file.php/', '/potato'), ('http://example.com/foo/bar/baz/file.php', '/potato'), ('http://example.com/foo/bar/baz/file.php', 'potato'), ('http://example.com/foo/bar/baz/file.php', 'potato/'), ('http://example.com/foo/bar/baz/file.php//', 'potato'), ('http://example.com/', 'potato'), ('http://example.com', 'potato'), ('http://example.com/', '/potato'), ('http://example.com', '/potato')]
for a, b in tests:
    print(generate_url_combos(a, b))

Edit 4:

import urlparse, re  # Python 2.7; on Python 3 the same function is urllib.parse.urlparse
def generate_url_combos(s, endpoint):
    d = urlparse.urlparse(s)
    path = list(filter(None, d.path.split('/')))  # drop empty segments
    endpoint = re.sub(r'^/+|/+$', '', endpoint)  # strip leading and trailing slashes once
    if not path:
        # Return a list here too, so every call has the same return type.
        return ['{}://{}/{}'.format(d.scheme, d.netloc, endpoint)]
    path = path[:-1] if re.findall(r'\.\w+$', path[-1]) else path
    return ['{}://{}/{}'.format(d.scheme, d.netloc, endpoint if not i else "/".join(path[:i])+'/'+endpoint) for i in range(len(path)+1)]

tests = [('http://example.com/foo/bar/baz/file.php/', '/potato'), ('http://example.com/foo/bar/baz/file.php', '/potato'), ('http://example.com/foo/bar/baz/file.php', 'potato'), ('http://example.com/foo/bar/baz/file.php', 'potato/'), ('http://example.com/foo/bar/baz/file.php//', 'potato'), ('http://example.com/', 'potato'), ('http://example.com', 'potato'), ('http://example.com/', '/potato'), ('http://example.com', '/potato')]
for a, b in tests:
    print(generate_url_combos(a, b))

Output:

['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato']
['http://example.com/potato']
['http://example.com/potato']
['http://example.com/potato']
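For readers on Python 3 who would rather avoid regular expressions altogether, the generic function can be sketched with plain string methods. The name url_combos_py3 is hypothetical. Treating any last segment that contains a dot as a file name is a simplifying assumption, and it is slightly broader than the regex check used above.

```python
from urllib.parse import urlsplit


def url_combos_py3(url, endpoint):
    """Hypothetical regex-free variant: root URL plus one URL per directory prefix."""
    d = urlsplit(url)
    segments = [seg for seg in d.path.split('/') if seg]  # ignore empty segments
    if segments and '.' in segments[-1]:
        segments.pop()  # assumption: a dotted last segment is a file name
    name = endpoint.strip('/')  # accept '/potato', 'potato' and 'potato/'
    base = '{}://{}'.format(d.scheme, d.netloc)
    return ['/'.join([base] + segments[:i] + [name])
            for i in range(len(segments) + 1)]


print(url_combos_py3('http://example.com/foo/bar/baz/file.php//', 'potato/'))
print(url_combos_py3('http://example.com', '/potato'))
```

Like the versions above, this always returns a list, including for a bare host with no path.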

Thanks! I like the compactness of this solution, but is a regex really needed here? I can see it uses lookarounds. If I were to use this millions of times, would it be slower than a solution that doesn't use regex? I'm interested in your thoughts. Unfortunately, I'm using Python 2.7, so *path and, I believe, the f-string formatting won't work. How would you convert it?

@JosephJohn Please check my recent edit, as I added a solution for Python 2.7. You can use urllib (Python 3) or urlparse (Python 2) to parse the URL instead of regex. I will add speed tests shortly.

@JosephJohn I also added timings. I think you will be interested in the results.

Really interesting. Thank you for your efforts. I also want http://example.com/potato in my output. How would I modify this?

If you are just learning the basics, you should probably ignore Python 2 and spend your time on the currently recommended and supported version of the language, which is Python 3.