Generating combinations of URL endpoints in Python

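I have a URL like this:

http://example.com/foo/bar/baz/file.php

and an endpoint named /potato.

From these I want to generate the following URLs:

http://example.com/foo/potato
http://example.com/foo/bar/potato
http://example.com/foo/bar/baz/potato

My attempts so far involve splitting on slashes, which misses cases such as the endpoint itself starting with a /.

What is the cleanest way to do this?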

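You can use a list comprehension (Python 3):

import re
s = 'http://example.com/foo/bar/baz/file.php'
# Split on '/' only between word characters, so 'http://' stays intact; drop the file name
*path, _ = re.split(r'(?<=\w)/(?=\w)', s)
# path[0] is 'http://example.com'; start with one directory and add one more each time
results = [f'{"/".join(path[:2+i])}/potato' for i in range(len(path)-1)]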
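To see why the lookaround pattern is used, it can help to print the intermediate split. The following is a minimal Python 3 sketch using the same URL and pattern as above; the names parts and prefixes are only illustrative.

```python
import re

s = 'http://example.com/foo/bar/baz/file.php'

# Split only on a '/' that has a word character on both sides,
# so the '//' after 'http:' is never a split point.
parts = re.split(r'(?<=\w)/(?=\w)', s)
print(parts)
# ['http://example.com', 'foo', 'bar', 'baz', 'file.php']

# Drop the file name, then grow the prefix one directory at a time.
*path, _ = parts
prefixes = ['/'.join(path[:i]) for i in range(2, len(path) + 1)]
print(prefixes)
```

Appending '/potato' to each entry of prefixes gives the three URLs requested in the question.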

Edit: Python 2.7 solution:

import re
s = 'http://example.com/foo/bar/baz/file.php'
path = re.split(r'(?<=\w)/(?=\w)', s)[:-1]
# Starting from path[:1] means the result also contains http://example.com/potato
result = ['{}/potato'.format("/".join(path[:1+i])) for i in range(len(path))]

Another way to parse the URL reliably and accurately is to use urllib.parse (Python 3):

import urllib.parse
d = urllib.parse.urlsplit(s)
# d.path is '/foo/bar/baz/file.php': discard the leading empty string and the file name
_, *path, _ = d.path.split('/')
result = [f'{d.scheme}://{d.netloc}/{"/".join(path[:i])}/potato' for i in range(1, len(path)+1)]

Output:

['http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']

In Python 2.7, use urlparse:

import urlparse
d = urlparse.urlparse(s)
path = d.path.split('/')[1:-1]
# Start at 1 so that the first URL contains one directory rather than an empty prefix
result = ['{}://{}/{}/potato'.format(d.scheme, d.netloc, "/".join(path[:i])) for i in range(1, len(path)+1)]

Output:

['http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']

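The parsing approach can also be written without any string formatting, by rebuilding each URL with urllib.parse.urlunsplit. This is only a sketch under stated assumptions: endpoint_urls and its include_root parameter are hypothetical names, not part of the original answer; the last path segment is assumed to always be a file name; and any query string or fragment is dropped.

```python
from urllib.parse import urlsplit, urlunsplit


def endpoint_urls(url, endpoint, include_root=False):
    """Hypothetical helper: one URL per directory prefix of `url`, ending in `endpoint`."""
    parts = urlsplit(url)
    # Keep non-empty path segments; assume the last one is a file name and drop it.
    dirs = [seg for seg in parts.path.split('/') if seg][:-1]
    name = endpoint.strip('/')  # tolerate '/potato', 'potato' and 'potato/'
    start = 0 if include_root else 1
    urls = []
    for i in range(start, len(dirs) + 1):
        path = '/' + '/'.join(dirs[:i] + [name])
        # Query and fragment are intentionally left empty in this sketch.
        urls.append(urlunsplit((parts.scheme, parts.netloc, path, '', '')))
    return urls


print(endpoint_urls('http://example.com/foo/bar/baz/file.php', '/potato'))
print(endpoint_urls('http://example.com/foo/bar/baz/file.php', 'potato', include_root=True))
```

With include_root=True the list additionally starts with http://example.com/potato.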
Edit 2: Timings:

The source for the timings was linked in the original answer.

As the plot shows, in most cases, urlparse ... re
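Since the timing plot is not reproduced here, a comparison can be rerun locally. The sketch below is not the original benchmark: the function names with_re and with_urlsplit, the single test URL, and the repeat count are all assumptions chosen for illustration, and it is written for Python 3.

```python
import re
import timeit
from urllib.parse import urlsplit

s = 'http://example.com/foo/bar/baz/file.php'


def with_re(url):
    # Regex split; path[0] is the scheme and host.
    path = re.split(r'(?<=\w)/(?=\w)', url)[:-1]
    return ['{}/potato'.format('/'.join(path[:i])) for i in range(2, len(path) + 1)]


def with_urlsplit(url):
    # Parse first, then split only the path component.
    d = urlsplit(url)
    path = d.path.split('/')[1:-1]
    return ['{}://{}/{}/potato'.format(d.scheme, d.netloc, '/'.join(path[:i]))
            for i in range(1, len(path) + 1)]


# Both functions should agree before their speed is compared.
assert with_re(s) == with_urlsplit(s)

for fn in (with_re, with_urlsplit):
    seconds = timeit.timeit(lambda: fn(s), number=10000)
    print('{:<14} {:.4f} s per 10,000 calls'.format(fn.__name__, seconds))
```

Results depend on the machine and the Python version. As far as I know, urlsplit caches recently parsed URLs, so timing a single repeated URL may flatter it; feeding a varied set of URLs would give a fairer picture.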

Edit 3: General solution:

import re
def generate_url_combos(s, endpoint):
    # Remove a trailing file name (with optional trailing slashes), then split into segments
    path = re.split(r'(?<=\w)/(?=\w)', re.sub(r'(?<=\w)/\w+\.\w+$|(?<=\w)/\w+\.\w+/+$', '', s).strip('/'))
    # Strip slashes around the endpoint and append it to every prefix
    return ['{}/{}'.format("/".join(path[:1+i]), re.sub(r'^/|/+$', '', endpoint)) for i in range(len(path))]

tests = [('http://example.com/foo/bar/baz/file.php/', '/potato'), ('http://example.com/foo/bar/baz/file.php', '/potato'), ('http://example.com/foo/bar/baz/file.php', 'potato'), ('http://example.com/foo/bar/baz/file.php', 'potato/'), ('http://example.com/foo/bar/baz/file.php//', 'potato'), ('http://example.com/', 'potato'), ('http://example.com', 'potato'), ('http://example.com/', '/potato'), ('http://example.com', '/potato')]
for a, b in tests:
    print(generate_url_combos(a, b))

Edit 4:

import re
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2.7

def generate_url_combos(s, endpoint):
    d = urlparse(s)
    endpoint = re.sub(r'^/+|/+$', '', endpoint)  # strip leading and trailing slashes
    path = list(filter(None, d.path.split('/')))  # drop empty segments
    # Drop a trailing file name such as file.php
    if path and re.search(r'\.\w+$', path[-1]):
        path = path[:-1]
    base = '{}://{}'.format(d.scheme, d.netloc)
    # Always return a list, even when the URL has no path
    return ['/'.join([base] + path[:i] + [endpoint]) for i in range(len(path) + 1)]

tests = [('http://example.com/foo/bar/baz/file.php/', '/potato'), ('http://example.com/foo/bar/baz/file.php', '/potato'), ('http://example.com/foo/bar/baz/file.php', 'potato'), ('http://example.com/foo/bar/baz/file.php', 'potato/'), ('http://example.com/foo/bar/baz/file.php//', 'potato'), ('http://example.com/', 'potato'), ('http://example.com', 'potato'), ('http://example.com/', '/potato'), ('http://example.com', '/potato')]
for a, b in tests:
    print(generate_url_combos(a, b))
Output:

['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato', 'http://example.com/foo/potato', 'http://example.com/foo/bar/potato', 'http://example.com/foo/bar/baz/potato']
['http://example.com/potato']
['http://example.com/potato']
['http://example.com/potato']
['http://example.com/potato']


Thanks! I like how compact this solution is, but is a regular expression really needed here? I can see it uses lookarounds. If I were to run this millions of times, would it be slower than a solution that does not use regex? I am curious about your thoughts.

Sadly, I am using Python 2.7, so I believe *path and f format strings will not work. How would you convert it?

@JosephJohn Please see my recent edit, as I added a solution for Python 2.7. You can use urllib (Python 3) or urlparse (Python 2) to parse the URL instead of regex. I will add speed tests shortly.

@JosephJohn I have also added timings. I think you will be interested in the results.

Really interesting. Thanks for your efforts. I would also like http://example.com/potato in my output. How would I modify this?

If you are just learning the basics, you should probably ignore Python 2 and spend your time on the currently recommended and supported version of the language, Python 3.