Python: converting a list of numbers to a string of ranges
I would like to know whether there is a simple (or already written) way to do the reverse of this. The linked code can be used to do the following:
>> list(hyphen_range('1-9,12,15-20,23'))
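[1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 15, 16, 17, 18, 19, 20, 23]
I would like to do the opposite, that is, turn that list into '1-10,12,15-21,23'. Note that 10 and 21 are included so that the result is compatible with the range function, where range(1,10)=[1,2,3,4,5,6,7,8,9].
Ultimately, I would like the output to also incorporate a step, where the last number of a token indicates the step: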
>> list_to_ranges([1, 3, 5, 7, 8, 9, 10, 11])
'1-13:2,8,10'
In essence, this ends up being something like an "inverse" range function:
>> tmp = list_to_ranges([1, 3, 5])
>> print tmp
'1-7:2'
>> range(1, 7, 2)
[1, 3, 5]
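My guess is that there is no really easy way to do this, but I thought I would ask here before I go off and write a long, brute-force method.

Edit

Using the code from an answer as an example, I came up with a simple way to do the first part. I think identifying the patterns needed for the steps will be a bit harder, though.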
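To pin down the target format, it may help to write the expansion direction first. The following is only a minimal Python 3 sketch, not the linked `hyphen_range` code: the name `expand_ranges` is hypothetical, the stop value is assumed to be exclusive (as with `range`), and only non-negative integers are handled because `-` doubles as the separator.

```python
def expand_ranges(text):
    """Expand a string such as '1-13:2,8,10' into a list of ints.

    Assumed format: comma-separated tokens, each 'n', 'a-b' or 'a-b:step',
    where b is exclusive exactly like the stop argument of range().
    """
    numbers = []
    for token in text.split(','):
        bounds, _, step = token.partition(':')
        start, dash, stop = bounds.partition('-')
        if not dash:
            numbers.append(int(start))  # a lone value such as '8'
        else:
            # A missing ':step' suffix means a step of 1.
            numbers.extend(range(int(start), int(stop), int(step or 1)))
    return numbers


print(sorted(expand_ranges('1-13:2,8,10')))  # [1, 3, 5, 7, 8, 9, 10, 11]
print(expand_ranges('1-7:2'))                # [1, 3, 5]
```

Any encoder can then be checked by a round trip: encode a list, expand the result, and compare the sorted values with the input.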
from itertools import groupby
from operator import itemgetter

data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
print data, '\n'

str_list = []
for k, g in groupby(enumerate(data), lambda (i,x):i-x):
    ilist = map(itemgetter(1), g)
    print ilist
    if len(ilist) > 1:
        str_list.append('%d-%d' % (ilist[0], ilist[-1]+1))
    else:
        str_list.append('%d' % ilist[0])
print '\n', ','.join(str_list)
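The snippet above relies on Python 2 features (the `print` statement and tuple unpacking in a `lambda`). As a hedged Python 3 sketch of the same grouping idea, with the hypothetical name `consecutive_runs` and the same end-exclusive convention:

```python
from itertools import groupby


def consecutive_runs(data):
    """Format sorted ints as 'a-b' runs of consecutive values, b exclusive."""
    parts = []
    # Within a run of consecutive integers, value - index stays constant,
    # so that difference works as the grouping key.
    for _, group in groupby(enumerate(data), key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in group]
        if len(run) > 1:
            parts.append('%d-%d' % (run[0], run[-1] + 1))
        else:
            parts.append('%d' % run[0])
    return ','.join(parts)


print(consecutive_runs([1, 4, 5, 6, 10, 15, 16, 17, 18, 22, 25, 26, 27, 28]))
# 1,4-7,10,15-19,22,25-29
```

This only recognizes runs with a step of 1; other step sizes need the difference between neighbouring values rather than the difference between value and index.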
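Edit 2

Here is my attempt at including the step size. It is pretty close, but the first number of each group gets repeated. I think that with a little tweaking it will be close to what I want, or at least good enough.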
import numpy as np
from itertools import groupby

def list_to_ranges(data):
    data = sorted(data)
    diff_data = np.diff(data).tolist()
    ranges = []
    i = 0
    for k, iterable in groupby(diff_data, None):
        rng = list(iterable)
        step = rng[0]
        if len(rng) == 1:
            ranges.append('%d' % data[i])
        elif step == 1:
            ranges.append('%d-%d' % (data[i], data[i+len(rng)]+step))
        else:
            ranges.append('%d-%d:%d' % (data[i], data[i+len(rng)]+step, step))
        i += len(rng)
    return ','.join(ranges)

data = [1, 3, 5, 6, 7, 11, 13, 15, 16, 17, 18, 19, 22, 25, 28]
print data
data_str = list_to_ranges(data)
print data_str

_list = []
for r in data_str.replace('-',':').split(','):
    r = [int(a) for a in r.split(':')]
    if len(r) == 1:
        _list.extend(r)
    elif len(r) == 2:
        _list.extend(range(r[0], r[1]))
    else:
        _list.extend(range(r[0], r[1], r[2]))
print _list
print list(set(_list))
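This is most likely what you are looking for.

Edit: I see you have already found that post. My apologies.

To help with the second part, I did some tinkering of my own. This is what I came up with: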
from numpy import diff

data = [ 1, 3, 5, 7, 8, 9, 10, 11, 13, 15, 17 ]
onediff, twodiff = diff(data), diff(diff(data))
increments, breakingindices = [], []
for i in range(len(twodiff)):
    if twodiff[i] != 0:
        breakingindices.append(i+2) # Correct index because of the two diffs
        increments.append(onediff[i]) # Record the increment for this section

# Increments and breakingindices should be the same size
str_list = []
start = data[0]
for i in range(len(breakingindices)):
    str_list.append("%d-%d:%d" % (start, data[breakingindices[i]-1], increments[i]))
    start = data[breakingindices[i]]
str_list.append("%d-%d:%d" % (start, data[len(data)-1], onediff[len(onediff)-1]))
print str_list
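For the given input list, this gives: ['1-7:2', '8-11:1', '13-17:2']. The code could use some cleanup, but it solves your problem as long as the grouping can be done sequentially.

{Note: for [1,2,3,5,6,7] this gives ['1-3:1','5-5:2','6-7:1'] instead of ['1-3:1','5-7:1']}

One approach could be to "eat" the input sequence piece by piece, storing the partial range results until you have got them all: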
import itertools

def formatter(start, end, step):
    return '{}-{}:{}'.format(start, end, step)
    # return '{}-{}:{}'.format(start, end + step, step)

def helper(lst):
    # Returns (text for the leading range or value, rest of the list).
    if len(lst) == 1:
        return str(lst[0]), []
    if len(lst) == 2:
        return ','.join(map(str,lst)), []

    step = lst[1] - lst[0]
    for i,x,y in zip(itertools.count(1), lst[1:], lst[2:]):
        if y-x != step:
            if i > 1:
                return formatter(lst[0], lst[i], step), lst[i+1:]
            else:
                return str(lst[0]), lst[1:]
    return formatter(lst[0], lst[-1], step), []

def re_range(lst):
    result = []
    while lst:
        partial,lst = helper(lst)
        result.append(partial)
    return ','.join(result)
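I tested it with a bunch of unit tests and it passed them all. It can handle negative numbers too, although they look a bit ugly (which is hardly anyone's fault).

Example: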
>>> re_range([1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28])
'1,4-6:1,10,15-18:1,22,25-28:1'
>>> re_range([1, 3, 5, 7, 8, 9, 10, 11, 13, 15, 17])
'1-7:2,8-11:1,13-17:2'
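Note: I wrote the code for Python 3.

Performance

I did not put any effort into performance in the solution above. In particular, every time the list is rebuilt by slicing, it may take some time if the input list has a particular shape. So a first simple improvement would be to use a lazy slice where possible (the comparison code later in this thread uses itertools.islice for this).

Anyway, here is another implementation of the same algorithm, which scans the input list with a scan index instead of slicing: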
def re_range(lst):
    n = len(lst)
    result = []
    scan = 0
    while n - scan > 2:
        step = lst[scan + 1] - lst[scan]
        if lst[scan + 2] - lst[scan + 1] != step:
            result.append(str(lst[scan]))
            scan += 1
            continue

        for j in range(scan+2, n-1):
            if lst[j+1] - lst[j] != step:
                result.append(formatter(lst[scan], lst[j], step))
                scan = j+1
                break
        else:
            result.append(formatter(lst[scan], lst[-1], step))
            return ','.join(result)

    if n - scan == 1:
        result.append(str(lst[scan]))
    elif n - scan == 2:
        result.append(','.join(map(str, lst[scan:])))

    return ','.join(result)
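I stopped working on it once it was about 65% faster than the previous top solution :)

Anyway, I think there is still room for improvement (especially in the middle for loop).

Here is a comparison of the 3 methods. The amount and density of the data can be changed via the values below. No matter what values I use, the first solution seems to be the quickest for me. For very large data sets, the third solution becomes very slow.

Edited

Edited to include the comments below and to add a new solution. The last solution now seems to be the quickest.

import numpy as np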
import itertools
import random
import timeit
# --- My Solution --------------------------------------------------------------
def list_to_ranges1(data):
    data = sorted(data)
    diff_data = np.diff(data)
    ranges = []
    i = 0
    skip_next = False
    for k, iterable in itertools.groupby(diff_data, None):
        rng = list(iterable)
        step = rng[0]
        if skip_next:
            skip_next = False
            rng.pop()

        if len(rng) == 0:
            continue
        elif len(rng) == 1:
            ranges.append('%d' % data[i])
        elif step == 1:
            ranges.append('%d-%d' % (data[i], data[i+len(rng)]+step))
            i += 1
            skip_next = True
        else:
            ranges.append('%d-%d:%d' % (data[i], data[i+len(rng)]+step, step))
            i += 1
            skip_next = True
        i += len(rng)

    if len(rng) == 0 or len(rng) == 1:
        ranges.append('%d' % data[i])
    return ','.join(ranges)

# --- Kaidence Solution --------------------------------------------------------
# With a minor edit for use in range function
def list_to_ranges2(data):
    onediff = np.diff(data)
    twodiff = np.diff(onediff)
    increments, breakingindices = [], []
    for i in range(len(twodiff)):
        if twodiff[i] != 0:
            breakingindices.append(i+2)  # Correct index because of the two diffs
            increments.append(onediff[i])  # Record the increment for this section

    # Increments and breakingindices should be the same size
    str_list = []
    start = data[0]
    for i in range(len(breakingindices)):
        str_list.append("%d-%d:%d" % (start,
                                      data[breakingindices[i]-1] + increments[i],
                                      increments[i]))
        start = data[breakingindices[i]]
    str_list.append("%d-%d:%d" % (start,
                                  data[len(data)-1] + onediff[len(onediff)-1],
                                  onediff[len(onediff)-1]))
    return ','.join(str_list)

# --- Rik Poggi Solution -------------------------------------------------------
# With a minor edit for use in range function
def helper(lst):
    if len(lst) == 1:
        return str(lst[0]), []
    if len(lst) == 2:
        return ','.join(map(str,lst)), []

    step = lst[1] - lst[0]
    #for i,x,y in itertools.izip(itertools.count(1), lst[1:], lst[2:]):
    for i,x,y in itertools.izip(itertools.count(1),
                                itertools.islice(lst, 1, None, 1),
                                itertools.islice(lst, 2, None, 1)):
        if y-x != step:
            if i > 1:
                return '{}-{}:{}'.format(lst[0], lst[i]+step, step), lst[i+1:]
            else:
                return str(lst[0]), lst[1:]
    return '{}-{}:{}'.format(lst[0], lst[-1]+step, step), []

def list_to_ranges3(lst):
    result = []
    while lst:
        partial,lst = helper(lst)
        result.append(partial)
    return ','.join(result)

# --- Rik Poggi Solution 2 -----------------------------------------------------
def formatter(start, end, step):
    #return '{}-{}:{}'.format(start, end, step)
    return '{}-{}:{}'.format(start, end + step, step)

def list_to_ranges4(lst):
    n = len(lst)
    result = []
    scan = 0
    while n - scan > 2:
        step = lst[scan + 1] - lst[scan]
        if lst[scan + 2] - lst[scan + 1] != step:
            result.append(str(lst[scan]))
            scan += 1
            continue

        for j in xrange(scan+2, n-1):
            if lst[j+1] - lst[j] != step:
                result.append(formatter(lst[scan], lst[j], step))
                scan = j+1
                break
        else:
            result.append(formatter(lst[scan], lst[-1], step))
            return ','.join(result)

    if n - scan == 1:
        result.append(str(lst[scan]))
    elif n - scan == 2:
        result.append(','.join(itertools.imap(str, lst[scan:])))

    return ','.join(result)

# --- Test Function ------------------------------------------------------------
def test_data(data, f_to_test):
    data_str = f_to_test(data)
    _list = []
    for r in data_str.replace('-',':').split(','):
        r = [int(a) for a in r.split(':')]
        if len(r) == 1:
            _list.extend(r)
        elif len(r) == 2:
            _list.extend(range(r[0], r[1]))
        else:
            _list.extend(range(r[0], r[1], r[2]))
    return _list

# --- Timing Tests -------------------------------------------------------------
# Generate some sample data...
data_list = []
for i in range(5):
    # Note: using the "4000" and "5000" values below, the relative density of
    # the data can be changed. This has a huge effect on the results
    # (particularly on the results for list_to_ranges3 which uses recursion).
    data_list.append(sorted(list(set([random.randint(1,4000) for a in \
                                      range(random.randint(5,5000))]))))

testfuncs = list_to_ranges1, list_to_ranges2, list_to_ranges3, list_to_ranges4
for f in testfuncs:
    print '\n', f.__name__
    for i, data in enumerate(data_list):
        t = timeit.Timer('f(data)', 'from __main__ import data, f')
        #print f(data)
        print i, data==test_data(data, f), round(t.timeit(200), 3)
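The harness above is Python 2 and passes names to timeit through a setup string. A rough Python 3 sketch of the same check-then-time pattern is shown below; `simple_ranges` and `decode` are hypothetical stand-ins (step-1 runs only, end-exclusive), not any of the four solutions being compared, and the data sizes are arbitrary.

```python
import random
import timeit


def simple_ranges(data):
    """Hypothetical stand-in encoder: step-1 runs only, end-exclusive."""
    parts, start, prev = [], data[0], data[0]
    for value in data[1:] + [None]:  # the trailing None flushes the final run
        if value is not None and value == prev + 1:
            prev = value
            continue
        parts.append(str(start) if start == prev else '%d-%d' % (start, prev + 1))
        start = prev = value
    return ','.join(parts)


def decode(text):
    """Expand 'a-b' (b exclusive) and single values back into a list."""
    out = []
    for token in text.split(','):
        lo, dash, hi = token.partition('-')
        out.extend(range(int(lo), int(hi)) if dash else [int(lo)])
    return out


random.seed(0)  # fixed seed so repeated runs time the same input
data = sorted(set(random.randint(1, 4000) for _ in range(2000)))

# Passing a callable avoids the 'from __main__ import ...' setup string.
elapsed = timeit.timeit(lambda: simple_ranges(data), number=200)
print(simple_ranges.__name__, decode(simple_ranges(data)) == data, round(elapsed, 3))
```

Checking the round trip before looking at the timing matters here, because a fast encoder that drops or repeats values would otherwise look like the winner.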
This function should do what you need without requiring any imports:
def listToRanges(self, intList):
    ret = []
    for val in sorted(intList):
        if not ret or ret[-1][-1]+1 != val:
            ret.append([val])
        else:
            ret[-1].append(val)
    return ",".join([str(x[0]) if len(x)==1 else str(x[0])+"-"+str(x[-1]) for x in ret])
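The function above takes a `self` parameter, so it is written as a method. As a minimal standalone sketch of the same logic (the name `list_to_ranges_inclusive` is hypothetical), it can be tried like this. Note that it writes both ends inclusively and only recognizes a step of 1, so it does not produce the end-exclusive 'a-b:step' format asked for in the question.

```python
def list_to_ranges_inclusive(int_list):
    """Group sorted values into runs of step 1; both ends are inclusive."""
    runs = []
    for val in sorted(int_list):
        if not runs or runs[-1][-1] + 1 != val:
            runs.append([val])    # start a new run
        else:
            runs[-1].append(val)  # extend the current run
    return ",".join(
        str(run[0]) if len(run) == 1 else "%d-%d" % (run[0], run[-1])
        for run in runs
    )


print(list_to_ranges_inclusive([1, 2, 3, 5, 7, 8]))  # 1-3,5,7-8
print(list_to_ranges_inclusive([1, 3, 5]))           # 1,3,5
```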
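This is similar to the versions here that handle a step size of one, but it also handles singletons (elements with no more than 2 elements in a sequence, or repeated elements) and non-unit step sizes (including negative step sizes). It also does not drop duplicates for lists like [1, 2, 3, 3, 4, 5].

As for run time: it is done before you can blink.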
def ranges(L):
    """return a list of singletons or ranges of integers, (first, last, step)
    as they occur sequentially in the list of integers, L.

    Examples
    ========

    >>> list(ranges([1, 2, 4, 6, 7, 8, 10, 12, 13]))
    [1, (2, 6, 2), 7, (8, 12, 2), 13]
    >>> list(ranges([1,2,3,4,3,2,1,3,5,7,11,1,2,3]))
    [(1, 4, 1), (3, 1, -1), (3, 7, 2), 11, (1, 3, 1)]

    """
    if not L:
        # Empty input: nothing to yield. A bare return is used because this
        # is a generator (Python 2 rejects "return []" here).
        return
    r = []
    for i in L:
        if len(r) < 2:
            r.append(i)
            if len(r) == 2:
                d = r[1] - r[0]
        else:
            if i - r[1] == d:
                r[1] = i
            else:
                if r[1] - r[0] == d:
                    yield(r.pop(0))
                    r.append(i)
                    d = r[1] - r[0]
                else:
                    yield(tuple(r+[d]))
                    r[:] = [i]
    if len(r) == 1:
        yield(r.pop())
    elif r[1] - r[0] == d:
        for i in r:
            yield i
    else:
        yield(tuple(r+[d]))
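What you call the brute-force way would not take that long at all...

I agree. To recognize the patterns you will have to parse the list, especially if you also want to recognize non-uniform steps.

There is some ambiguity here: 1-13:2,8,10 is the same as 1-7:2,7-11. You will have to give a more precise definition of what you want before we can really look into algorithms.

@Winston Ewert: Agreed. That is something I had considered... both are valid outputs. I really don't care which result I get, as long as they are equivalent.

Okay, but 1,3,5,7,8,9,10,11 would also be equivalent, as would 1,3,5,7-11. Surely you have some requirements beyond equivalence.

Yes, this is good... by the way, it is basically the same as what I just found (see my edit). Do you know of any easy way to get it to use steps?

Hmm, maybe it is worth checking out a diff function. As long as the numbers in the list of differences are the same, we can group them together (diff([1,3,5]) would return [2,2]).

Don't just post a link as an answer. Either provide the information from the link in your post, or make it a comment.

@Kaidence: That was my thought as well. I had a decent start on it in "Edit 2" of my post, including the use of numpy's diff.

It seems that I get the best results by modifying the approach from my "Edit 2". I made a new answer comparing the three solutions (mine, yours and Rik's).

This worked well in my tests too, thanks. The only thing I changed was to add the step to the middle term in the helper function (range does not include its second argument, i.e. range(1,6) = [1,2,3,4,5], which does not include 6). So for the first example it should be 4-7:1 rather than 4-6:1. So use ...format(lst[0], lst[i]+step, step)

@ScottB: I did not notice that you wanted that behaviour. I would say it is a matter of taste; for example, I don't like a hyphenated string that shows a last number which is not included, and it should be hyphen_range that gets fixed so that it behaves like range. I will add your fix as a custom formatter :)

@ScottB: I think it is slow because of the recursion and the repeated cutting of the input list. Is there a real performance problem, or are you just benchmarking? A better way would be to do the same thing, but just parsing the input without recursion and cutting.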
def sranges(i):
    """return pretty string for output of ranges.

    Examples
    ========

    >>> sranges([1,2,4,6,7,8,10,12,13,15,16,17])
    '1, range(2, 8, 2), 7, range(8, 14, 2), 13, range(15, 18)'

    """
    out = []
    for i in ranges(i):
        if type(i) is int:
            out.append(str(i))
        elif i[-1] == 1:
            if i[0] == 0:
                out.append('range(%s)'%(i[1] + 1))
            else:
                out.append('range(%s, %s)'%(i[0], i[1] + 1))
        else:
            out.append('range(%s, %s, %s)'%(i[0], i[1] + i[2], i[2]))
    return ', '.join(out)
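If the compact 'a-b:step' notation from the question is preferred over the range(...) strings, the items produced by ranges() could be rendered along these lines. This is only a sketch: `format_ranges` is a hypothetical helper, it assumes non-negative values and positive steps (a negative value or step would clash with the '-' separator), and it writes the stop value as last + step so that each token maps directly onto range.

```python
def format_ranges(items):
    """Render ints and (first, last, step) tuples as 'a-b:step' tokens.

    The stop value is written as last + step so that each token maps
    directly onto range(a, b, step); the ':step' suffix is dropped for step 1.
    """
    tokens = []
    for item in items:
        if isinstance(item, tuple):
            first, last, step = item
            token = '%d-%d' % (first, last + step)
            if step != 1:
                token += ':%d' % step
            tokens.append(token)
        else:
            tokens.append(str(item))  # a singleton value
    return ','.join(tokens)


# Input copied from the first docstring example of ranges().
print(format_ranges([1, (2, 6, 2), 7, (8, 12, 2), 13]))
# 1,2-8:2,7,8-14:2,13
```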