An elegant way to convert duplicates to indexed names in Python?
What I want to do is convert:

['x', 'y', 'z', 'x', 'z']

into:

['x1', 'y', 'z1', 'x2', 'z2']

Meaning:

- If an item ('y' here) is not repeated, it is left as is.
- If an item occurs more than once ('x' or 'z' here), each occurrence gets a suffix giving its rank among those occurrences.

In my case the array is small (fewer than 20 elements, so performance is not a concern).

Can this be done with an elegant, short Python expression?

Edit: I can't say whether it is elegant, but for now I ended up writing this (no imports, only a dictionary), so that:

[x for x in index_duplicates(l)]

returns the result.

Using collections.Counter() and itertools.count():
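A minimal sketch of one way these two tools could be combined (an assumption for illustration, not necessarily the answerer's original code; the names `totals` and `counters` are made up here). `Counter` finds which items repeat, and each repeated item gets its own `itertools.count(1)` iterator that hands out the next suffix:

```python
from collections import Counter
from itertools import count

arr = ['x', 'y', 'z', 'x', 'z']

# Total number of occurrences of each item.
totals = Counter(arr)

# One independent 1-based counter per item that occurs more than once.
counters = {item: count(1) for item, n in totals.items() if n > 1}

# Repeated items take the next value from their counter; unique items pass through.
result = [f"{item}{next(counters[item])}" if item in counters else item
          for item in arr]
print(result)
```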
Prints:
['x1', 'y', 'z1', 'x2', 'z2']
Edit (another version using only collections.Counter):
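A sketch of how a `Counter`-only variant might look (again an assumption, not the answerer's original code; `totals` and `seen` are illustrative names). One `Counter` holds the total occurrences and a second one tracks how many times each item has been seen so far:

```python
from collections import Counter

arr = ['x', 'y', 'z', 'x', 'z']

totals = Counter(arr)  # total occurrences of each item
seen = Counter()       # occurrences encountered so far during the loop

result = []
for item in arr:
    seen[item] += 1
    # Only items that occur more than once get a numeric suffix.
    result.append(f"{item}{seen[item]}" if totals[item] > 1 else item)
print(result)
```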
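Since you want y to stay unchanged when an item occurs only once, you first have to go through every element of the array to check for duplicates: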
from collections import Counter, defaultdict

arr = ['x', 'y', 'z', 'x', 'z']
dd = defaultdict(int)  # running count of each item seen so far
c = Counter(arr)       # total count of each item
single_occurrence_set = {k for k, v in c.items() if v == 1}
result = []
for item in arr:
    dd[item] += 1
    result.append(item if item in single_occurrence_set else f'{item}{dd[item]}')
>>> result
['x1', 'y', 'z1', 'x2', 'z2']
If y is allowed to become y1, the code gets simpler, because Counter and single_occurrence_set are no longer needed:
dd = defaultdict(int)
result = []
for item in arr:
    dd[item] += 1
    result.append(f'{item}{dd[item]}')
>>> result
['x1', 'y1', 'z1', 'x2', 'z2']
You can do it like this:
from collections import Counter

l = ['x', 'y', 'z', 'x', 'z']
count = Counter(l)
current = {k: 1 for k in l}
new_l = []
for val in l:
    data = val
    # Do we have a duplicate?
    if count[val] > 1:
        data += str(current[val])
        current[val] += 1
    new_l.append(data)
print(new_l)
# ['x1', 'y', 'z1', 'x2', 'z2']
Original version: O(n^2)

Without any imports, we can use the following simple implementation:
arr = ['x', 'y', 'z', 'x', 'z']
out = []
for i in range(len(arr)):
    c = arr[i]
    if arr.count(c) > 1:
        # The suffix is the number of earlier occurrences plus one
        out.append(c + str(arr[:i].count(c) + 1))
    else:
        out.append(c)
>>> out
['x1', 'y', 'z1', 'x2', 'z2']
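Efficient version: O(n)

If we want better time complexity, we can traverse the list once to get the total count of each unique character. Then we traverse the list a second time, keeping a running count of each character as we go, and use both counts to build the answer: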
arr = ['x', 'y', 'z', 'x', 'z']
out = []
totals = dict()
freqs = dict()  # Track all character counts
# Get the total count of every unique character in the list
for i, c in enumerate(arr):
    if c in totals:
        totals[c] += 1
    else:
        totals[c] = 1
for i, c in enumerate(arr):
    total = totals[c]  # Get total character count
    # Count how many have been seen so far during the second traversal
    if c in freqs:
        freqs[c] += 1
    else:
        freqs[c] = 1
    out.append((c + str(freqs[c])) if total > 1 else c)
>>> out
['x1', 'y', 'z1', 'x2', 'z2']
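Comments:
- Just spotted that and deleted my comment.
- @Lemoni Is it important to keep the order of the elements?
- @Vlemastre Yes.
- That wasn't me, but it is probably because this would be O(n^2).
- @AndrejKesely Thanks! I have now added a more efficient version.
- arr.count is an O(n) operation that has to be performed for every unique element in the array, so this is still O(n^2) in the worst case (where the array contains all distinct items). You can simplify the loop with for i, c in enumerate(arr):
- @Alexander OK, I think I finally fixed it. What do you think?
- @StardustGogeta Thanks, upvoted. FYI, I have edited the question with some code. Please comment if you have any opinions or suggestions.

If the order of the elements does not matter, this solution is the most concise (though probably not the fastest):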
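import numpy as np

l = ['x', 'y', 'z', 'x', 'z']
# We use np.unique to get the unique elements (sorted) and their number of occurrences
counts = np.unique(l, return_counts=True)
# We use a double list comprehension to get the expected result
result = ["{}_{}".format(x, z) if y > 1 else str(x)
          for x, y in zip(counts[0], counts[1])
          for z in range(1, y + 1)]
# Items are grouped by value, not kept in input order: ['x_1', 'x_2', 'y', 'z_1', 'z_2']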
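
The question mentions an index_duplicates helper written with no imports and a single dictionary, but its code is not shown. A minimal sketch of what such a generator might look like (hypothetical: the body and the `stats` name are assumptions, not the asker's code). The one dictionary stores, for each item, both its total count and how many occurrences have been yielded so far:

```python
def index_duplicates(items):
    """Yield items in order, suffixing repeated ones with their 1-based rank."""
    items = list(items)  # allow two passes even if given an iterator
    stats = {}           # item -> [total occurrences, occurrences yielded so far]

    # First pass: count how often each item occurs.
    for item in items:
        stats.setdefault(item, [0, 0])[0] += 1

    # Second pass: only repeated items get a suffix.
    for item in items:
        entry = stats[item]
        if entry[0] > 1:
            entry[1] += 1
            yield f"{item}{entry[1]}"
        else:
            yield item


l = ['x', 'y', 'z', 'x', 'z']
print([x for x in index_duplicates(l)])
```

Because it preserves input order, this variant fits the original requirement, unlike the np.unique approach, which groups equal items together.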