Does a set count as a buffer in Python?

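I am working through Cracking the Coding Interview (4th edition), and one of the questions is as follows:

Design an algorithm and write code to remove the duplicate characters in a string without using any additional buffer. NOTE: One or two additional variables are fine. An extra copy of the array is not.

I wrote the following solution, which satisfies all the test cases specified by the author: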

def remove_duplicate(s):
    return ''.join(sorted(set(s)))

print(remove_duplicate("abcd"))  # output "abcd"
print(remove_duplicate("aaaa"))  # output "a"
print(remove_duplicate(""))  # output ""
print(remove_duplicate("aabb"))  # output "ab"

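Does using a set in my solution count as using an additional buffer, or is my solution adequate? If it is not adequate, what would be a better way to do it?

Thanks, everyone.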

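Only whoever sets the question or grades the answer can say for sure, but I would say that a set does count as a buffer.

If the string contains no repeated characters, the length of the set equals the length of the string. In fact, because a set carries considerable overhead (it is built on a hash table), the set may well take more memory than the string. If the string uses Unicode, the number of unique characters can be very large.

If you do not know how many unique characters the string contains, you cannot predict the size of the set. Its size may be large and is unpredictable, which is what makes it count as a buffer, or worse, since it may take more memory than the string itself.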
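
As a rough illustration of that overhead, the sketch below compares the memory reported for a string of unique characters with that of the set built from it. It assumes CPython, where sys.getsizeof reports only the container's own storage (for the set, the hash table but not the one-character strings it references), so the real cost of the set is higher still. The helper name container_sizes is made up for this example.

```python
import string
import sys


def container_sizes(s):
    """Return (bytes for the string, bytes for the set built from it).

    Assumption: CPython, where sys.getsizeof counts only the object's
    own storage. For the set this is the hash table alone, not the
    one-character strings it points to.
    """
    return sys.getsizeof(s), sys.getsizeof(set(s))


if __name__ == "__main__":
    unique = string.ascii_lowercase          # 26 characters, no duplicates
    str_bytes, set_bytes = container_sizes(unique)
    print(len(set(unique)) == len(unique))   # the set holds one entry per character
    print(str_bytes, set_bytes)              # exact numbers vary by Python version
```

The absolute numbers depend on the interpreter and version, so treat them as indicative only.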

To follow up on v.coder's comment, I rewrote the code he (or she) referenced in Python and added some comments to try to explain what is going on:

def removeduplicates(s):
    """Original java implementation by
          Dhruv Gairola (http://stackoverflow.com/users/495545/dhruv-gairola)
       in his/her answer
          http://stackoverflow.com/questions/2598129/function-to-remove-duplicate-characters-in-a-string/10473835#10473835
      """
    # python strings are immutable, so first converting the string to a list of integers,
    # each integer representing the ascii value of the letter
    # (hint: look up "ascii table" on the web)
    L = [ord(char) for char in s]

    # easiest solution is to use a set, but to use Dhruv Gairola's method...
    # (hint, look up "bitmaps" on the web to learn more!)
    bitmap = 0
    #seen = set()

    for index, char in enumerate(L):
        # first check for duplicates:
        # number of bits to shift left (the space is the "lowest"
        # character on the ascii table, and 'char' here is the position
        # of the current character in the ascii table. so if 'char' is
        # a space, the shift length will be 0, if 'char' is '!', shift
        # length will be 1, and so on. This naturally requires the
        # integer to actually have as many "bit positions" as there are
        # characters in the ascii table from the space to the ~,
        # but python integers are arbitrary-precision and grow as
        # needed, so that is fine. (characters below the space, such as
        # a tab or newline, would give a negative shift and raise a
        # ValueError.)
        shift_length = char - ord(' ')

        # make a new integer where only one bit is set;
        # the bit position the character corresponds to
        bit_position = 1 << shift_length

        # if the same bit is already set [to 1] in the bitmap,
        # the result of AND'ing the two integers together
        # will be an integer where only that exact bit is
        # set, which means that the integer will be greater
        # than zero. (python integers have no fixed-width "sign bit",
        # and AND'ing two non-negative integers never gives a negative
        # result, so this check is safe.)
        bit_position_already_occupied = (bitmap & bit_position) > 0

        if bit_position_already_occupied:
        #if char in seen:
            L[index] = 0
        else:
            # update the bitmap to indicate that this character
            # is now seen. bit_position (computed above) already is
            # an integer with a single bit set: the bit that
            # corresponds to the position of the character.
            # we "add" that bit to the bitmap by OR'ing the current
            # bitmap with it. The result of OR'ing two integers
            # is that all bits that are set to 1 in *either* of the two
            # will be set to 1 in the result.
            bitmap = bitmap | bit_position
            #seen.add(char)

    # finally, turn the list back to a string to be able to return it
    # (again, just kind of a way to "get around" immutable python strings)
    return ''.join(chr(i) for i in L if i != 0)


if __name__ == "__main__":
    print(removeduplicates('aaaa'))
    print(removeduplicates('aabcdee'))
    print(removeduplicates('aabbccddeeefffff'))
    print(removeduplicates('&%!%)(FNAFNZEFafaei515151iaaogh6161626)([][][   ao8faeo~~~````%!)"%fakfzzqqfaklnz'))
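
To make the bit operations in the code above concrete, here is a small sketch that runs the same test-and-set steps on individual characters. The helper names bit_for, is_seen and mark_seen are invented for this illustration and are not part of the answer above. Like that code, the sketch assumes every character sits at or above the space in the ASCII table.

```python
def bit_for(char):
    """Integer with a single bit set, at the character's offset from ' '."""
    return 1 << (ord(char) - ord(' '))


def is_seen(bitmap, char):
    """True if the character's bit is already set in the bitmap."""
    return (bitmap & bit_for(char)) != 0


def mark_seen(bitmap, char):
    """Return a new bitmap with the character's bit set."""
    return bitmap | bit_for(char)


if __name__ == "__main__":
    bitmap = 0
    print(bin(bit_for(' ')))       # 0b1: the space maps to bit 0
    print(bin(bit_for('!')))       # 0b10: '!' is one position after the space
    bitmap = mark_seen(bitmap, '!')
    print(is_seen(bitmap, '!'))    # True
    print(is_seen(bitmap, 'a'))    # False
    # Python integers grow as needed, so '~' (offset 94) is not a problem.
    print(bit_for('~').bit_length())   # 95
```

The whole bitmap is a single integer, which is why this approach is usually accepted as "one additional variable" for a fixed character range.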

set helps to get the unique items. But if the order matters, sorted(set(s)) will not return the initial order. For example, ''.join(sorted(set('abcfbcdd')))

gives

abcdf

although the initial order is

abcfd
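
If the first-occurrence order should be kept, one common idiom is dict.fromkeys, sketched below. This assumes Python 3.7 or later, where dictionaries preserve insertion order, and the function name remove_duplicate_keep_order is made up here. Note that the dictionary is still an extra buffer, so this addresses only the ordering issue, not the constraint in the question.

```python
def remove_duplicate_keep_order(s):
    """Drop repeated characters, keeping each character's first occurrence.

    Relies on dicts preserving insertion order (Python 3.7+).
    The dict is an additional buffer of up to len(s) entries.
    """
    return ''.join(dict.fromkeys(s))


if __name__ == "__main__":
    print(remove_duplicate_keep_order('abcfbcdd'))   # abcfd
    print(''.join(sorted(set('abcfbcdd'))))          # abcdf
```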

Yes, a set would count as an additional buffer. See the O(n) solution in @Dhruv Gairola's answer.
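
For comparison, here is a sketch of an approach that uses no set or bitmap at all, only a few index variables, at the cost of O(n^2) time. Because Python strings are immutable, it assumes the input is already a mutable list of characters, standing in for the character array the book's question has in mind. The function name remove_duplicates_in_place is hypothetical, and this is not taken from any of the answers above.

```python
def remove_duplicates_in_place(chars):
    """Remove repeated characters from a list of characters in place.

    Uses only a few integer variables (tail, i, j) and no additional
    container. Runs in O(n^2) time. Returns the same list object,
    shortened to the unique characters in first-occurrence order.
    """
    tail = 0  # chars[:tail] holds the unique characters found so far
    for i in range(len(chars)):
        # Look for chars[i] among the characters already kept.
        j = 0
        while j < tail and chars[j] != chars[i]:
            j += 1
        if j == tail:
            # Not found: keep it by moving it to the end of the kept prefix.
            chars[tail] = chars[i]
            tail += 1
    del chars[tail:]  # drop the leftover suffix without copying the list
    return chars


if __name__ == "__main__":
    letters = list('aabcdee')
    remove_duplicates_in_place(letters)
    print(''.join(letters))   # abcde
```

Whether converting the string to a list first is acceptable depends on how strictly the "no extra copy" rule is read; in languages with mutable character arrays the conversion is not needed.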