Does a set count as a buffer in Python?

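I am working through Cracking the Coding Interview (4th edition), and one of the questions reads as follows:

Design an algorithm and write code to remove the duplicate characters in a string without using any additional buffer. NOTE: One or two additional variables are fine. An extra copy of the array is not.

I wrote the following solution, which passes all the test cases the author specifies: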

def remove_duplicate(s):
    return ''.join(sorted(set(s)))

print(remove_duplicate("abcd"))  # output "abcd"
print(remove_duplicate("aaaa"))  # output "a"
print(remove_duplicate(""))      # output ""
print(remove_duplicate("aabb"))  # output "ab"
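Does using a set in my solution count as using an additional buffer, or is my solution adequate? If it is not adequate, what would be a better approach?

Thanks, everyone.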

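Only the person who set the question or who evaluates the answer can say for sure, but I would say that a set does count as a buffer.

If the string contains no repeated characters, the length of the set will equal the length of the string. In fact, because a set is backed by a hash table and carries considerable overhead, it will probably take more memory than the string itself. If the string is Unicode, the number of unique characters could be very large.

If you do not know how many unique characters the string contains, you cannot predict the size of the set. That size may be large and is unpredictable, which is what makes the set count as a buffer, or worse, since it may take more memory than the string.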
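To make the memory point concrete, here is a minimal sketch that compares the sizes reported by sys.getsizeof for a string of distinct characters and for the set built from it. The figures are specific to CPython and differ between versions, and getsizeof counts only the set's own hash table, not the one-character string objects it refers to, so treat the numbers as indicative only.

```python
import string
import sys

# A string in which every character is distinct
# (assumption for this illustration: the 26 lowercase ASCII letters).
s = string.ascii_lowercase

unique = set(s)

# With no repeats, the set holds as many elements as the string has characters.
print(len(s), len(unique))

# Sizes in bytes as reported by the interpreter (implementation-specific).
string_bytes = sys.getsizeof(s)
set_bytes = sys.getsizeof(unique)
print(string_bytes, set_bytes)
```

On CPython the second figure is much larger than the first, which is the overhead described above.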

To follow up on v.coder's comment, I rewrote the code he (or she) referred to in Python and added some comments to try to explain what is going on.

def removeduplicates(s):
    """Original java implementation by
          Dhruv Gairola (http://stackoverflow.com/users/495545/dhruv-gairola)
       in his/her answer
          http://stackoverflow.com/questions/2598129/function-to-remove-duplicate-characters-in-a-string/10473835#10473835
      """
    # python strings are immutable, so first converting the string to a list of integers,
    # each integer representing the ascii value of the letter
    # (hint: look up "ascii table" on the web)
    L = [ord(char) for char in s]

    # easiest solution is to use a set, but to use Dhruv Gairola's method...
    # (hint: look up "bitmaps" on the web to learn more!)
    bitmap = 0
    #seen = set()

    for index, char in enumerate(L):
        # first check for duplicates:
        # number of bits to shift left. The space is the "lowest"
        # printable character in the ascii table, and 'char' here is
        # the code of the current character. So if 'char' is a space,
        # the shift length will be 0, if 'char' is '!', the shift
        # length will be 1, and so on. This requires the integer to
        # have as many "bit positions" as there are characters in the
        # ascii table from the space to the ~. Python integers have
        # arbitrary precision, so that is not a problem. Characters
        # below the space (tab, newline) would give a negative shift
        # and raise ValueError.
        shift_length = char - ord(' ')

        # make a new integer where only one bit is set;
        # the bit position the character corresponds to
        bit_position = 1 << shift_length

        # if the same bit is already set [to 1] in the bitmap,
        # the result of AND'ing the two integers together
        # will be an integer where only that exact bit is set,
        # which means the result is greater than zero. (Python
        # integers have no fixed-width sign bit, and both operands
        # are non-negative here, so the result is never negative.)
        bit_position_already_occupied = (bitmap & bit_position) > 0

        if bit_position_already_occupied:
        #if char in seen:
            L[index] = 0
        else:
            # update the bitmap to indicate that this character
            # is now seen. bit_position (computed above) already has
            # exactly the bit for this character set, so we OR it
            # into the bitmap. The result of OR'ing two integers
            # is that all bits that are set to 1 in *either* of the two
            # will be set to 1 in the result.
            bitmap = bitmap | bit_position
            #seen.add(char)

    # finally, turn the list back to a string to be able to return it
    # (again, just kind of a way to "get around" immutable python strings)
    return ''.join(chr(i) for i in L if i != 0)


if __name__ == "__main__":
    print(removeduplicates('aaaa'))
    print(removeduplicates('aabcdee'))
    print(removeduplicates('aabbccddeeefffff'))
    print(removeduplicates('&%!%)(FNAFNZEFafaei515151iaaogh6161626)([][][   ao8faeo~~~````%!)"%fakfzzqqfaklnz'))
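
For comparison, a more compact sketch of the same bitmap idea follows. It is a variation, not Dhruv Gairola's original: it shifts by the code point itself instead of by the offset from the space character, which relies on Python integers having arbitrary precision and so also accepts characters below the space and non-ASCII characters. The name remove_duplicates_bitmask is made up for this illustration. Note that the bitmap grows with the largest code point seen and the list of kept characters is still extra storage, so whether this satisfies the "no additional buffer" rule remains a judgment call.

```python
def remove_duplicates_bitmask(s):
    """Drop repeated characters, keeping the first occurrence of each."""
    seen_bits = 0   # bit n is set once the character with code point n was seen
    kept = []       # Python strings are immutable, so collect the survivors

    for ch in s:
        mask = 1 << ord(ch)        # integer with only this character's bit set
        if seen_bits & mask:       # non-zero means the bit was already set
            continue
        seen_bits |= mask          # record the character as seen
        kept.append(ch)

    return ''.join(kept)


if __name__ == "__main__":
    print(remove_duplicates_bitmask('aabcdee'))    # abcde
    print(remove_duplicates_bitmask('abcfbcdd'))   # abcfd
```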
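
A set helps you get the unique items. But if order matters, sorted(set(s)) will not give back the original order. For example, ''.join(sorted(set('abcfbcdd'))) gives abcdf, whereas the original order is abcfd.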
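
If first-occurrence order has to be kept, one option is dict.fromkeys, since dictionaries preserve insertion order from Python 3.7 onward. This is only a sketch, with the hypothetical name remove_duplicates_ordered, and the dictionary is of course still an extra buffer.

```python
def remove_duplicates_ordered(s):
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7)
    return ''.join(dict.fromkeys(s))


print(''.join(sorted(set('abcfbcdd'))))        # abcdf
print(remove_duplicates_ordered('abcfbcdd'))   # abcfd
```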

Yes, a set would count as an additional buffer. See the O(n) solution in @Dhruv Gairola's answer.
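
For a version that avoids any auxiliary set or bitmap, a common approach is to compact the characters in place, comparing each one against the prefix that has already been kept. The sketch below assumes the input arrives as a mutable list of characters, because a Python str cannot be modified in place; the name remove_duplicates_in_place is hypothetical. It needs only a couple of index variables and runs in O(n^2) time.

```python
def remove_duplicates_in_place(chars):
    """Compact `chars` (a list of one-character strings) so that each
    character appears once, in first-occurrence order.
    Returns the new length."""
    tail = 0  # chars[0:tail] holds the unique characters kept so far

    for i in range(len(chars)):
        # look for chars[i] in the kept prefix without copying it
        j = 0
        while j < tail and chars[j] != chars[i]:
            j += 1
        if j == tail:              # not found in the prefix: keep it
            chars[tail] = chars[i]
            tail += 1

    del chars[tail:]               # drop the leftover tail in place
    return tail


buf = list('aabcdee')   # this conversion is itself a copy, needed only in Python
remove_duplicates_in_place(buf)
print(''.join(buf))     # abcde
```

The trade-off is time for space: each character is compared against every character kept so far, but nothing proportional to the input is allocated beyond the list being edited.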