Algorithm: continuously updating the median + space efficiency


Maybe I haven't searched for the right keywords (I couldn't find a solution).

I'm trying to compute the median of a (continuously updated) list of numbers in a space-efficient way.

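For the mean there is a nice approach: remember the number of elements in the list and weight the old mean by it, i.e. new_mean = (old_mean * n + x) / (n + 1) when the value x is added to n existing elements.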
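As a rough illustration of that weighting, here is a minimal Python sketch. The class name `RunningMean` and its `add` method are made up for this example; the only state kept is the count and the current mean.

```python
class RunningMean:
    """Incrementally updated mean that stores only a count and the current mean."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def add(self, x):
        # Weight the old mean by the number of elements it summarizes,
        # add the new value, and divide by the new count.
        self.mean = (self.mean * self.count + x) / (self.count + 1)
        self.count += 1
        return self.mean


rm = RunningMean()
for value in [9, 10, 4, 6, 13, 12]:
    rm.add(value)
print(rm.count, rm.mean)
```

Memory use stays constant no matter how many values are added, which is the property the question hopes to find for the median as well.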

Question: Is there a similar (space-efficient) way to compute the median?

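Update: I have updated the question (thanks @WillemVanOnsem). I'm not only looking for a way to continuously update the median, but also for a space-efficient way to do it. Following his hint, we can keep two data structures.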

Example:

// 1) We have a list for which we want to find the median.
noList   = [9,10,4,6,13,12]

// 2) We divide it into two lists or data structures (and additionally sort them).
smallerList = [4,6,9]
biggerList  = [10,12,13]

// 3) Both lists have the same length, so the median lies between the last element of smallerList and the first element of biggerList.
median = (9 + 10) / 2 = 9.5

// 4) Next, we add a further element and want to update our median.
// We add the number 5 to our data structures. So the new list is:
noList   = [9,10,4,6,13,12,5]

// 5) Obviously 5 is smaller than our current median of 9.5. So we insert it in a sorted way into smallerList:
smallerList = [4,5,6,9]
biggerList  = [10,12,13]

// 6) Now length(smallerList) > length(biggerList), so we know that the updated median is the last element of smallerList.
median = 9

// 7) Next, we add a further element and want to update our median.
// We add the number 2 to our data structures. So the new list is:
noList   = [9,10,4,6,13,12,5,2]

// 8) Obviously 2 is smaller than our current median of 9. So we insert it again in a sorted way into smallerList:
smallerList = [2,4,5,6,9]
biggerList  = [10,12,13]

// 9) Now smallerList is two elements longer than biggerList, so we need to "balance" our lists by taking one element from one list and inserting it into the other.
// We remove the element 9 from smallerList and insert it into biggerList.
smallerList = [2,4,5,6]
biggerList  = [9,10,12,13]

// 10) Both lists have the same length, so the median lies between the last element of smallerList and the first element of biggerList.
median = (6 + 9) / 2 = 7.5
I hope this illustrates the idea. I guess this is what you were hinting at (@willemvanonsen).
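The walkthrough above can be sketched in Python with two heaps instead of two sorted lists, so that each insertion costs O(log n) rather than shifting list elements. This is one possible sketch, not necessarily what the hint had in mind: `RunningMedian`, `smaller`, and `bigger` are illustrative names, and because `heapq` only provides a min-heap, the lower half is stored with negated values. It still keeps every value, so memory remains O(n).

```python
import heapq


class RunningMedian:
    """Running median using two heaps, one per half of the data."""

    def __init__(self):
        self.smaller = []  # max-heap via negated values: the lower half
        self.bigger = []   # min-heap: the upper half

    def add(self, x):
        # Put x into the half it belongs to.
        if self.smaller and x > -self.smaller[0]:
            heapq.heappush(self.bigger, x)
        else:
            heapq.heappush(self.smaller, -x)
        # Rebalance so the sizes differ by at most one.
        if len(self.smaller) > len(self.bigger) + 1:
            heapq.heappush(self.bigger, -heapq.heappop(self.smaller))
        elif len(self.bigger) > len(self.smaller) + 1:
            heapq.heappush(self.smaller, -heapq.heappop(self.bigger))

    def median(self):
        if not self.smaller and not self.bigger:
            raise ValueError("no data")
        if len(self.smaller) > len(self.bigger):
            return -self.smaller[0]
        if len(self.bigger) > len(self.smaller):
            return self.bigger[0]
        return (-self.smaller[0] + self.bigger[0]) / 2


rm = RunningMedian()
for value in [9, 10, 4, 6, 13, 12]:
    rm.add(value)
print(rm.median())  # 9.5, as in step 3
rm.add(5)
print(rm.median())  # 9, as in step 6
rm.add(2)
print(rm.median())  # 7.5, as in step 10
```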


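Yes, this may answer my original question... But the problem with this solution is that the two lists (smallerList and biggerList) can grow quite large. Suppose we have a stream of 10^18 numbers and want to find the median of all of them without running out of memory. How can this be solved in a space-efficient way?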

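You can't do this without remembering all the numbers you have seen, because at any point, any number you saw in the past could still become the median in the future.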

If you have seen n numbers so far, then for any i, the i-th smallest of them can become the median:

  • If i > n/2, this happens if the next 2i-n numbers are larger.


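  • If i <= n/2, the symmetric case applies: this happens if enough of the following numbers are smaller.

Keep two data structures, one for the smaller elements and one for the larger elements, and keep the two lists balanced (the same number of elements). If one of them grows larger, move the current median into the other list and pick a new one from the former list...

I updated my original question. I'm also looking for a space-efficient solution.

Possible duplicate.

My first impression is that you have to store all the elements to compute an online median. Unlike the mean, it can be highly nonlinear. If you limit the size, I think you could forget some of them.

Are these numbers integers? What are the largest and smallest possible values?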
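The answer's argument, that any value seen so far can still end up as the median, can be checked with a small Python sketch. The helper `force_median` is hypothetical and written only for this illustration. It pads the stream until the chosen value has equally many values on both sides, which makes it the exact middle element; the padding count may differ by one from the figure in the answer, depending on how the median of an even-length list is defined.

```python
import statistics


def force_median(seen, i):
    """Extend `seen` so that its i-th smallest value (1-based) becomes the median."""
    n = len(seen)
    target = sorted(seen)[i - 1]
    left = i - 1   # values before the target in sorted order
    right = n - i  # values after the target in sorted order
    if left > right:
        # Pad with values larger than everything seen so far.
        padding = [max(seen) + 1] * (left - right)
    else:
        # Pad with values smaller than everything seen so far.
        padding = [min(seen) - 1] * (right - left)
    return target, seen + padding


seen = [9, 10, 4, 6, 13, 12]
for i in range(1, len(seen) + 1):
    target, stream = force_median(seen, i)
    # Every previously seen value can be turned into the median.
    assert statistics.median(stream) == target
```

Since discarding any one of the seen values would make one of these continuations unanswerable, an exact median over arbitrary values cannot get by with less than storing them all.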