Greater-than-or-equal binning in Python

python python-3.x pandas numpy

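I have a huge list with more than 1 million entries.

My bins are [0, 1, 2, 3, ..., 1000].

So for bin 0, all of the roughly 1M entries pass (every value is greater than or equal to 0), and so on for the other bins.

I need a fast solution. I tried to code it myself, but it is quite slow.

Any help is appreciated. Thanks.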

Input-
input_list = [0,0,0,1,2,3,55,34,......] (almost 1m in Len)
bins = [0,1,2,....., 1000]

Output-
{0:1.00, 1:0.99, 2:0.98........1000:0.02}
where each key is a bin,
      and each value is the ratio of entries greater than or equal to that bin to the total number of entries in the list.

If I understand your question correctly, you can use numpy.histogram. The following code block should work once you substitute in your own input_list and bins:

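import numpy as np

# Filling in dummy data
input_list = np.random.randint(low=0, high=100, size=100)

# Set up bin edges as [1, 2, 3, ... 100]
bins = np.arange(1, 101)

# Run numpy.histogram; hist[i] counts values in [bins[i], bins[i+1])
# (only the last bin also includes its right edge)
hist, bin_edges = np.histogram(input_list, bins=bins)

# Count of values strictly below each edge except the last one:
# values below the first edge plus the exclusive cumulative sum of hist
below_first = np.count_nonzero(input_list < bins[0])
cumsum = below_first + np.concatenate(([0], np.cumsum(hist)[:-1]))

# Find ratios of values >= bins[i], for every edge except the last
ratios = (len(input_list) - cumsum) / len(input_list)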

The ratios variable contains what you are looking for: the ratio of values greater than or equal to each bin.
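If the histogram bookkeeping feels fiddly, the same ratios can probably be obtained by sorting the data once and looking up each bin with np.searchsorted. The following is only a sketch under that assumption: the helper name ratio_at_least is made up for illustration, and it assumes a non-empty input.

```python
import numpy as np

def ratio_at_least(values, bins):
    """Return {bin: fraction of values >= bin} (hypothetical helper)."""
    values = np.sort(np.asarray(values))
    bins = np.asarray(bins)
    # side="left" yields, for each bin, the number of values strictly below it
    below = np.searchsorted(values, bins, side="left")
    ratios = (values.size - below) / values.size
    return dict(zip(bins.tolist(), ratios.tolist()))

example = ratio_at_least([0, 0, 0, 1, 2, 3, 55, 34], [0, 1, 2, 3])
print(example)  # {0: 1.0, 1: 0.625, 2: 0.5, 3: 0.375}
```

This costs one sort of the data plus a binary search per bin, and it does not require the values to be integers.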

A very simple approach: for each bin, count the elements greater than or equal to it, then divide by the number of records.

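import numpy as np

# Dummy data: one million random integers in [0, 2000)
data = np.random.randint(2000, size=10**6)
# Bins 0, 1, ..., 1000 as in the question
bins = np.arange(1001)
dic = {}
for bi in bins:
    # Fraction of entries greater than or equal to this bin
    dic[bi] = np.count_nonzero(data >= bi) / len(data)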
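The loop above scans the whole array once per bin. If the data really are non-negative integers, as in the question's example, a single pass with np.bincount followed by a reversed cumulative sum should give the same ratios. This is a sketch under that assumption, and ratio_at_least_int is a hypothetical name, not something from the question.

```python
import numpy as np

def ratio_at_least_int(data, max_bin):
    """Fraction of entries >= b for each b in 0..max_bin (hypothetical helper).

    Assumes `data` is a non-empty array of non-negative integers.
    """
    data = np.asarray(data)
    # Clip so that every value above max_bin is counted in the last slot
    counts = np.bincount(np.minimum(data, max_bin), minlength=max_bin + 1)
    # Reversed cumulative sum: at_least[b] = number of entries >= b
    at_least = counts[::-1].cumsum()[::-1]
    return at_least / data.size

rng = np.random.default_rng(0)
data = rng.integers(0, 2000, size=10**6)
ratios = ratio_at_least_int(data, 1000)
result = dict(enumerate(ratios.tolist()))  # {bin: ratio} for bins 0..1000
```

Clipping keeps the count array at 1001 entries no matter how large the data values get, while values above the largest bin still count toward every bin.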