Python: putting data into range buckets
I have two-dimensional data stored in a sorted list of tuples, as follows:
data = [(0.1,100), (0.13,300), (0.2,10)...
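The first value in each tuple, the X value, occurs only once in the list of tuples. In other words, there can be only one entry for 0.1, and so on.
Then I have a sorted list of buckets. A bucket is defined as a tuple containing a range and an ID, as follows: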
buckets = [((0,0.14), 2), ((0.135,0.19), 1), ((0.19,0.21), 2), ((0.19,0.24), 3)...
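The range is relative to the X axis. So, in the list above, ID 2 has two buckets, and IDs 1 and 3 have one bucket each. The first bucket for ID 2 spans 0 to 0.14. Note that buckets can overlap.
So I need an algorithm that drops the data into the buckets and then adds up the scores. For the data above, the result would be: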
1:0
2:410
3:10
Notice how every piece of data was caught by a bucket associated with ID 2, so ID 2 gets the score 100+300+10=410.
How can I write an algorithm to do this?
Try this code:
data = [(0.1,100), (0.13,300), (0.2,10)]
buckets = [((0,0.14), 2), ((0.135,0.19), 1), ((0.19,0.21), 2), ((0.19,0.24), 3)]

def foo(tpl):
    ## determine which buckets enclose a data tuple; returns a list of IDs
    x, s = tpl
    lst = []
    for bucket in buckets:
        rnge, iid = bucket
        # bounds are exclusive here: a point exactly on an edge is not counted
        if x > rnge[0] and x < rnge[1]:
            lst.append(iid)
    return lst

# pair every data tuple with the IDs of the buckets that enclose it
data = [[dt, foo(dt)] for dt in data]

scores_dict = {}
for tpl in data:
    score = tpl[0][1]
    for iid in tpl[1]:
        if iid in scores_dict:
            scores_dict[iid] += score
        else:
            scores_dict[iid] = score

for key in scores_dict:
    print(key, ":", scores_dict[key])
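If a bucket ID is not printed, then no X value fell into that bucket, or its sum is zero.
Convert each bucket definition (range, label) into a callable that, given a data tuple, increments the bucket total. The bucket values are stored in a simple dict. If you want to offer a simpler API, you could easily wrap this concept in a class.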
def partition(buckets, bucket_definition):
    """Build a callable that increments the appropriate bucket with a value"""
    lower, upper = bucket_definition[0]
    key = bucket_definition[1]

    def _partition(data):
        x, y = data
        # Set a default value for this key
        buckets.setdefault(key, 0)
        if lower <= x <= upper:
            buckets[key] += y

    return _partition

bucket_definitions = [
    ((0, 0.14), 2),
    ((0.135, 0.19), 1),
    ((0.19, 0.21), 2),
    ((0.19, 0.24), 3)
]
data = [(0.1, 100), (0.13, 300), (0.2, 10)]

# Holder for bucket labels and values
buckets = {}

# For each bucket definition (range, label) build a callable
partitioners = [partition(buckets, definition) for definition in bucket_definitions]

# Apply each callable to each data tuple provided.
# An explicit loop is used because map() is lazy in Python 3 and would never call the partitioner.
for partitioner in partitioners:
    for point in data:
        partitioner(point)

print(buckets)
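The answer above mentions that the idea could be wrapped in a class. One possible shape for that is sketched below. The class name `BucketTotals` and its methods are made up for illustration and are not part of the original answer. The bounds are assumed to be inclusive, as in `partition`.

```python
class BucketTotals:
    """Hypothetical wrapper that accumulates scores per bucket ID."""

    def __init__(self, bucket_definitions):
        self.definitions = list(bucket_definitions)
        # Start every ID at zero so that empty buckets still show up in the result
        self.totals = {bucket_id: 0 for _, bucket_id in self.definitions}

    def add(self, point):
        """Add one (x, score) tuple to every bucket whose range contains x."""
        x, score = point
        for (lower, upper), bucket_id in self.definitions:
            if lower <= x <= upper:
                self.totals[bucket_id] += score

    def add_all(self, points):
        """Add every point and return the running totals."""
        for point in points:
            self.add(point)
        return self.totals


bucket_totals = BucketTotals([
    ((0, 0.14), 2),
    ((0.135, 0.19), 1),
    ((0.19, 0.21), 2),
    ((0.19, 0.24), 3),
])
bucket_totals.add_all([(0.1, 100), (0.13, 300), (0.2, 10)])
print(bucket_totals.totals)
```

Initializing the totals in the constructor means ID 1 is reported with a total of 0 even though no data point falls into its bucket.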
This produces the desired output from the test data:
data = [(0.1,100), (0.13,300), (0.2,10)]
buckets = [((0,0.14), 2), ((0.135,0.19), 1), ((0.19,0.21), 2), ((0.19,0.24), 3)]

totals = dict()
for bucket in buckets:
    bucket_id = bucket[1]
    # make sure every ID appears in the output, even with a total of 0
    if bucket_id not in totals:
        totals[bucket_id] = 0
    for data_point in data:
        # inclusive bounds on both ends of the range
        if data_point[0] >= bucket[0][0] and data_point[0] <= bucket[0][1]:
            totals[bucket_id] += data_point[1]

for key in sorted(totals):
    print("{}: {}".format(key, totals[key]))
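Because the question states that the data list is sorted by X, the points matching a bucket form one contiguous slice, which could be located with binary search instead of scanning every point. The following is a minimal sketch using the standard `bisect` module. It assumes inclusive bounds and data that is already sorted. The function name `totals_by_bisect` is made up for illustration.

```python
from bisect import bisect_left, bisect_right


def totals_by_bisect(data, buckets):
    """Sum scores per bucket ID; data must already be sorted by X."""
    xs = [x for x, _ in data]
    totals = {}
    for (lower, upper), bucket_id in buckets:
        start = bisect_left(xs, lower)   # first index with x >= lower
        stop = bisect_right(xs, upper)   # first index with x > upper
        slice_sum = sum(score for _, score in data[start:stop])
        totals[bucket_id] = totals.get(bucket_id, 0) + slice_sum
    return totals


data = [(0.1, 100), (0.13, 300), (0.2, 10)]
buckets = [((0, 0.14), 2), ((0.135, 0.19), 1), ((0.19, 0.21), 2), ((0.19, 0.24), 3)]
result = totals_by_bisect(data, buckets)
for key in sorted(result):
    print("{}: {}".format(key, result[key]))
```

This only pays off when there are many data points per bucket. For a handful of points, the plain nested loop is just as good and easier to read.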
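Comments:
You can do if lower>=x>=upper: and Python does the right thing.
@Rob Cowie it should be: if lower ...
Urgh; I blame late-night coding.
It just occurred to me that what you describe could be implemented as a ... Have a read.
It is worth emphasizing that buckets can overlap each other. If two buckets with the same ID overlap, some data may be counted more than once.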
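To illustrate the last comment: if a data point should count only once per ID even when several buckets with that ID cover it, the matching IDs can be collected into a set before adding the score. This is a sketch under that assumption. Whether single counting is actually wanted is not stated in the question. The function name `score_by_id` and the overlapping bucket list are made up for the example.

```python
def score_by_id(data, buckets):
    """Sum scores per bucket ID, counting each data point at most once per ID."""
    totals = {bucket_id: 0 for _, bucket_id in buckets}
    for x, score in data:
        # A set collapses repeated IDs when overlapping buckets share an ID
        matching_ids = {
            bucket_id
            for (lower, upper), bucket_id in buckets
            if lower <= x <= upper
        }
        for bucket_id in matching_ids:
            totals[bucket_id] += score
    return totals


data = [(0.1, 100), (0.13, 300), (0.2, 10)]
# Hypothetical buckets: both ID 2 ranges cover x = 0.13
overlapping = [((0, 0.14), 2), ((0.12, 0.19), 2), ((0.19, 0.24), 3)]
result = score_by_id(data, overlapping)
print(result)
```

With these buckets, the point at 0.13 contributes 300 to ID 2 once rather than twice.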