How to group data by value range in Python
Tags: python, python-2.7
I have data like this:
[312.281,
370.401,
254.245,
272.256,
312.325,
286.243,
271.231, ...]
Then I want to group the values by range:
for i in data:
    if i in range(200,300):
        data_200_300.append(i)
    elif i in range(300,400):
        data_300_400.append(i)
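It doesn't work. What code should I use?
range(200, 300) returns a list of the integers between the two numbers, whereas your data contains floats. When the data contains floats, compare directly using > and <, for example 200 < i < 300.
@AKS answered this correctly. As an alternative, you could also try a lambda expression like this: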
result = list(filter(lambda x: 200 < x < 300, data))
You can use this to process your data:
filtered_data = []
for i in range(200,400,100):
    # list() makes this behave the same on Python 2 and Python 3
    filtered_data.append(list(filter(lambda x: i < x < i+100, data)))
>>> filtered_data
[[254.245, 272.256, 286.243, 271.231], [312.281, 370.401, 312.325]]
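For comparison, the comparison-based fix from the first answer can also be written directly as a loop. This is a minimal sketch, not the answerer's exact code: it uses only the seven values shown in the question (the elided rest is left out), keeps the list names from the question, and assumes half-open bins (200 <= x < 300) so that a value sitting exactly on 300 is not dropped. The lambda version above uses strict inequalities on both sides instead.

```python
data = [312.281, 370.401, 254.245, 272.256, 312.325, 286.243, 271.231]

data_200_300 = []
data_300_400 = []

for value in data:
    # Compare against the bounds directly: range() only contains integers,
    # so a float such as 254.245 is never "in range(200, 300)".
    if 200 <= value < 300:
        data_200_300.append(value)
    elif 300 <= value < 400:
        data_300_400.append(value)

print(data_200_300)
print(data_300_400)
```

Whether the lower bound should be inclusive depends on the data; for the values in the question both choices give the same groups.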
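If you have a lot of these values and can import numpy, there is a faster option than a chain of if conditions or lambda filters. It uses logical indexing:
import numpy as np

def indexingversion(data, bin_start, bin_end, bin_step):
    x = np.array(data)
    # Bin edges, e.g. [200, 300, 400] for start=200, end=400, step=100
    bin_edges = np.arange(bin_start, bin_end + bin_step, bin_step)
    bin_number = bin_edges.size - 1
    # cond[j, i] is True when x[j] lies strictly inside bin i
    cond = np.zeros((x.size, bin_number), dtype=bool)
    for i in range(bin_number):
        cond[:, i] = np.logical_and(bin_edges[i] < x,
                                    x < bin_edges[i+1])
    return [list(x[cond[:, i]]) for i in range(bin_number)]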
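The profiling run for this answer compares indexingversion with two functions named forloop and lambdaversion, whose bodies are not included in the post. A plausible sketch of them follows, assuming they wrap the loop from the question (with the comparison fix) and the lambda filter from the other answer. Only the names and call signatures come from the profile; the bodies are guesses.

```python
def forloop(data):
    """Hard-coded two-bin loop, as in the question but using comparisons."""
    data_200_300 = []
    data_300_400 = []
    for value in data:
        if 200 < value < 300:
            data_200_300.append(value)
        elif 300 < value < 400:
            data_300_400.append(value)
    return [data_200_300, data_300_400]


def lambdaversion(data, bin_start, bin_end, bin_step):
    """One filter() pass per bin, as in the lambda answer."""
    filtered_data = []
    for lower in range(bin_start, bin_end, bin_step):
        # list() evaluates the filter immediately, so the lambda sees
        # the current value of `lower` on both Python 2 and Python 3.
        filtered_data.append(
            list(filter(lambda x: lower < x < lower + bin_step, data)))
    return filtered_data


data = [312.281, 370.401, 254.245, 272.256, 312.325, 286.243, 271.231]
print(forloop(data) == lambdaversion(data, 200, 400, 100))
```

Both sketches make one Python-level comparison per value and per bin, which is the work the numpy version moves into vectorized array operations.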
Profiling output:
All the same? - True
Wrote profile results to bla.py.lprof
Timer unit: 1e-06 s
Total time: 0.580098 s
File: bla.py
Function: run_all at line 32
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    32                                           @profile
    33                                           def run_all():
    34         1            1      1.0      0.0      n = 100000
    35         1         3311   3311.0      0.6      x = np.random.random_integers(200, 400, n) + np.random.ranf(n)
    36         1            2      2.0      0.0      bin_start = 200
    37         1            1      1.0      0.0      bin_end = 400
    38         1            0      0.0      0.0      bin_step = 100
    39         1       263073 263073.0     45.3      a = forloop(x)
    40         1       301819 301819.0     52.0      b = lambdaversion(x, bin_start, bin_end, bin_step)
    41         1         7514   7514.0      1.3      c = indexingversion(x, bin_start, bin_end, bin_step)
    42         1         4377   4377.0      0.8      print('All the same? - ' + str(a == b == c))
As the Time and % Time columns show, numpy indexing is roughly 35 to 40 times faster in this run (7514 µs versus 263073 µs and 301819 µs), at least for 100000 numbers. For a very small number of values, however, it is slower on my machine; it only starts to be faster at around 40 values.
What if I want to group by a column? For example, with df=[id,v1,v2,v3 1,12,32,23 2,65,45,22 3,55,34,76…], what should I do if I want to group based on the v3 column?
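One way to approach the column question without any extra library is to bin whole rows by the value in one column. This is only a sketch under stated assumptions: the rows are the three shown in the comment, stored as plain dicts rather than as a DataFrame, and the helper name group_rows_by_range and the bin width of 25 are made up for illustration.

```python
from collections import defaultdict

# The three rows shown in the comment; the elided rest is not reproduced.
rows = [
    {"id": 1, "v1": 12, "v2": 32, "v3": 23},
    {"id": 2, "v1": 65, "v2": 45, "v3": 22},
    {"id": 3, "v1": 55, "v2": 34, "v3": 76},
]


def group_rows_by_range(rows, column, bin_step):
    """Group rows into half-open bins [k*bin_step, (k+1)*bin_step) of `column`."""
    groups = defaultdict(list)
    for row in rows:
        # Floor division finds the lower edge of the bin the value falls into.
        lower = (row[column] // bin_step) * bin_step
        groups[(lower, lower + bin_step)].append(row)
    return dict(groups)


groups = group_rows_by_range(rows, "v3", 25)
for bounds in sorted(groups):
    print(bounds, [row["id"] for row in groups[bounds]])
```

If the data really is a pandas DataFrame, the same idea is usually expressed by binning the column with pandas.cut and passing the result to groupby; that variant is not shown here.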