Python: split a DataFrame on the sum of a column?

python, numpy, pandas

I have a DataFrame like this:

>>> df = pd.DataFrame({'Foo': [1, 2, 3, 6], 'Bar': ['hello', 'world', 'spam', 'eggs']})
>>> df
     Bar  Foo
0  hello    1
1  world    2
2   spam    3
3   eggs    6

How can I split this DataFrame so that each part has (roughly) the same sum of Foo? That is, if I wanted to split it in two, I would want:

     Bar  Foo
0  hello    1
1  world    2
2   spam    3

and

    Bar  Foo
3  eggs    6

because in both cases Foo sums to 6.

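I know about NumPy's array_split, i.e. pd.np.array_split(df, 2), but that splits the DataFrame into parts with equal numbers of rows. How can I do the same thing, but with equal sums of a particular column?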

You can use cumsum and then filter on that column. For example:

>>> df['Foo_cumsum'] = df.Foo.cumsum()
>>> df
     Bar  Foo  Foo_cumsum
0  hello    1           1
1  world    2           3
2   spam    3           6
3   eggs    6          12

>>> df[(df.Foo_cumsum > 0) & (df.Foo_cumsum <= 6)]
     Bar  Foo  Foo_cumsum
0  hello    1           1
1  world    2           3
2   spam    3           6
>>> df[(df.Foo_cumsum > 6) & (df.Foo_cumsum <= 12)]
    Bar  Foo  Foo_cumsum
3  eggs    6          12
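
The two filters above can be generalised to any number of parts. The following is only a minimal sketch: split_by_cumsum is a hypothetical helper name (not part of pandas), it assumes the column holds non-negative numbers so that the cumulative sum never decreases, and it places the cut points at equal fractions of the column total.

```python
import pandas as pd


def split_by_cumsum(df, column, n_parts):
    """Hypothetical helper: cut df into consecutive parts by cumulative sum."""
    cumsum = df[column].cumsum()
    total = cumsum.iloc[-1]
    parts = []
    for i in range(n_parts):
        # Part i keeps the rows whose running total falls in (lower, upper]
        lower = total * i / n_parts
        upper = total * (i + 1) / n_parts
        parts.append(df[(cumsum > lower) & (cumsum <= upper)])
    return parts


df = pd.DataFrame({'Foo': [1, 2, 3, 6], 'Bar': ['hello', 'world', 'spam', 'eggs']})
first, second = split_by_cumsum(df, 'Foo', 2)
```

Because a row is never divided between parts, the sums are only roughly equal in general, and a part can come back empty when one row is larger than the target size.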

Improving on the solution by @congusbongus:

>>> import pandas as pd
>>> df = pd.DataFrame({'Foo': [1, 2, 3, 6], 'Bar': ['hello', 'world', 'spam', 'eggs']})
>>> df['Foo_cumsum'] = df.Foo.cumsum()
>>> import math
>>> no_buckets = 4
>>> bucket_size = df.Foo_cumsum.max() / no_buckets
>>> df['bucket'] = (df.Foo_cumsum / bucket_size).apply(math.ceil)
>>> df
     Bar  Foo  Foo_cumsum  bucket
0  hello    1           1       1
1  world    2           3       1
2   spam    3           6       2
3   eggs    6          12       4

Set the variable no_buckets to the number of buckets you need.
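
To get the actual pieces out of the bucket column, one option is to group on it. This is a minimal sketch that repeats the setup with no_buckets = 2; the names parts and sums are only illustrative.

```python
import math

import pandas as pd

df = pd.DataFrame({'Foo': [1, 2, 3, 6], 'Bar': ['hello', 'world', 'spam', 'eggs']})
df['Foo_cumsum'] = df.Foo.cumsum()

no_buckets = 2
bucket_size = df.Foo_cumsum.max() / no_buckets
df['bucket'] = (df.Foo_cumsum / bucket_size).apply(math.ceil)

# One sub-frame per bucket label that actually occurs in the data
parts = {label: group for label, group in df.groupby('bucket')}

# Sum of Foo inside each bucket, to check how even the split is
sums = df.groupby('bucket')['Foo'].sum()
```

With no_buckets = 4, as in the answer above, only the labels 1, 2 and 4 occur for this data: bucket 3 stays empty because a single row is never split across buckets.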
