Python: split a DataFrame on the sum of a column?
I have a DataFrame like this:
>>> df = pd.DataFrame({'Foo': [1, 2, 3, 6], 'Bar': ['hello', 'world', 'spam', 'eggs']})
>>> df
     Bar  Foo
0  hello    1
1  world    2
2   spam    3
3   eggs    6
How can I split this DataFrame so that each part has (roughly) the same sum of Foo? That is, if I wanted to split it in two, I would want:
     Bar  Foo
0  hello    1
1  world    2
2   spam    3
and
    Bar  Foo
3  eggs    6
because in both cases Foo sums to 6.
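I know NumPy has array_split, i.e. pd.np.array_split(df, 2), but that splits the DataFrame into parts with equal numbers of rows. How can I do the same thing, but with equal sums over a specific column?

You can use cumsum and then filter on that column. For example: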
>>> df['Foo_cumsum'] = df.Foo.cumsum()
>>> df
     Bar  Foo  Foo_cumsum
0  hello    1           1
1  world    2           3
2   spam    3           6
3   eggs    6          12
>>> df[(df.Foo_cumsum > 0) & (df.Foo_cumsum <= 6)]
     Bar  Foo  Foo_cumsum
0  hello    1           1
1  world    2           3
2   spam    3           6
>>> df[(df.Foo_cumsum > 6) & (df.Foo_cumsum <= 12)]
    Bar  Foo  Foo_cumsum
3  eggs    6          12
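
The two filters above hard-code the thresholds 6 and 12. As a minimal sketch, the same cumsum filtering could be wrapped in a helper that derives the thresholds from the column total. The function name split_by_cumsum and its parameters are made up for illustration, and the sketch assumes the column holds non-negative numbers and that the parts should be consecutive runs of rows.

```python
import numpy as np
import pandas as pd


def split_by_cumsum(df, column, n_parts):
    """Split df into n_parts consecutive pieces with roughly equal sums of `column`.

    Hypothetical helper, not part of pandas. Assumes non-negative values.
    """
    cumsum = df[column].cumsum()
    total = cumsum.iloc[-1]
    # Thresholds 0, total/n, 2*total/n, ..., total.
    edges = np.linspace(0, total, n_parts + 1)
    parts = []
    for lower, upper in zip(edges[:-1], edges[1:]):
        # Same filter as above: lower < cumsum <= upper.
        mask = (cumsum > lower) & (cumsum <= upper)
        parts.append(df[mask])
    return parts


df = pd.DataFrame({'Foo': [1, 2, 3, 6], 'Bar': ['hello', 'world', 'spam', 'eggs']})
parts = split_by_cumsum(df, 'Foo', 2)
for part in parts:
    print(part, part['Foo'].sum())
```

A part can come back empty when a single row pushes the cumulative sum past more than one threshold, so callers may want to drop empty frames.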

Improving on the solution by @congusbongus:
>>> import pandas as pd
>>> df = pd.DataFrame({'Foo': [1, 2, 3, 6], 'Bar': ['hello', 'world', 'spam', 'eggs']})
>>> df['Foo_cumsum'] = df.Foo.cumsum()
>>> import math
>>> no_buckets = 4
>>> bucket_size = df.Foo_cumsum.max() / no_buckets
>>> df['bucket'] = (df.Foo_cumsum / bucket_size).apply(math.ceil)
>>> df
     Bar  Foo  Foo_cumsum  bucket
0  hello    1           1       1
1  world    2           3       1
2   spam    3           6       2
3   eggs    6          12       4
Change the no_buckets variable to the number of buckets you want.
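
The bucket column only labels the rows. To get the actual pieces, one option is to group on those labels. The following is a sketch rather than part of the original answer: it reuses the same bucket computation with no_buckets = 2, and the names pieces and sums are introduced here for illustration.

```python
import math

import pandas as pd

df = pd.DataFrame({'Foo': [1, 2, 3, 6], 'Bar': ['hello', 'world', 'spam', 'eggs']})

no_buckets = 2
cumsum = df['Foo'].cumsum()
bucket_size = cumsum.max() / no_buckets
# Bucket label per row: ceil(cumsum / bucket_size), as in the answer above.
bucket = (cumsum / bucket_size).apply(math.ceil)

# groupby yields one (label, sub-frame) pair per non-empty bucket.
pieces = {int(label): piece for label, piece in df.groupby(bucket)}
sums = {label: int(piece['Foo'].sum()) for label, piece in pieces.items()}
print(sums)
```

Buckets that receive no rows simply do not appear in pieces, so the number of frames returned can be smaller than no_buckets.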