Python: efficient way to perform a groupby with a filter
I need to group a DataFrame and apply some filters, and I don't know how to do it. Suppose there are 3 columns:
group, distance and value. group is the column to group by, distance is the column I want to apply the filter to, and value is the column I want to take when the filter returns true.
Here is what I did:
from numpy import around
from numpy.random import uniform
from pandas import DataFrame
data = around(a=uniform(low=1.0, high=50.0, size=(20, 3)), decimals=3)
df = DataFrame(data=data, columns=['group', 'distance', 'value'], dtype='float64')
rows, columns = df.shape
df.loc[:rows // 2, 'group'] = 1.0
df.loc[rows // 2:, 'group'] = 2.0
print(df)
df.loc[:, 'next_distance'] = df.groupby(by='group')['distance'].shift(periods=-1)
df.loc[:, 'next_value'] = df.groupby(by='group')['value'].shift(periods=-1)
distance_filter = df.loc[:, 'next_distance'] - df.loc[:, 'distance'] > 10.0
df.loc[distance_filter, 'new_value'] = df.loc[distance_filter, 'next_value']
print(df)
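The key step in the code above is groupby(...).shift(periods=-1). A minimal sketch on a tiny made-up frame (the name demo and its values are illustrative only) contrasts it with a plain shift, which would leak the first row of the next group into the last row of the previous one:

```python
import pandas as pd

# Tiny illustrative frame (made-up values): two groups of three rows each
demo = pd.DataFrame({
    'group': [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
    'distance': [1.0, 15.0, 18.0, 5.0, 30.0, 35.0],
})

# Plain shift ignores groups: row 2 (last of group 1) receives group 2's first distance
plain = demo['distance'].shift(periods=-1)

# Grouped shift stays inside each group: the last row of every group becomes NaN
grouped = demo.groupby(by='group')['distance'].shift(periods=-1)

print(pd.DataFrame({'distance': demo['distance'], 'plain': plain, 'grouped': grouped}))
```

Because NaN - x is NaN and NaN > 10.0 is False, the last row of each group never passes the filter, which is why rows 9 and 19 always have NaN in new_value.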
The first print(df) outputs:
group distance value
0 1.0 3.757 30.593
1 1.0 14.770 13.313
2 1.0 12.594 38.865
3 1.0 47.806 36.357
4 1.0 7.930 28.235
5 1.0 6.133 42.323
6 1.0 23.422 4.883
7 1.0 12.706 1.606
8 1.0 29.787 48.096
9 1.0 41.889 24.148
10 2.0 15.712 28.568
11 2.0 38.143 20.496
12 2.0 24.282 9.562
13 2.0 25.148 26.535
14 2.0 44.163 42.303
15 2.0 38.116 17.947
16 2.0 4.716 17.259
17 2.0 11.980 4.369
18 2.0 35.533 20.866
19 2.0 11.921 47.971
The second print(df) outputs:
group distance value next_distance next_value new_value
0 1.0 3.757 30.593 14.770 13.313 30.593
1 1.0 14.770 13.313 12.594 38.865 NaN
2 1.0 12.594 38.865 47.806 36.357 38.865
3 1.0 47.806 36.357 7.930 28.235 NaN
4 1.0 7.930 28.235 6.133 42.323 NaN
5 1.0 6.133 42.323 23.422 4.883 42.323
6 1.0 23.422 4.883 12.706 1.606 NaN
7 1.0 12.706 1.606 29.787 48.096 1.606
8 1.0 29.787 48.096 41.889 24.148 48.096
9 1.0 41.889 24.148 NaN NaN NaN
10 2.0 15.712 28.568 38.143 20.496 28.568
11 2.0 38.143 20.496 24.282 9.562 NaN
12 2.0 24.282 9.562 25.148 26.535 NaN
13 2.0 25.148 26.535 44.163 42.303 26.535
14 2.0 44.163 42.303 38.116 17.947 NaN
15 2.0 38.116 17.947 4.716 17.259 NaN
16 2.0 4.716 17.259 11.980 4.369 NaN
17 2.0 11.980 4.369 35.533 20.866 4.369
18 2.0 35.533 20.866 11.921 47.971 NaN
19 2.0 11.921 47.971 NaN NaN NaN
All I need is the new_value column. Is there a better way?
You can use groupby on both columns at once and then subtract df1['distance'] - df['distance']:
df1 = df.groupby(by='group')[['distance','value']].shift(periods=-1)
distance_filter = df1['distance'] - df['distance'] > 10.0
df.loc[distance_filter, 'new_value'] = df1.loc[distance_filter, 'value']
print(df)
group distance value new_value
0 1.0 26.097 16.973 16.973
1 1.0 36.866 28.804 NaN
2 1.0 28.644 17.779 NaN
3 1.0 19.339 44.409 NaN
4 1.0 5.768 28.003 28.003
5 1.0 40.646 3.632 NaN
6 1.0 20.141 8.516 NaN
7 1.0 17.949 46.639 NaN
8 1.0 23.825 45.374 NaN
9 1.0 11.013 33.044 NaN
10 2.0 42.859 39.162 NaN
11 2.0 45.025 17.099 NaN
12 2.0 7.124 19.366 19.366
13 2.0 22.728 23.045 23.045
14 2.0 34.603 46.527 46.527
15 2.0 45.901 40.602 NaN
16 2.0 20.585 11.294 NaN
17 2.0 27.979 24.360 NaN
18 2.0 15.374 5.726 5.726
19 2.0 27.611 17.011 NaN
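A sketch of the same filter without assigning through .loc, using Series.where (a swapped-in technique, not the one used in the answer). Like df1.loc[distance_filter, 'value'] above, it takes the shifted (next-row) value; use df['value'].where(distance_filter) instead if you want the current row's value. The small frame is made up for illustration:

```python
import pandas as pd

# Made-up data: two groups of three rows each
df = pd.DataFrame({
    'group': [1.0, 1.0, 1.0, 2.0, 2.0, 2.0],
    'distance': [1.0, 15.0, 18.0, 5.0, 30.0, 35.0],
    'value': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Shift both columns in one grouped call (same as df1 in the answer)
shifted = df.groupby(by='group')[['distance', 'value']].shift(periods=-1)

# Boolean mask; NaN differences (last row of each group) compare as False
distance_filter = shifted['distance'] - df['distance'] > 10.0

# Series.where keeps values where the mask is True and puts NaN elsewhere,
# so the whole column is built in one expression
df['new_value'] = shifted['value'].where(distance_filter)
print(df)
```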
If you need the same output as in the question, only a small change is needed:
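# shift both columns within each group, rename them to next_distance/next_value and join them back by index
df = df.join(df.groupby('group')[['distance', 'value']].shift(periods=-1).add_prefix('next_'))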
distance_filter = df['next_distance'] - df['distance'] > 10.0
df.loc[distance_filter, 'new_value'] = df.loc[distance_filter, 'next_value']
print(df)
group distance value next_distance next_value new_value
0 1.0 12.253 29.438 28.814 38.660 29.438
1 1.0 28.814 38.660 20.756 24.588 NaN
2 1.0 20.756 24.588 16.776 11.183 NaN
3 1.0 16.776 11.183 7.214 47.655 NaN
4 1.0 7.214 47.655 17.083 17.805 NaN
5 1.0 17.083 17.805 24.074 4.120 NaN
6 1.0 24.074 4.120 40.108 48.605 4.120
7 1.0 40.108 48.605 40.571 1.591 NaN
8 1.0 40.571 1.591 30.987 36.448 NaN
9 1.0 30.987 36.448 NaN NaN NaN
10 2.0 37.585 13.128 9.864 18.969 NaN
11 2.0 9.864 18.969 46.241 39.490 18.969
12 2.0 46.241 39.490 40.612 7.873 NaN
13 2.0 40.612 7.873 39.053 16.816 NaN
14 2.0 39.053 16.816 13.665 32.730 NaN
15 2.0 13.665 32.730 35.349 43.783 32.730
16 2.0 35.349 43.783 11.412 19.120 NaN
17 2.0 11.412 19.120 40.855 41.502 19.120
18 2.0 40.855 41.502 16.973 40.430 NaN
19 2.0 16.973 40.430 NaN NaN NaN
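The join works because groupby(...).shift() returns each row under its original index label. A minimal sketch, assuming a made-up frame whose groups are interleaved and whose index is not 0..n-1, shows the "next" values still land on the right rows:

```python
import pandas as pd

# Made-up data: groups interleaved on purpose, non-default index labels
df = pd.DataFrame(
    {
        'group': [1.0, 2.0, 1.0, 2.0],
        'distance': [1.0, 5.0, 15.0, 30.0],
        'value': [10.0, 40.0, 20.0, 50.0],
    },
    index=[100, 101, 102, 103],
)

# The shifted frame keeps labels 100..103, so join() aligns it row by row
df = df.join(df.groupby('group')[['distance', 'value']].shift(periods=-1).add_prefix('next_'))

# Same filter as in the answer
distance_filter = df['next_distance'] - df['distance'] > 10.0
df.loc[distance_filter, 'new_value'] = df.loc[distance_filter, 'next_value']
print(df)
```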
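Edit:
Is there a way to keep the group column in df1?
@Hazan - just use df1 = df[['group']].join(df.groupby(by='group')[['distance','value']].shift(periods=-1))
When I do this, the shift is lost. OK, thanks a lot, now it is working... but what I need is the shifted group, not the original group.
@Hazan - not sure I understand; do you need df1 = df.assign(g=df['group']).groupby(by='g')[['group','distance','value']].shift(periods=-1)?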
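A small sketch contrasting the two suggestions from the comments on a made-up frame: joining df[['group']] keeps the original group labels, while grouping by a copied helper column g lets group itself be shifted along with the other columns:

```python
import pandas as pd

# Made-up data: two groups of two rows each
df = pd.DataFrame({
    'group': [1.0, 1.0, 2.0, 2.0],
    'distance': [1.0, 15.0, 5.0, 30.0],
    'value': [10.0, 40.0, 20.0, 50.0],
})

# First suggestion: group stays unshifted, distance/value are shifted
kept = df[['group']].join(df.groupby(by='group')[['distance', 'value']].shift(periods=-1))

# Second suggestion: g is the grouping key, so 'group' is an ordinary
# column and gets shifted too (NaN on the last row of each group)
df1 = (
    df.assign(g=df['group'])
      .groupby(by='g')[['group', 'distance', 'value']]
      .shift(periods=-1)
)
print(kept)
print(df1)
```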