Python: Pandas Groupby with Lambda and Algorithm


Given this DataFrame:

import pandas as pd
import jenkspy
f = pd.DataFrame({'BreakGroup':['A','A','A','A','A','A','B','B','B','B','B'],
                 'Final':[1,2,3,4,5,6,10,20,30,40,50]})
   BreakGroup  Final
0           A      1
1           A      2
2           A      3
3           A      4
4           A      5
5           A      6
6           B     10
7           B     20
8           B     30
9           B     40
10          B     50
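I would like to use jenkspy to identify, based on natural breaks for 4 groups (classes), the class to which each value in 'Final' belongs within its 'BreakGroup'.

I started by doing this: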

jenks=lambda x: jenkspy.jenks_breaks(f['Final'].tolist(),nb_class=4)
f['Group']=f.groupby(['BreakGroup'])['BreakGroup'].transform(jenks)
…which results in:

BreakGroup
A    [1.0, 10.0, 20.0, 30.0, 50.0]
B    [1.0, 10.0, 20.0, 30.0, 50.0]
Name: BreakGroup, dtype: object
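The first problem here, as you may have guessed, is that the lambda is applied to the whole 'Final' column instead of only the scores belonging to each group in the groupby. The second problem is that I need a column that designates the correct group (class) membership, presumably by using transform instead of apply.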
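The first problem can be reproduced without jenkspy. The sketch below uses a hypothetical stand-in, value_range (not part of jenkspy or pandas), in place of jenkspy.jenks_breaks, and compares a lambda that ignores its argument with one that uses it. Grouping the 'Final' column directly is an assumption made to keep the example small.

```python
import pandas as pd

f = pd.DataFrame({'BreakGroup': ['A'] * 6 + ['B'] * 5,
                  'Final': [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50]})

def value_range(values):
    # Hypothetical stand-in for jenkspy.jenks_breaks: [min, max] of its input.
    values = list(values)
    return [min(values), max(values)]

# Ignores its argument x and always reads the whole 'Final' column.
ignores_group = lambda x: value_range(f['Final'])
# Uses x, which is the 'Final' sub-Series belonging to one group.
uses_group = lambda x: value_range(x)

whole = f.groupby('BreakGroup')['Final'].apply(ignores_group)
per_group = f.groupby('BreakGroup')['Final'].apply(uses_group)
```

Here whole holds [1, 50] for both A and B, while per_group holds [1, 6] for A and [10, 50] for B.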

Then I tried this:

jenks=lambda x: jenkspy.jenks_breaks(f['Final'].loc[f['BreakGroup']==x].tolist(),nb_class=4)
f['Group']=f.groupby(['BreakGroup'])['BreakGroup'].transform(jenks)
…but was quickly beaten back into submission by:

ValueError: Can only compare identically-labeled Series objects
Update:

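The desired result is a 'Result' column that contains, for each value in 'Final', the upper bound of the class it belongs to within its 'BreakGroup'.

Thanks in advance!

My slightly modified application, based on the accepted solution: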

f.sort_values('BreakGroup',inplace=True)
f.reset_index(drop=True,inplace=True)
jenks = lambda x: jenkspy.jenks_breaks(x['Final'].tolist(),nb_class=4)
g = f.set_index('BreakGroup')
g['Groups'] = f.groupby(['BreakGroup']).apply(jenks)
g.reset_index(inplace=True)
groups= lambda x: [gp for gp in x['Groups']]
#'final' value should be > lower and <= upper
upper = lambda x: [gp for gp in x['Groups'] if gp >= x['Final']][0] # or gp == max(x['Groups'])
lower= lambda x: [gp for gp in x['Groups'] if gp < x['Final'] or gp == min(x['Groups'])][-1]
GroupIndex= lambda x: [x['Groups'].index(gp) for gp in x['Groups'] if gp < x['Final'] or gp == min(x['Groups'])][-1]
f['Groups']=g.apply(groups, axis=1)
f['Upper'] = g.apply(upper, axis=1)
f['Lower'] = g.apply(lower, axis=1)
f['Group'] = g.apply(GroupIndex, axis=1)
f['Group']=f['Group']+1
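This returns:

  • A list of the group boundaries

  • The upper boundary relative to the 'Final' value

  • The lower boundary relative to the 'Final' value

  • The group to which the 'Final' value belongs, per the logic noted in the comment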
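The same lookup can be sketched in plain Python with the standard bisect module. This swaps binary search in for the list comprehensions above and is not the code from the question; the helper name locate is hypothetical, and it assumes the breaks are sorted and unique and that the value lies between the first and last break.

```python
from bisect import bisect_left

def locate(breaks, value):
    """Return (lower, upper, group) for value, given sorted, unique breaks.

    Assumes breaks[0] <= value <= breaks[-1].
    """
    i_upper = bisect_left(breaks, value)  # first index with breaks[i] >= value
    i_lower = max(i_upper - 1, 0)         # the minimum value maps to the first break
    return breaks[i_lower], breaks[i_upper], i_lower + 1

# Breaks for group A, as shown in the answer's output.
breaks_a = [1.0, 2.0, 3.0, 4.0, 6.0]
results = [locate(breaks_a, v) for v in [1, 2, 3, 4, 5, 6]]
```

For example, locate(breaks_a, 5) returns (4.0, 6.0, 4): the value 5 is greater than 4.0 and at most 6.0, so it falls in group 4. As with the lambdas above, the minimum value gets the first break as both its lower and upper boundary.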


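  • Currently you are passing a Series into transform(), not the scalar you intend for the filter condition. Consider indexing the first value, such as x.iloc[0], since all values in the groupby Series are the same. You can even run min(x) or max(x).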
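A minimal sketch of that suggestion follows. So that it runs without jenkspy, max() stands in for jenkspy.jenks_breaks; the helper name group_max and the column name 'GroupMax' are hypothetical.

```python
import pandas as pd

f = pd.DataFrame({'BreakGroup': ['A'] * 6 + ['B'] * 5,
                  'Final': [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50]})

def group_max(x):
    # x is the 'BreakGroup' sub-Series for one group, so every entry is the same label.
    key = x.iloc[0]
    # Comparing against the scalar key avoids the Series-to-Series comparison
    # that raised the ValueError.
    return f.loc[f['BreakGroup'] == key, 'Final'].max()

# transform broadcasts the scalar returned for each group back onto that group's rows.
f['GroupMax'] = f.groupby('BreakGroup')['BreakGroup'].transform(group_max)
```

Grouping on 'Final' directly, as in f.groupby('BreakGroup')['Final'], would hand the function the values themselves and make the lookup by key unnecessary.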


  • You defined jenks so that it is constant in the lambda variable x: it does not depend on what apply or transform feeds it. Change the definition of jenks to

    jenks = lambda x: jenkspy.jenks_breaks(x['Final'].tolist(),nb_class=4)

    which gives

    In [315]: f.groupby(['BreakGroup']).apply(jenks)
    Out[315]: 
    BreakGroup
    A         [1.0, 2.0, 3.0, 4.0, 6.0]
    B    [10.0, 20.0, 30.0, 40.0, 50.0]
    dtype: object

    Continuing from this redefinition,

    g = f.set_index('BreakGroup')
    g['Groups'] = f.groupby(['BreakGroup']).apply(jenks)
    g.reset_index(inplace=True)
    group = lambda x: [gp for gp in x['Groups'] if gp > x['Final'] or gp == max(x['Groups'])][0]
    f['Result'] = g.apply(group, axis=1)

    gives

    In [323]: f
    Out[323]: 
       BreakGroup  Final  Result
    0           A      1     2.0
    1           A      2     3.0
    2           A      3     4.0
    3           A      4     6.0
    4           A      5     6.0
    5           A      6     6.0
    6           B     10    20.0
    7           B     20    30.0
    8           B     30    40.0
    9           B     40    50.0
    10          B     50    50.0

    Can you post the target output?
    Sure; see the update.
    Great! By the way, I used this to get the lower bound: group2 = lambda x: [gp for gp in x['Groups'] if gp …
    @DanceParty2, since you are using …
    How do I find the index of the value rather than the value itself (i.e., the top row is 0)?
    You mean which index in the list of groups produced by jenks?
    Yes, the index of the element in the list for the given row of g.
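The question also asks for class membership through transform rather than apply. One way to sketch that is pd.cut inside groupby(...).transform; this swaps pd.cut in for the row-wise lambdas and is not the method used in the answer. The function equal_width_breaks is a hypothetical stand-in for jenkspy.jenks_breaks so that the sketch runs without jenkspy, which means the classes here are equal-width bins, not natural breaks.

```python
import numpy as np
import pandas as pd

f = pd.DataFrame({'BreakGroup': ['A'] * 6 + ['B'] * 5,
                  'Final': [1, 2, 3, 4, 5, 6, 10, 20, 30, 40, 50]})

def equal_width_breaks(values, n_classes):
    # Hypothetical stand-in for jenkspy.jenks_breaks:
    # n_classes + 1 evenly spaced edges from min to max.
    return np.linspace(min(values), max(values), n_classes + 1).tolist()

def classify(final):
    # final is the 'Final' sub-Series of one BreakGroup.
    breaks = equal_width_breaks(final.tolist(), 4)
    # labels=False yields 0-based bin numbers; bins are (lower, upper],
    # and include_lowest=True keeps the group's minimum in the first bin.
    return pd.cut(final, bins=breaks, labels=False, include_lowest=True) + 1

f['Group'] = f.groupby('BreakGroup')['Final'].transform(classify)
```

Because pd.cut closes each bin on the right by default, a value belongs to the class where it is greater than the lower edge and at most the upper edge. Replacing equal_width_breaks with a call to jenkspy.jenks_breaks should give natural-break classes instead, assuming it returns sorted, unique edges.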