Python: pandas groupby preserving order

python pandas

I have some data that looks like this:

     Season Team     TEAM_ID  start   end
0   1984-85  CHI  1610612741   1984  1985
1   1985-86  CHI  1610612741   1985  1986
2   1986-87  CHI  1610612741   1986  1987
3   1987-88  CHI  1610612741   1987  1988
4   1988-89  CHI  1610612741   1988  1989
5   1989-90  CHI  1610612741   1989  1990
6   1990-91  CHI  1610612741   1990  1991
7   1991-92  CHI  1610612741   1991  1992
8   1992-93  CHI  1610612741   1992  1993
9   1994-95  CHI  1610612741   1994  1995
10  1995-96  CHI  1610612741   1995  1996
11  1996-97  CHI  1610612741   1996  1997
12  1997-98  CHI  1610612741   1997  1998
13  2001-02  WAS  1610612764   2001  2002
14  2002-03  WAS  1610612764   2002  2003

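I'm looking for a way to group by the Team and TEAM_ID columns and get the minimum of the start column and the maximum of the end column for each group. For the data above, the result would be: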

Team     TEAM_ID    Years
 CHI  1610612741  1984-93
 CHI  1610612741  1994-98
 WAS  1610612764  2001-03

And for someone who played for more than one team in the same year:

     Season Team     TEAM_ID  start   end
0   2003-04  MIA  1610612748   2003  2004
1   2004-05  MIA  1610612748   2004  2005
2   2005-06  MIA  1610612748   2005  2006
3   2006-07  MIA  1610612748   2006  2007
4   2007-08  MIA  1610612748   2007  2008
5   2008-09  MIA  1610612748   2008  2009
6   2009-10  MIA  1610612748   2009  2010
7   2010-11  MIA  1610612748   2010  2011
8   2011-12  MIA  1610612748   2011  2012
9   2012-13  MIA  1610612748   2012  2013
10  2013-14  MIA  1610612748   2013  2014
11  2014-15  MIA  1610612748   2014  2015
12  2015-16  MIA  1610612748   2015  2016
13  2016-17  CHI  1610612741   2016  2017
14  2017-18  CLE  1610612739   2017  2018
15  2017-18  MIA  1610612748   2017  2018
17  2018-19  MIA  1610612748   2018  2019

I'd like the result to look like this:

Team     TEAM_ID    Years
 MIA  1610612748  2003-16
 CHI  1610612741  2016-17
 CLE  1610612739  2017-17
 MIA  1610612748  2017-19

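Does anyone know how to do this? I tried using pandas.groupby, but it merges all rows for the same team into a single group, and I want to keep each stint separate.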
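To make the problem concrete, here is a minimal sketch of the behaviour described in the question. It assumes the column names Team, TEAM_ID, start and end (as used in the answers) and a shortened, hand-typed subset of the second sample, so the numbers are illustrative only. A plain groupby on Team and TEAM_ID collapses both MIA stints into one row:

```python
import pandas as pd

# Shortened subset of the second sample: the last two MIA seasons of the
# first stint, then CHI, CLE and the return to MIA.
df = pd.DataFrame({
    "Team": ["MIA", "MIA", "CHI", "CLE", "MIA", "MIA"],
    "TEAM_ID": [1610612748, 1610612748, 1610612741,
                1610612739, 1610612748, 1610612748],
    "start": [2014, 2015, 2016, 2017, 2017, 2018],
    "end": [2015, 2016, 2017, 2018, 2018, 2019],
})

# Grouping only by team: every MIA row lands in the same group,
# so the two separate stints are merged into a single 2014..2019 span.
merged = (
    df.groupby(["Team", "TEAM_ID"])
      .agg(start=("start", "min"), end=("end", "max"))
      .reset_index()
)
print(merged)
```

The result has three rows (CHI, CLE, MIA) instead of four, and it is sorted by team name rather than by order of appearance.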

One approach is to use a nested groupby to identify runs of consecutive seasons within each team:

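def func(df):
    # start a new group whenever a season begins after the previous one ended,
    # i.e. there is a gap between consecutive seasons for this team
    g = (df['start'] > df['end'].shift(1)).cumsum()
    # for each run of consecutive seasons, build a label such as '1984-93'
    res = df.groupby(g).apply(
        lambda x: str(x['start'].min()) + '-' + str(x['end'].max())[-2:],
    )
    res.name = 'Years'
    return res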


df.groupby(['Team', 'TEAM_ID']).apply(func).reset_index()[['Team', 'TEAM_ID', 'Years']]

Output:

  Team     TEAM_ID    Years
0  CHI  1610612741  2016-17
1  CLE  1610612739  2017-18
2  MIA  1610612748  2003-16
3  MIA  1610612748  2017-19
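The output above is sorted by team, whereas the question asks for stints in their order of appearance. As a hedged variant (not the answerer's code), the same shift-and-cumsum idea can be applied once to the whole frame: a new stint starts when the team changes or when a season begins after the previous one ended. The helper name `stints` and the shortened sample data are assumptions made for illustration.

```python
import pandas as pd

def stints(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse consecutive seasons with the same team into one row each,
    keeping the rows in order of first appearance (hypothetical helper)."""
    # True on the first row of each stint: team changed, or a gap in years
    new_stint = (
        df["Team"].ne(df["Team"].shift())
        | df["start"].gt(df["end"].shift())
    )
    # Running count of stint starts gives an increasing stint number
    stint_id = new_stint.cumsum().rename("stint")

    out = (
        df.groupby(stint_id)
          .agg(Team=("Team", "first"),
               TEAM_ID=("TEAM_ID", "first"),
               start=("start", "min"),
               end=("end", "max"))
          .reset_index(drop=True)
    )
    # Same label format as the answer above, e.g. '2003-16'
    out["Years"] = out["start"].astype(str) + "-" + out["end"].astype(str).str[-2:]
    return out[["Team", "TEAM_ID", "Years"]]

# Shortened, hand-typed subset of the second sample
sample = pd.DataFrame({
    "Team": ["MIA", "MIA", "CHI", "CLE", "MIA", "MIA"],
    "TEAM_ID": [1610612748, 1610612748, 1610612741,
                1610612739, 1610612748, 1610612748],
    "start": [2014, 2015, 2016, 2017, 2017, 2018],
    "end": [2015, 2016, 2017, 2018, 2018, 2019],
})
result = stints(sample)
print(result)
```

On this subset the rows come out as MIA, CHI, CLE, MIA. The sketch assumes the rows are already in chronological order. Like the answer above, it labels the CLE season 2017-18 rather than the 2017-17 shown in the question.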

Another solution to this problem uses a combination of pandas methods to find the differences between consecutive rows, followed by groupby:

import numpy as np

def grouping(df):
    # work on a copy so the caller's frame does not gain helper columns
    df = df.copy()

    # True where the end year is not exactly one more than the previous row's,
    # or where the Team differs from the previous row
    cond = df.end.sub(df.end.shift()).ne(1) | (df.Team.ne(df.Team.shift()))

    # rows whose end year equals the next row's end year
    no_year_end_change = df.end.shift(-1).sub(df.end).eq(0)

    # keep the start year only on rows where a new stint begins (NaN elsewhere)
    df['change'] = df.loc[cond, 'start']

    # on the rows flagged above, use the start year as the end year
    df['end_edit'] = np.where(no_year_end_change, df.start, df.end)

    # forward-fill the stint's start year; 'Int64' gets rid of the float .0
    df['change'] = df.change.ffill().astype('Int64')

    # one row per stint, keeping the latest end year
    df = df.groupby(['Team', 'TEAM_ID', 'change']).end_edit.max().reset_index()

    # join the start and end years with a hyphen using pandas' str.cat
    df['Years'] = df.change.astype(str).str.cat(df.end_edit.astype(str), sep='-')
    df = df.drop(['change', 'end_edit'], axis=1)

    return df

First dataframe:

df.pipe(grouping)

  Team     TEAM_ID      Years
0  CHI  1610612741  1984-1993
1  CHI  1610612741  1994-1998
2  WAS  1610612764  2001-2003

Second dataframe:

df1.pipe(grouping)

  Team     TEAM_ID      Years
0  CHI  1610612741  2016-2017
1  CLE  1610612739  2017-2017
2  MIA  1610612748  2003-2016
3  MIA  1610612748  2017-2019
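The key step in `grouping` is labelling every row with the start year of its stint. The following is a small sketch of that step in isolation, on made-up years for a single team. It uses `Series.where` in place of the `df.loc[cond, 'start']` assignment, which should be equivalent here, and the variable names are chosen for illustration.

```python
import pandas as pd

# Hypothetical end years for one team, with a gap between 1993 and 1995
end = pd.Series([1992, 1993, 1995, 1996], name="end")
start = end - 1

# True where the end year does not follow the previous row by exactly one
# (the first row is always True because its difference is NaN)
is_break = end.diff().ne(1)

# Keep the start year only on break rows, then forward-fill it down the stint;
# the nullable 'Int64' dtype turns 1991.0 back into 1991
label = start.where(is_break).ffill().astype("Int64")
print(label.tolist())  # [1991, 1991, 1994, 1994]
```

Every row of a stint now carries the same label, so a groupby that includes this column keeps separate stints apart.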

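Please show what you have tried.

So you basically want to group by season, but output the seasons as years?

@wwnde Yes, but if I use groupby it groups all of a team's rows together. In my second example I want each MIA stint kept separate.

Please post what you have tried so far, so we can see where the problem is and improve on it.

In the second dataframe, why isn't CLE 2017-2018?

This is great! Thanks for your help.

This is great! Thank you very much.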