Python: search a column and return the lowest grade

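I have a large dataset in which I want to find the lowest and highest school grade. The grades include PK,K,1,2,3,4,5,6,7,8,9,10,11,12. I want to add the lowest and highest grade as their own columns in the dataframe.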

Input data:

    Name               Grades_Offered_All       Student_Count_Total
    A                      PK,K,1,2,3,4,5           415
    B                      PK,K,1,2,3,4,5,6,7,8     241
    C                      PK,K,1,2,3,4,5,6,7,8     346
    D                      K,1,2                    91
    E                      PK,K,1,2,3               248
    Lowest
    A = PK
    B = PK
    C = PK
    D = K
    E = PK

    Highest
    A = 5
    B = 8
    C = 8
    D = 2
    E = 3
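To try the answers below, the three columns of the sample table can be rebuilt in pandas as follows. This is only a reproduction of the data shown above; the variable name df is assumed to match the one used in the answers.

```python
import pandas as pd

# The three table columns from the question; grades stay as comma-separated strings.
df = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Grades_Offered_All': ['PK,K,1,2,3,4,5', 'PK,K,1,2,3,4,5,6,7,8',
                           'PK,K,1,2,3,4,5,6,7,8', 'K,1,2', 'PK,K,1,2,3'],
    'Student_Count_Total': [415, 241, 346, 91, 248],
})
print(df)
```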
Expected output:

    Name               Grades_Offered_All       Student_Count_Total
    A                      PK,K,1,2,3,4,5           415
    B                      PK,K,1,2,3,4,5,6,7,8     241
    C                      PK,K,1,2,3,4,5,6,7,8     346
    D                      K,1,2                    91
    E                      PK,K,1,2,3               248
    Lowest
    A = PK
    B = PK
    C = PK
    D = K
    E = PK

    Highest
    A = 5
    B = 8
    C = 8
    D = 2
    E = 3

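Use str.extract with named groups:

    # ^...$ anchors plus the comma before Highest keep multi-digit grades such as 12 whole
    df.Grades_Offered_All.str.extract(r'^(?P<Lowest>[^,]+).*,(?P<Highest>[^,]+)$')

    Out[480]:
      Lowest Highest
    0     PK       5
    1     PK       8
    2     PK       8
    3      K       2
    4     PK       3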
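Since the question asks for Lowest and Highest as new columns, the frame returned by str.extract can be joined back onto the original dataframe (named groups become the column names, and the index lines up). A minimal sketch: the pattern is anchored with ^ and $ and requires a comma before Highest, so a multi-digit last grade such as 12 is captured whole. It assumes every row lists at least two grades; a single-grade row would come back as NaN. Row F is a hypothetical row added only to show the multi-digit case.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'D', 'F'],  # 'F' is a hypothetical extra row
    'Grades_Offered_All': ['PK,K,1,2,3,4,5', 'K,1,2', '9,10,11,12'],
})

# Lowest: text before the first comma; Highest: text after the last comma.
pattern = r'^(?P<Lowest>[^,]+).*,(?P<Highest>[^,]+)$'

# extract returns a DataFrame with columns 'Lowest' and 'Highest' on the same index.
df = df.join(df['Grades_Offered_All'].str.extract(pattern))
print(df)
```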

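If you just want the highest and lowest values in the dataframe, use a lambda:

    # First entry of the comma-separated list, prefixed with the Name
    df['Lowest'] = df.apply(lambda x: x.Name + " = " + x.Grades_Offered_All.split(",")[0], axis=1)

    # Last entry of the list, prefixed with the Name
    df['Highest'] = df.apply(lambda x: x.Name + " = " + x.Grades_Offered_All.split(",")[-1], axis=1)

Result: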

        Name    Grades_Offered_All       Student_Count_Total    Highest Lowest
    0   A       PK,K,1,2,3,4,5           415                    A = 5   A = PK
    1   B       PK,K,1,2,3,4,5,6,7,8     241                    B = 8   B = PK
    2   C       PK,K,1,2,3,4,5,6,7,8     346                    C = 8   C = PK
    3   D       K,1,2                    91                     D = 2   D = K
    4   E       PK,K,1,2,3               248                    E = 3   E = PK

If you then want to keep only the Highest and Lowest columns:

    df = df[['Highest', 'Lowest']]
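The same "A = PK" strings can also be built without a row-wise apply by splitting once and indexing into the resulting lists with the vectorized .str accessor. A minimal sketch, assuming the column names from the question (sample rows copied from it). Like the lambda version, it takes the first and last positions, so it assumes each list is already in grade order.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'D', 'E'],
    'Grades_Offered_All': ['PK,K,1,2,3,4,5', 'K,1,2', 'PK,K,1,2,3'],
})

# Split once; each cell becomes a list such as ['K', '1', '2'].
parts = df['Grades_Offered_All'].str.split(',')

# .str[0] and .str[-1] pick the first and last element of every list.
df['Lowest'] = df['Name'] + ' = ' + parts.str[0]
df['Highest'] = df['Name'] + ' = ' + parts.str[-1]
print(df)
```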

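If the data looks as shown, you only need the alphanumeric sequence before the first comma and the one after the last comma. For pandas 0.25.0 and later:

    # one row per grade via explode, then the first and last grade for each Name
    df.assign(k=df.Grades_Offered_All.str.split(',')).explode('k').groupby('Name')['k'].agg(['first', 'last'])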
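The approaches above all take the first and last entries, so they rely on each list already being written in grade order. If that is not guaranteed, one option is to rank the labels explicitly instead of by position or by string value (as text, "9" sorts after "12"). A sketch under these assumptions: the full label set is PK, K and 1 to 12 as listed in the question, GRADE_ORDER and lowest_highest are hypothetical names introduced here, and rows G and H are made-up examples of an unsorted and a multi-digit list.

```python
from typing import Tuple

import pandas as pd

# Assumption: complete set of grade labels, from lowest to highest.
GRADE_ORDER = ['PK', 'K'] + [str(g) for g in range(1, 13)]
RANK = {grade: pos for pos, grade in enumerate(GRADE_ORDER)}


def lowest_highest(grades: str) -> Tuple[str, str]:
    # Split the comma-separated list and drop stray whitespace around labels.
    items = [g.strip() for g in grades.split(',')]
    # Compare by position in GRADE_ORDER; an unknown label raises KeyError.
    return min(items, key=RANK.__getitem__), max(items, key=RANK.__getitem__)


df = pd.DataFrame({
    'Name': ['A', 'D', 'G', 'H'],  # 'G' and 'H' are hypothetical rows
    'Grades_Offered_All': ['PK,K,1,2,3,4,5', 'K,1,2', '9,10,11,12', '3,K,PK'],
})

# Build the two new columns from the (lowest, highest) tuples, keeping df's index.
extremes = pd.DataFrame(
    df['Grades_Offered_All'].map(lowest_highest).tolist(),
    index=df.index,
    columns=['Lowest', 'Highest'],
)
df = df.join(extremes)
print(df)
```

The KeyError on an unexpected label is deliberate in this sketch: it surfaces dirty data instead of silently ranking it somewhere.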