Python: search a column and return the lowest grade
Tags: python, pandas
I have a large dataset and want to find the lowest and highest school grade in a column whose values are drawn from PK,K,1,2,3,4,5,6,7,8,9,10,11,12.
I want to add the lowest and the highest grade to the dataframe, each as its own column.
Input data:
Name Grades_Offered_All Student_Count_Total
A PK,K,1,2,3,4,5 415
B PK,K,1,2,3,4,5,6,7,8 241
C PK,K,1,2,3,4,5,6,7,8 346
D K,1,2 91
E PK,K,1,2,3 248
Lowest
A = PK
B = PK
C = PK
D = K
E = PK
Highest
A = 5
B = 8
C = 8
D = 2
E = 3
Expected output:
Name  Grades_Offered_All    Student_Count_Total  Lowest  Highest
A     PK,K,1,2,3,4,5        415                  A = PK  A = 5
B     PK,K,1,2,3,4,5,6,7,8  241                  B = PK  B = 8
C     PK,K,1,2,3,4,5,6,7,8  346                  C = PK  C = 8
D     K,1,2                 91                   D = K   D = 2
E     PK,K,1,2,3            248                  E = PK  E = 3
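Use str.extract with named groups. The pattern here requires a literal comma before the last group, so the final grade is captured whole (12 rather than just 2):
df.Grades_Offered_All.str.extract(r'(?P<Lowest>[^,]+).*,(?P<Highest>[^,]+)')
Out[480]:
  Lowest Highest
0     PK       5
1     PK       8
2     PK       8
3      K       2
4     PK       3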
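To attach the extracted values to the original dataframe, one option is to join the result back on the index. The following is a minimal sketch that rebuilds the sample data from the question; the variable names `pattern`, `bounds` and `result` are illustrative. The pattern places a literal comma before the last group so that a grade such as 12 is captured whole, which also means a row listing only a single grade would not match and would yield NaN.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Grades_Offered_All": [
        "PK,K,1,2,3,4,5",
        "PK,K,1,2,3,4,5,6,7,8",
        "PK,K,1,2,3,4,5,6,7,8",
        "K,1,2",
        "PK,K,1,2,3",
    ],
    "Student_Count_Total": [415, 241, 346, 91, 248],
})

# Named groups become the column names of the extracted frame.
# The literal comma before the last group keeps multi-digit grades whole.
pattern = r"(?P<Lowest>[^,]+).*,(?P<Highest>[^,]+)"
bounds = df["Grades_Offered_All"].str.extract(pattern)

# join aligns on the index, so each row receives its own Lowest/Highest.
result = df.join(bounds)
print(result)
```

Both new columns hold strings, because the grade list mixes labels such as PK with numbers.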
Use a lambda:
df['Lowest'] = df.apply(lambda x: x.Name+" = "+x.Grades_Offered_All.split(",")[0], axis=1)
df['Highest'] = df.apply(lambda x: x.Name+" = "+x.Grades_Offered_All.split(",")[-1], axis=1)
Result:
  Name    Grades_Offered_All  Student_Count_Total  Lowest Highest
0    A        PK,K,1,2,3,4,5                  415  A = PK   A = 5
1    B  PK,K,1,2,3,4,5,6,7,8                  241  B = PK   B = 8
2    C  PK,K,1,2,3,4,5,6,7,8                  346  C = PK   C = 8
3    D                 K,1,2                   91   D = K   D = 2
4    E            PK,K,1,2,3                  248  E = PK   E = 3
If you only want the highest and lowest columns in the dataframe:
df = df[['Highest', 'Lowest']]
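If the "Name = " prefix is not needed, the row-wise apply can probably be avoided. The sketch below splits each string once and takes the first and last element through the `.str` accessor; rows F and G are made-up additions (not from the question) that exercise a two-digit grade and a single-grade row.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "D", "F", "G"],
    "Grades_Offered_All": [
        "PK,K,1,2,3,4,5",
        "K,1,2",
        "9,10,11,12",  # hypothetical row with two-digit grades
        "K",           # hypothetical row with a single grade
    ],
})

# Split each string into a list of grades.
grades = df["Grades_Offered_All"].str.split(",")

# .str[i] indexes into each list element-wise.
df["Lowest"] = grades.str[0]
df["Highest"] = grades.str[-1]
print(df)
```

Like the lambda version, this relies on the grades in each row already being listed from lowest to highest.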
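If the data is as shown, you only need the alphanumeric sequence before the first comma and the one after the last comma. With pandas 0.25.0 or later (the release that added explode):
df.assign(k=df.Grades_Offered_All.str.split(',')).explode('k').groupby('Name')['k'].agg(['first','last'])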
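As a runnable illustration of the explode approach, here is a small sketch on two rows of the sample data. Renaming the aggregated columns and merging them back onto the dataframe are additions for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "D"],
    "Grades_Offered_All": ["PK,K,1,2,3,4,5", "K,1,2"],
    "Student_Count_Total": [415, 91],
})

bounds = (
    df.assign(k=df["Grades_Offered_All"].str.split(","))
      .explode("k")                 # one row per (Name, grade)
      .groupby("Name")["k"]
      .agg(["first", "last"])       # first and last grade in listed order
      .rename(columns={"first": "Lowest", "last": "Highest"})
)

# bounds is indexed by Name, so merge on that index.
result = df.merge(bounds, left_on="Name", right_index=True)
print(result)
```

Because the grouping key is Name, rows that share a name would be collapsed into a single group, so this assumes names are unique.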
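All of the approaches above take the first and last entry, which works only when each row lists its grades in ascending order. If that cannot be guaranteed, one possible sketch is to rank the grades explicitly. `GRADE_ORDER` is an assumption based on the list given in the question, and row X is a made-up unsorted example.

```python
import pandas as pd

# Assumed ordering, taken from the grade list in the question.
GRADE_ORDER = ["PK", "K"] + [str(n) for n in range(1, 13)]
RANK = {grade: position for position, grade in enumerate(GRADE_ORDER)}

df = pd.DataFrame({
    "Name": ["A", "D", "X"],
    "Grades_Offered_All": [
        "PK,K,1,2,3,4,5",
        "K,1,2",
        "12,PK,9",  # hypothetical unsorted row
    ],
})

grades = df["Grades_Offered_All"].str.split(",")

# Compare grades by their position in GRADE_ORDER, not alphabetically.
df["Lowest"] = grades.apply(lambda gs: min(gs, key=RANK.__getitem__))
df["Highest"] = grades.apply(lambda gs: max(gs, key=RANK.__getitem__))
print(df)
```

A grade label that is missing from `GRADE_ORDER` raises a KeyError here, which makes unexpected values visible instead of silently mis-sorting them.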