Python: Change names in a pandas column to start with an uppercase letter
Background
I have a toy df
import pandas as pd
df = pd.DataFrame({'Text' : ['Jon J Mmith is Here',
'Mary Lisa Hder found here',
'Jane A Doe is also here',
'Tom T Tcker is here too'],
'P_ID': [1,2,3,4],
'P_Name' : ['MMITH, JON J', 'HDER, MARY LISA', 'DOE, JANE A', 'TCKER, TOM T'],
'N_ID' : ['A1', 'A2', 'A3', 'A4']
})
#rearrange columns
df = df[['Text','N_ID', 'P_ID', 'P_Name']]
df
                        Text  N_ID  P_ID           P_Name
0        Jon J Mmith is Here    A1     1     MMITH, JON J
1  Mary Lisa Hder found here    A2     2  HDER, MARY LISA
2    Jane A Doe is also here    A3     3      DOE, JANE A
3    Tom T Tcker is here too    A4     4     TCKER, TOM T
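As an aside, the separate reordering step can be folded into construction: when the data is a dict, the `columns=` argument of `pd.DataFrame` selects the keys and fixes their order. A minimal sketch using the same toy data:

```python
import pandas as pd

data = {
    "Text": ["Jon J Mmith is Here", "Mary Lisa Hder found here",
             "Jane A Doe is also here", "Tom T Tcker is here too"],
    "P_ID": [1, 2, 3, 4],
    "P_Name": ["MMITH, JON J", "HDER, MARY LISA", "DOE, JANE A", "TCKER, TOM T"],
    "N_ID": ["A1", "A2", "A3", "A4"],
}

# columns= picks the dict keys in the given order, so no df[[...]] step is needed
df = pd.DataFrame(data, columns=["Text", "N_ID", "P_ID", "P_Name"])
print(df)
```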
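Goal
1) Change the P_Name column of df to the format shown in the desired output; that is, change the current format (e.g. MMITH, JON J) to one (e.g. Mmith, Jon J) where the surname, first name and middle initial each start with an uppercase letter followed by lowercase letters
2) Put the result in a new column P_Name_New
Desired Output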
                        Text  N_ID  P_ID           P_Name       P_Name_New
0        Jon J Mmith is Here    A1     1     MMITH, JON J     Mmith, Jon J
1  Mary Lisa Hder found here    A2     2  HDER, MARY LISA  Hder, Mary Lisa
2    Jane A Doe is also here    A3     3      DOE, JANE A      Doe, Jane A
3    Tom T Tcker is here too    A4     4     TCKER, TOM T     Tcker, Tom T
Question
How do I achieve my goal?
Just use the str.title() function:
In [98]: df['P_Name_New'] = df['P_Name'].str.title()
In [99]: df
Out[99]:
                        Text  N_ID  P_ID           P_Name       P_Name_New
0        Jon J Mmith is Here    A1     1     MMITH, JON J     Mmith, Jon J
1  Mary Lisa Hder found here    A2     2  HDER, MARY LISA  Hder, Mary Lisa
2    Jane A Doe is also here    A3     3      DOE, JANE A      Doe, Jane A
3    Tom T Tcker is here too    A4     4     TCKER, TOM T     Tcker, Tom T
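A caveat worth knowing: str.title(), like Python's built-in str.title, starts a new "word" after any non-letter character, not only after spaces. The sketch below uses hypothetical names (not from the question) to compare it with the standard library's string.capwords, which splits only on whitespace. Neither handles prefixes such as "Mc" or particles such as "de la", so real name data may need extra rules.

```python
import string
import pandas as pd

# Hypothetical names chosen to show where the two approaches differ
names = pd.Series(["O'NEIL, ANNE-MARIE", "MCDONALD, RONALD", "DE LA CRUZ, ANA"])

# Series.str.title(): capitalizes the first letter after every non-letter,
# so apostrophes and hyphens also start a new capitalized segment
titled = names.str.title()

# string.capwords(): splits on whitespace, capitalizes each word's first
# character and lowercases the rest of that word
capworded = names.map(string.capwords)

print(titled.tolist())
print(capworded.tolist())
```

For the names in this question both approaches give the desired output; they only diverge on names containing apostrophes or hyphens.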
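Is there any performance or other difference compared with doing something like apply(lambda x: x.title())?
Thanks @patrick:
%timeit df['P_Name'].str.title()
110 µs ± 1.99 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit df['P_Name'].apply(lambda x: x.title())
146 µs ± 483 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)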
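On the "other difference" part: besides speed, the two approaches treat missing values differently. A minimal sketch, assuming a hypothetical column that contains np.nan; the timing half uses the standard timeit module instead of IPython's %timeit, so its numbers will vary with machine, pandas version and Series length.

```python
import timeit
import numpy as np
import pandas as pd

# Hypothetical column with one missing value (np.nan is a float)
s = pd.Series(["MMITH, JON J", np.nan, "DOE, JANE A"])

# The .str accessor skips missing values and leaves them as NaN
vectorized = s.str.title()

# A plain lambda receives the float NaN and fails on it
try:
    s.apply(lambda x: x.title())
    apply_failed = False
except AttributeError:
    apply_failed = True

print(vectorized.tolist())
print("apply raised AttributeError:", apply_failed)

# Rough timing comparison on a larger hypothetical Series
big = pd.Series(["MMITH, JON J", "HDER, MARY LISA"] * 5000)
t_str = timeit.timeit(lambda: big.str.title(), number=20)
t_apply = timeit.timeit(lambda: big.apply(lambda x: x.title()), number=20)
print(f".str.title(): {t_str:.4f} s, apply: {t_apply:.4f} s (20 runs each)")
```

On object-dtype columns the .str methods still loop over Python strings internally, so a modest gap like the one measured above is plausible; the NaN handling is often the more practical reason to prefer the accessor.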