Python: remove part of a string in a column
I have a column in my DataFrame that looks like this:
Input
df['location.display_name']
Output
Kelso, Scottish Borders
Manchester, Greater Manchester
Northampton, Northamptonshire
Reading, Berkshire
Leicester, Leicestershire
Newport, Wales
Swindon, Wiltshire
Perth, Perth & Kinross
Manchester, Greater Manchester
Perth, Perth & Kinross
Cardiff
Hull, East Riding Of Yorkshire
Chester, Cheshire
Southampton
Leamington Spa, Warwickshire
Swindon, Wiltshire
Slough, Berkshire
Portsmouth, Hampshire
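I want to create a new column containing only the first part of the location. For example, for Swindon, Wiltshire I want to keep Swindon and put it in the new column.
Also, how would this affect single-word values that I want to keep as they are, such as Cardiff?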
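I think you need str.split and then select the first item of each list with str[0], or, with expand=True, select the first column with [0]: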
df['new'] = df['location.display_name'].str.split(',').str[0]
#alternative
#df['new'] = df['location.display_name'].str.split(',', expand=True)[0]
print (df)
             location.display_name             new
0          Kelso, Scottish Borders           Kelso
1   Manchester, Greater Manchester      Manchester
2    Northampton, Northamptonshire     Northampton
3               Reading, Berkshire         Reading
4        Leicester, Leicestershire       Leicester
5                   Newport, Wales         Newport
6               Swindon, Wiltshire         Swindon
7           Perth, Perth & Kinross           Perth
8   Manchester, Greater Manchester      Manchester
9           Perth, Perth & Kinross           Perth
10                         Cardiff         Cardiff
11  Hull, East Riding Of Yorkshire            Hull
12               Chester, Cheshire         Chester
13                     Southampton     Southampton
14    Leamington Spa, Warwickshire  Leamington Spa
15              Swindon, Wiltshire         Swindon
16               Slough, Berkshire          Slough
17           Portsmouth, Hampshire      Portsmouth
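To make the Cardiff question concrete, here is a minimal sketch on a small made-up frame (the four sample values, the n=1 limit and the trailing str.strip() are illustrative assumptions, not part of the answer above). It suggests that values without a comma pass through unchanged and that missing values stay missing instead of raising an error:

```python
import numpy as np
import pandas as pd

# Small illustrative frame: one value with a comma, one without, two missing.
df = pd.DataFrame(
    {'location.display_name': ['Swindon, Wiltshire', 'Cardiff', np.nan, None]}
)

# n=1 splits at the first comma only; str[0] keeps the part before it.
# str.strip() removes any stray whitespace around the kept part.
df['new'] = (
    df['location.display_name']
    .str.split(',', n=1)
    .str[0]
    .str.strip()
)
print(df)
```

Because the .str accessor skips missing values, no explicit None/NaN check is needed with this approach.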
If the data contains no NaN or None values, you can use a list comprehension:
df['new'] = [x.split(',')[0] for x in df['location.display_name']]
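If missing values might be present, the plain comprehension raises an AttributeError, because a float NaN or None has no split method. One possible guard, sketched here on made-up sample data, is an isinstance check that leaves non-string entries as they are:

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value.
df = pd.DataFrame(
    {'location.display_name': ['Reading, Berkshire', 'Southampton', np.nan]}
)

# Split only real strings; anything else (NaN, None) is passed through.
df['new'] = [
    x.split(',')[0] if isinstance(x, str) else x
    for x in df['location.display_name']
]
print(df)
```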
To run a custom function on every element of a column, you can use the pandas apply function. In your case, the following code should do it:
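import pandas

def get_first_substring(x):
    # Pass missing values (None/NaN) through unchanged.
    # Note: x != numpy.nan is always True, so pandas.isna is used instead.
    if pandas.isna(x):
        return x
    # Keep only the text before the first comma.
    return x.split(',')[0]

dataframe['new'] = dataframe['location.display_name'].apply(get_first_substring)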
The output will look like this:
old                       new
substring1, substring2    substring1
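A note on checking for missing values inside a function passed to apply: NaN never compares equal to anything, including itself, so a comparison such as x != numpy.nan is always true and does not filter anything out. The sketch below uses a hypothetical helper named first_part and made-up sample data to show a check based on pandas.isna, which recognizes both None and NaN:

```python
import numpy as np
import pandas as pd

def first_part(x):
    """Return the text before the first comma; pass missing values through."""
    if pd.isna(x):
        return x
    return x.split(',')[0]

# dtype=object keeps None as None, so both kinds of missing value are tested.
s = pd.Series(['Hull, East Riding Of Yorkshire', None, np.nan], dtype=object)
result = s.apply(first_part)

nan_differs_from_itself = np.nan != np.nan  # True: NaN is unequal to itself
print(nan_differs_from_itself)
print(result)
```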
Why not use df['location.display_name'].str.split(',').str[0]? The apply approach only runs into trouble with None or NaN values, where it fails. The best way is df['new'] = df['location.display_name'].str.split(',').str[0]
I edited the solution to check for None and NaN.