Python: remove part of a string in a column


I have a column in my dataframe that looks like this:

Input

df['location.display_name']

Output

 Kelso, Scottish Borders
 Manchester, Greater Manchester
 Northampton, Northamptonshire
 Reading, Berkshire
 Leicester, Leicestershire
 Newport, Wales
 Swindon, Wiltshire
 Perth, Perth & Kinross
 Manchester, Greater Manchester
 Perth, Perth & Kinross
 Cardiff
 Hull, East Riding Of Yorkshire
 Chester, Cheshire
 Southampton
 Leamington Spa, Warwickshire
 Swindon, Wiltshire
 Slough, Berkshire
 Portsmouth, Hampshire

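I want to create a new column containing only the first part of the location. For example, from Swindon, Wiltshire I want to keep Swindon and add it to a new column.

Also, how would this affect single-word values that I want to keep as they are, such as Cardiff?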

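I think you need to split on the comma and then select the first element of each list with str[0], or, with expand=True, select the first column with [0]: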

df['new'] = df['location.display_name'].str.split(',').str[0]
#alternative
#df['new'] = df['location.display_name'].str.split(',', expand=True)[0]
print (df)
              location.display_name              new
0           Kelso, Scottish Borders            Kelso
1    Manchester, Greater Manchester       Manchester
2     Northampton, Northamptonshire      Northampton
3                Reading, Berkshire          Reading
4         Leicester, Leicestershire        Leicester
5                    Newport, Wales          Newport
6                Swindon, Wiltshire          Swindon
7            Perth, Perth & Kinross            Perth
8    Manchester, Greater Manchester       Manchester
9            Perth, Perth & Kinross            Perth
10                          Cardiff          Cardiff
11   Hull, East Riding Of Yorkshire             Hull
12                Chester, Cheshire          Chester
13                      Southampton      Southampton
14     Leamington Spa, Warwickshire   Leamington Spa
15               Swindon, Wiltshire          Swindon
16                Slough, Berkshire           Slough
17            Portsmouth, Hampshire       Portsmouth
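To make the behaviour for single-word entries such as Cardiff explicit, here is a minimal sketch on a small hypothetical frame (the three sample rows, including the missing value, are assumptions for illustration and not the asker's data). It uses the same .str.split(',').str[0] chain, with n=1 added so that splitting stops at the first comma:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: a comma-separated value, a single word, and a missing value.
df = pd.DataFrame(
    {'location.display_name': ['Swindon, Wiltshire', 'Cardiff', np.nan]}
)

# n=1 splits at the first comma only; .str[0] takes the first piece.
# A value without a comma yields a one-element list, so it is kept whole.
# A missing value stays missing instead of raising an error.
df['new'] = df['location.display_name'].str.split(',', n=1).str[0]

print(df)
```

A value with no comma is returned unchanged, and the missing entry is propagated as missing in the new column.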

If there are no NaN or None values in the data, you can use a list comprehension:

df['new'] = [x.split(',')[0] for x in df['location.display_name']]
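If the column might contain missing values after all, the plain list comprehension raises an AttributeError, because None and NaN have no split method. One possible guard, sketched here on hypothetical sample data, is to split only real strings and pass everything else through:

```python
import numpy as np
import pandas as pd

# Hypothetical sample containing both kinds of missing value.
df = pd.DataFrame(
    {'location.display_name': ['Reading, Berkshire', 'Southampton', None, np.nan]}
)

# Only call split on actual strings; None and NaN are passed through unchanged.
df['new'] = [
    x.split(',')[0] if isinstance(x, str) else x
    for x in df['location.display_name']
]

print(df)
```

The isinstance check is a design choice for this sketch; it also skips any other non-string value that might be present in the column.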

To run a custom function on each element of a column, you can use the pandas apply function. In your case, the following code should do it:

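import pandas

def get_first_substring(x):
    # Missing values (None or NaN) cannot be split; return them unchanged.
    # pandas.isna is used because NaN never compares equal to anything.
    if pandas.isna(x):
        return x
    return x.split(',')[0]

dataframe['new'] = dataframe['location.display_name'].apply(get_first_substring)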

The output will look like this:

          old                     new
substring1, substring2        substring1

Why not use df['location.display_name'].str.split(',').str[0]?

There is only a problem with None or NaN values, where it fails. The best approach is df['new'] = df['location.display_name'].str.split(',').str[0]

I edited the solution to check for None and nan.
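A note on checking for missing values inside a function like this: a comparison such as x != numpy.nan is always True, because NaN does not compare equal to anything, including itself, so it cannot detect NaN. A minimal sketch of the difference, using pandas.isna as one dependable check (the variable names are only for illustration):

```python
import numpy as np
import pandas as pd

x = np.nan

# Comparing against NaN never detects it: NaN is not equal to itself.
looks_present = (x != np.nan)

# pandas.isna recognises both NaN and None, and leaves real strings alone.
nan_is_missing = pd.isna(x)
none_is_missing = pd.isna(None)
string_is_missing = pd.isna('Cardiff')

print(looks_present, nan_is_missing, none_is_missing, string_is_missing)
```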