Python: removing square brackets and single quotes from a pandas column (Python, pandas)


I have a data frame like the one below, and I want to remove the square brackets, the single quotes (') and the commas:

id  currentTitle1
1   ['@@@0000070642@@@']
2   ['@@@0000082569@@@']
3   ['@@@0000082569@@@']
4   ['@@@0000082569@@@']
5   ['@@@0000060910@@@', '@@@0000039198@@@']
6   ['@@@0000060910@@@']
7   ['@@@0000129849@@@']
8   ['@@@0000082569@@@']
9   ['@@@0000082569@@@', '@@@0000060905@@@', '@@@0000086889@@@']
10  ['@@@0000082569@@@']

I want the output to look like this:

id  currentTitle1
1   @@@0000070642@@@
2   @@@0000082569@@@
3   @@@0000082569@@@
4   @@@0000082569@@@
5   @@@0000060910@@@ @@@0000039198@@@
6   @@@0000060910@@@
7   @@@0000129849@@@
8   @@@0000082569@@@
9   @@@0000082569@@@ @@@0000060905@@@ @@@0000086889@@@
10  @@@0000082569@@@

I get the data from a regex cleaning step of this form:

df['currentTitle'] = df['currentTitle'].str.findall(r'@{3}\d+@{3}')
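To see what this step produces, here is a minimal sketch that runs str.findall on a small made-up Series. The sample strings and the names s and found are assumptions for illustration, not the asker's data. Each row comes back as a Python list of matches, and a missing value stays missing, which matters for any later join.

```python
import pandas as pd

# Hypothetical sample: two titles and one missing value.
# dtype=object mirrors a plain text column holding Python strings.
s = pd.Series(
    [
        'account @@@0000082569@@@ for entrepreneurs',
        'academic @@@0000060910@@@ , and @@@0000039198@@@ liaison',
        None,
    ],
    dtype=object,
)

# One list of matches per row; the missing row is not turned into a list
found = s.str.findall(r'@{3}\d+@{3}')

print(found.tolist())
print(found.dtype)  # object: the cells are lists, not strings
```

This is why the column prints with brackets, quotes and commas: those characters are the repr of a list and are not part of the stored text.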

Edit: posting the uncleaned data. Keep in mind that there are also empty rows that are not included here.

id  currentTitle    currentTitle_unclean
1   @@@0000070642@@@    accompanying functions of @@@0000070642@@@ and business risk assessment - director
2   @@@0000082569@@@    account @@@0000082569@@@ - sales agent /representative at pronovias fashion group
3   @@@0000082569@@@    account manager/product @@@0000082569@@@ - handbags and accessories
4   @@@0000082569@@@    account @@@0000082569@@@ for entrepreneurs and small size companies
5   @@@0000060910@@@ @@@0000039198@@@   academic @@@0000060910@@@ , administrative, and @@@0000039198@@@ liaison coordinator
6   @@@0000060910@@@    account executive at bluefin insurance @@@0000060910@@@ limited
7   @@@0000129849@@@    account executive for interior @@@0000129849@@@ magazine inex
8   @@@0000082569@@@    account @@@0000082569@@@ high potential secondment programme
9   @@@0000082569@@@ @@@0000060905@@@ @@@0000086889@@@  account @@@0000082569@@@ @@@0000060905@@@ -energy and commodities @@@0000086889@@@ candidate
10  @@@0000082569@@@    account @@@0000082569@@@ paints, coatings, adhesives - ser, slo, cro

You can use apply with join:

df['currentTitle1'] = df['currentTitle1'].apply(' '.join)    
print (df)
   id      currentTitle                               currentTitle_unclean  \
0   1  @@@0000070642@@@  accompanying functions of @@@0000070642@@@ and...   
1   2  @@@0000082569@@@  account @@@0000082569@@@ - sales agent /repres...   
2   3  @@@0000082569@@@  account manager/product @@@0000082569@@@ - han...   
3   4  @@@0000082569@@@  account @@@0000082569@@@ for entrepreneurs and...   
4   5  @@@0000060910@@@  @@@0000039198@@@   academic @@@0000060910@@@ ,...   
5   6  @@@0000060910@@@  account executive at bluefin insurance @@@0000...   
6   7  @@@0000129849@@@  account executive for interior @@@0000129849@@...   
7   8  @@@0000082569@@@  account @@@0000082569@@@ high potential second...   
8   9  @@@0000082569@@@  @@@0000060905@@@ @@@0000086889@@@  account @@@...   
9  10  @@@0000082569@@@  account @@@0000082569@@@ paints, coatings, adh...   

                                       currentTitle1  
0                                   @@@0000070642@@@  
1                                   @@@0000082569@@@  
2                                   @@@0000082569@@@  
3                                   @@@0000082569@@@  
4  @@@0000039198@@@ @@@0000060910@@@ @@@000003919...  
5                                   @@@0000060910@@@  
6                                   @@@0000129849@@@  
7                                   @@@0000082569@@@  
8  @@@0000060905@@@ @@@0000086889@@@ @@@000008256...  
9                                   @@@0000082569@@@

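Or use one of the alternatives suggested in the comments (map with a lambda, or a list comprehension).

If you get the error:

TypeError: can only join an iterable

then not every value in the column is a list, and you can add a condition: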

df['currentTitle1'] = df['currentTitle1'].apply(lambda x: ' '.join(x) if isinstance(x, list)
                                                                      else x)

Or replace the non-list values with empty strings:

df['currentTitle1'] = df['currentTitle1'].apply(lambda x: ' '.join(x) if isinstance(x, list)
                                                                      else '')
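As a self-contained illustration of the conditional join, the sketch below builds a small column that mixes lists with a missing value. The sample frame and the helper name join_or_empty are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing lists with a missing value
df = pd.DataFrame({
    'currentTitle1': [
        ['@@@0000070642@@@'],
        ['@@@0000060910@@@', '@@@0000039198@@@'],
        np.nan,
    ]
})


def join_or_empty(value):
    """Join list items with a space; map anything else to ''."""
    return ' '.join(value) if isinstance(value, list) else ''


# A plain .apply(' '.join) would raise TypeError on the NaN row
df['currentTitle1'] = df['currentTitle1'].apply(join_or_empty)
print(df['currentTitle1'].tolist())
```

Returning '' keeps the column purely textual; returning the value unchanged instead would keep the NaN visible for later inspection.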

This also works on my machine, creating the dataframe like this:

import pandas as pd
import re

data = ['accompanying functions of @@@0000070642@@@ and business risk assessment - director',
'account @@@0000082569@@@ - sales agent /representative at pronovias fashion group',
'account manager/product @@@0000082569@@@ - handbags and accessories',
'account @@@0000082569@@@ for entrepreneurs and small size companies',
'academic @@@0000060910@@@ , administrative, and @@@0000039198@@@ liaison coordinator',
'account executive at bluefin insurance @@@0000060910@@@ limited',
'account executive for interior @@@0000129849@@@ magazine inex',
'account @@@0000082569@@@ high potential secondment programme',
'account @@@0000082569@@@ @@@0000060905@@@ -energy and commodities @@@0000086889@@@ candidate',
'account @@@0000082569@@@ paints, coatings, adhesives - ser, slo, cro']

df = pd.DataFrame({'currentTitle_unclean': data})
df['currentTitle'] = df['currentTitle_unclean'].apply(lambda x: ' '.join(re.findall(r'@{3}\d+@{3}', x)))
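The same result can also be sketched with pandas string methods only, chaining str.findall and str.join. The two sample rows plus the missing row are assumptions for illustration; unlike the apply/re.findall version, this chain tolerates a missing value, which is filled with an empty string at the end.

```python
import pandas as pd

# Hypothetical sample including one missing title
df = pd.DataFrame(
    {
        'currentTitle_unclean': [
            'account executive for interior @@@0000129849@@@ magazine inex',
            'academic @@@0000060910@@@ , administrative, and @@@0000039198@@@ liaison coordinator',
            None,
        ]
    },
    dtype=object,
)

df['currentTitle'] = (
    df['currentTitle_unclean']
    .str.findall(r'@{3}\d+@{3}')  # list of matches per row, NaN if missing
    .str.join(' ')                # lists -> space-separated strings
    .fillna('')                   # missing rows -> empty string
)

print(df['currentTitle'].tolist())
```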

If they are lists:

df['currentTitle1'].map(lambda x: ' '.join(x))

or

[' '.join(x) for x in df['currentTitle1']]

...are they lists or strings?

I got the values from a regular expression, so I assume it is a list. I am not sure, though. When I check the dtype, it shows 'object'.

df['currentTitle']=df['currentTitle'].str.findall(r'@{3}\d+@{3}')

That looks like it should be included in the question, along with the raw data from before the regexp operation. Maybe removing the punctuation is not even needed in the end.

@IljaEverilä Added the details.

With both examples I get the error TypeError: can only join an iterable

It means there are some None or NaN values, so you need:

df['currentTitle1'] = df['currentTitle1'].combine_first(pd.Series([[]], index=df.index)).apply(' '.join)

Still getting the same error with the new code, and yes, there are blank rows.

OK, your code worked. It was a stupid NaN value that caused the error, which I only found after nearly 3 hours of debugging. Thanks for your help!!
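To find such rows without hours of debugging, a small check like the sketch below can help. The sample frame and the names not_list and bad_ids are assumptions for illustration. It also builds one empty list per offending row explicitly, because a one-element list passed to pd.Series may not be broadcast across a longer index in every pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN hiding among the lists
df = pd.DataFrame({
    'id': [1, 2, 3],
    'currentTitle1': [
        ['@@@0000082569@@@'],
        np.nan,
        ['@@@0000060910@@@', '@@@0000039198@@@'],
    ],
})

# True where the cell is not a list (None, NaN, stray strings, ...)
not_list = ~df['currentTitle1'].apply(lambda v: isinstance(v, list))
bad_ids = df.loc[not_list, 'id'].tolist()
print(bad_ids)

# Swap each offender for its own empty list, then join as before
df['currentTitle1'] = [
    v if isinstance(v, list) else [] for v in df['currentTitle1']
]
df['currentTitle1'] = df['currentTitle1'].apply(' '.join)
print(df['currentTitle1'].tolist())
```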