Python: removing square brackets and single quotes
I have a DataFrame like the one below, and I want to remove the square brackets, the single quotes (') and the commas.
id currentTitle1
1 ['@@@0000070642@@@']
2 ['@@@0000082569@@@']
3 ['@@@0000082569@@@']
4 ['@@@0000082569@@@']
5 ['@@@0000060910@@@', '@@@0000039198@@@']
6 ['@@@0000060910@@@']
7 ['@@@0000129849@@@']
8 ['@@@0000082569@@@']
9 ['@@@0000082569@@@', '@@@0000060905@@@', '@@@0000086889@@@']
10 ['@@@0000082569@@@']
I would like the output to look like this:
id currentTitle1
1 @@@0000070642@@@
2 @@@0000082569@@@
3 @@@0000082569@@@
4 @@@0000082569@@@
5 @@@0000060910@@@ @@@0000039198@@@
6 @@@0000060910@@@
7 @@@0000129849@@@
8 @@@0000082569@@@
9 @@@0000082569@@@ @@@0000060905@@@ @@@0000086889@@@
10 @@@0000082569@@@
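I get the data from a regex cleaning step of the form df['currentTitle'] = df['currentTitle'].str.findall(r'@{3}\d+@{3}')
Edit: posting the unclean data. Keep in mind that there are also empty rows that are not included here.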
id currentTitle currentTitle_unclean
1 @@@0000070642@@@ accompanying functions of @@@0000070642@@@ and business risk assessment - director
2 @@@0000082569@@@ account @@@0000082569@@@ - sales agent /representative at pronovias fashion group
3 @@@0000082569@@@ account manager/product @@@0000082569@@@ - handbags and accessories
4 @@@0000082569@@@ account @@@0000082569@@@ for entrepreneurs and small size companies
5 @@@0000060910@@@ @@@0000039198@@@ academic @@@0000060910@@@ , administrative, and @@@0000039198@@@ liaison coordinator
6 @@@0000060910@@@ account executive at bluefin insurance @@@0000060910@@@ limited
7 @@@0000129849@@@ account executive for interior @@@0000129849@@@ magazine inex
8 @@@0000082569@@@ account @@@0000082569@@@ high potential secondment programme
9 @@@0000082569@@@ @@@0000060905@@@ @@@0000086889@@@ account @@@0000082569@@@ @@@0000060905@@@ -energy and commodities @@@0000086889@@@ candidate
10 @@@0000082569@@@ account @@@0000082569@@@ paints, coatings, adhesives - ser, slo, cro
You can use apply with join:
df['currentTitle1'] = df['currentTitle1'].apply(' '.join)
print (df)
id currentTitle currentTitle_unclean \
0 1 @@@0000070642@@@ accompanying functions of @@@0000070642@@@ and...
1 2 @@@0000082569@@@ account @@@0000082569@@@ - sales agent /repres...
2 3 @@@0000082569@@@ account manager/product @@@0000082569@@@ - han...
3 4 @@@0000082569@@@ account @@@0000082569@@@ for entrepreneurs and...
4 5 @@@0000060910@@@ @@@0000039198@@@ academic @@@0000060910@@@ ,...
5 6 @@@0000060910@@@ account executive at bluefin insurance @@@0000...
6 7 @@@0000129849@@@ account executive for interior @@@0000129849@@...
7 8 @@@0000082569@@@ account @@@0000082569@@@ high potential second...
8 9 @@@0000082569@@@ @@@0000060905@@@ @@@0000086889@@@ account @@@...
9 10 @@@0000082569@@@ account @@@0000082569@@@ paints, coatings, adh...
currentTitle1
0 @@@0000070642@@@
1 @@@0000082569@@@
2 @@@0000082569@@@
3 @@@0000082569@@@
4 @@@0000039198@@@ @@@0000060910@@@ @@@000003919...
5 @@@0000060910@@@
6 @@@0000129849@@@
7 @@@0000082569@@@
8 @@@0000060905@@@ @@@0000086889@@@ @@@000008256...
9 @@@0000082569@@@
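Or, as mentioned in the comments, use map with a lambda that joins each list.
If you get the error:
TypeError: can only join an iterable
then some values are not lists. You can add a condition that returns the original value in that case:
df['currentTitle1'] = df['currentTitle1'].apply(lambda x: ' '.join(x) if isinstance(x, list) else x)
or one that returns an empty string instead:
df['currentTitle1'] = df['currentTitle1'].apply(lambda x: ' '.join(x) if isinstance(x, list) else '')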
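The error and the guard described above can be reproduced on a tiny frame. This is a minimal sketch, assuming a hypothetical three-row sample in which one value is NaN; the sample data and the names join_failed and joined are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two list rows plus one missing value (NaN).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "currentTitle1": [
        ["@@@0000070642@@@"],
        ["@@@0000060910@@@", "@@@0000039198@@@"],
        np.nan,
    ],
})

# A plain join fails on the NaN row, because a float is not iterable.
try:
    df["currentTitle1"].apply(" ".join)
    join_failed = False
except TypeError:
    join_failed = True

# Guarded version: join real lists, map everything else to an empty string.
joined = df["currentTitle1"].apply(
    lambda x: " ".join(x) if isinstance(x, list) else ""
)
print(joined.tolist())
```

Whether non-list rows should become an empty string or keep their original value depends on what later steps expect, so the else branch is a judgment call.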
This also works on my machine. Creating the DataFrame:
import pandas as pd
import re
data = ['accompanying functions of @@@0000070642@@@ and business risk assessment - director',
'account @@@0000082569@@@ - sales agent /representative at pronovias fashion group',
'account manager/product @@@0000082569@@@ - handbags and accessories',
'account @@@0000082569@@@ for entrepreneurs and small size companies',
'academic @@@0000060910@@@ , administrative, and @@@0000039198@@@ liaison coordinator',
'account executive at bluefin insurance @@@0000060910@@@ limited',
'account executive for interior @@@0000129849@@@ magazine inex',
'account @@@0000082569@@@ high potential secondment programme',
'account @@@0000082569@@@ @@@0000060905@@@ -energy and commodities @@@0000086889@@@ candidate',
'account @@@0000082569@@@ paints, coatings, adhesives - ser, slo, cro']
df = pd.DataFrame({'currentTitle_unclean': data})
df['currentTitle'] = df['currentTitle_unclean'].apply(lambda x: ' '.join(re.findall(r'@{3}\d+@{3}', x)))
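As a possible alternative to the apply and lambda version above, both the extraction and the join can go through the .str accessor. This is a sketch under the assumption that the column holds strings or missing values; the short sample Series and the names tokens and joined are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: two titles taken from the post plus one missing row.
s = pd.Series(
    [
        "academic @@@0000060910@@@ , administrative, and @@@0000039198@@@ liaison coordinator",
        "account executive for interior @@@0000129849@@@ magazine inex",
        None,
    ],
    dtype=object,
)

# str.findall returns one list of matches per row; missing rows stay missing.
tokens = s.str.findall(r"@{3}\d+@{3}")

# str.join concatenates each list with the given separator.
joined = tokens.str.join(" ")
print(joined)
```

With this approach a missing row stays missing instead of raising, so a fillna('') can follow if empty strings are preferred.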
If they are lists: df['currentTitle1'].map(lambda x: ' '.join(x)), or [' '.join(x) for x in df['currentTitle1']]
... Are they lists or strings?
I got the values from a regex, so I assume they are lists, but I am not sure. When I check the dtype, it shows 'object'.
df['currentTitle'] = df['currentTitle'].str.findall(r'@{3}\d+@{3}') looks like something that should be included in the question, along with the original data before the regexp operation. Maybe removing the punctuation is not even needed in the end.
@IljaEverilä Added the details.
With both examples I get the error TypeError: can only join an iterable
It means there are some None or NaN values, so you need df['currentTitle1'] = df['currentTitle1'].combine_first(pd.Series([[]] * len(df), index=df.index)).apply(' '.join)
I still get the same error with the new code, and yes, there are blanks.
OK, your code worked. It was a silly NaN value that caused the error, which I found after almost 3 hours of debugging. Thanks for your help!!
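Because dtype reports only 'object' for a column like this, one way to see what each cell really holds is to count the Python types of the elements. This is a small sketch, assuming a hypothetical column with one stray NaN; the names col, type_counts and n_missing are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical column: lists from findall plus one stray NaN.
col = pd.Series(
    [
        ["@@@0000082569@@@"],
        np.nan,
        ["@@@0000060910@@@", "@@@0000039198@@@"],
    ]
)

# Count how many cells hold each Python type (by type name).
type_counts = col.map(lambda x: type(x).__name__).value_counts().to_dict()

# Count missing cells directly.
n_missing = int(col.isna().sum())

print(type_counts, n_missing)
```

A 'float' entry next to 'list' in the counts is usually the NaN that makes ' '.join raise.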