Python 2.7: changing column values to strings
Tags: python-2.7, pandas
For example, I have a DataFrame:
df
   category                             name
0  [['Clothing & Jewelry', 'Shoes']]    Jason
1  [['Clothing & Jewelry', 'Jewelry']]  Molly
How can I store the category column as a string, with the entries separated by commas?
The result I would like to get is:
   category                     name
0  Clothing & Jewelry, Shoes    Jason
1  Clothing & Jewelry, Jewelry  Molly
You can call apply with a lambda:
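In [21]:
# Assumes each cell is a flat list such as ['Clothing & Jewelry', 'Shoes'].
# list.remove() mutates in place and returns None, so build a new list and assign it back.
df['category'] = df['category'].apply(lambda x: [c for c in x if c != 'Clothing & Jewelry'])
df
Out[21]:
   category   name
0  [Shoes]    Jason
1  [Jewelry]  Molly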
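Note that storing non-scalar values in a Series is problematic, because filtering and vectorized operations will not work on them. It is better to store a string with commas separating the entries.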
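As a minimal sketch of the comma-separated form asked for in the question: the sample data below is reconstructed from the question and assumes each cell is a list wrapped in an outer list, so x[0] is the inner list of category names.

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption about its exact shape).
df = pd.DataFrame({
    'category': [[['Clothing & Jewelry', 'Shoes']],
                 [['Clothing & Jewelry', 'Jewelry']]],
    'name': ['Jason', 'Molly'],
})

# x[0] unwraps the outer list; ', '.join turns the inner list into one string.
df['category'] = df['category'].apply(lambda x: ', '.join(x[0]))
print(df)
```

If the cells were flat lists instead, ', '.join(x) without the [0] would be the equivalent call.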
EDIT
To answer your updated question, I would store the data elements in separate rows, because that makes filtering easier:
In [79]:
df['category'].apply(lambda x: ','.join(x[0])).str.split(',',expand=True).stack().reset_index().drop('level_1', axis=1)
Out[79]:
   level_0  0
0  0        Clothing & Jewelry
1  0        Shoes
2  1        Clothing & Jewelry
3  1        Jewelry
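We can then merge this back onto the original df (assigning the result back to df so that the later steps operate on it) and filter:
In [80]:
df = df.merge(df['category'].apply(lambda x: ','.join(x[0])).str.split(',',expand=True).stack().reset_index().drop('level_1', axis=1), left_index=True, right_on='level_0', how='left')
df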
Out[80]:
   category                         name   level_0  0
0  [[Clothing & Jewelry, Shoes]]    Jason  0        Clothing & Jewelry
1  [[Clothing & Jewelry, Shoes]]    Jason  0        Shoes
2  [[Clothing & Jewelry, Jewelry]]  Molly  1        Clothing & Jewelry
3  [[Clothing & Jewelry, Jewelry]]  Molly  1        Jewelry
In [82]:
df = df.drop('level_0', axis=1)
df
Out[82]:
   category                         name   0
0  [[Clothing & Jewelry, Shoes]]    Jason  Clothing & Jewelry
1  [[Clothing & Jewelry, Shoes]]    Jason  Shoes
2  [[Clothing & Jewelry, Jewelry]]  Molly  Clothing & Jewelry
3  [[Clothing & Jewelry, Jewelry]]  Molly  Jewelry
In [84]:
df.rename(columns={0:'category_values'},inplace=True)
df
Out[84]:
   category                         name   category_values
0  [[Clothing & Jewelry, Shoes]]    Jason  Clothing & Jewelry
1  [[Clothing & Jewelry, Shoes]]    Jason  Shoes
2  [[Clothing & Jewelry, Jewelry]]  Molly  Clothing & Jewelry
3  [[Clothing & Jewelry, Jewelry]]  Molly  Jewelry
In [85]:
df[df['category_values']!='Clothing & Jewelry']
Out[85]:
   category                         name   category_values
1  [[Clothing & Jewelry, Shoes]]    Jason  Shoes
3  [[Clothing & Jewelry, Jewelry]]  Molly  Jewelry
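For reference, the steps in this answer can be collected into one self-contained script. This is only a sketch under the same assumptions as the transcript: the sample data is reconstructed from the question, and the names values, long_df and filtered are hypothetical helpers introduced here for readability.

```python
import pandas as pd

# Sample data reconstructed from the question (nested lists in each cell).
df = pd.DataFrame({
    'category': [[['Clothing & Jewelry', 'Shoes']],
                 [['Clothing & Jewelry', 'Jewelry']]],
    'name': ['Jason', 'Molly'],
})

# One row per category value; level_0 holds the index of the originating row.
values = (df['category']
          .apply(lambda x: ','.join(x[0]))
          .str.split(',', expand=True)
          .stack()
          .reset_index()
          .drop('level_1', axis=1)
          .rename(columns={0: 'category_values'}))

# Attach the per-value rows to the original columns, then drop the join key.
long_df = (df.merge(values, left_index=True, right_on='level_0', how='left')
             .drop('level_0', axis=1))

# Keep only the rows whose value is not the shared top-level category.
filtered = long_df[long_df['category_values'] != 'Clothing & Jewelry']
print(filtered)
```

Renaming the value column before the merge avoids carrying an integer column label (0) alongside string labels in the merged frame.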
Comments:
AttributeError: 'str' object has no attribute 'remove'. I get this error :( Are you suggesting that I change the category column to strings? Did I understand that correctly? (Sorry, I'm a beginner.)
Yes. In that case you can do df['category'].str.replace('Clothing & Jewelry', ''). Alternatively, you can duplicate the entries so that each one is on a separate row and then filter them out. Keeping the data in a single row makes filtering difficult and does not scale well. In this case each person would have two entries, with the data on separate rows.
Can you give me some hints on how to do that?
Where does your data come from?
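A minimal sketch of the str.replace suggestion from the comments. It assumes the category column already holds plain comma-separated strings (the AttributeError indicates the cells are strings rather than lists); the sample values are illustrative, not the asker's actual data.

```python
import pandas as pd

# Assumed shape: category is already a comma-separated string per row.
df = pd.DataFrame({
    'category': ['Clothing & Jewelry, Shoes', 'Clothing & Jewelry, Jewelry'],
    'name': ['Jason', 'Molly'],
})

# regex=False treats the pattern as a literal string; the trailing ', ' is
# included so that no leading separator is left behind.
df['category'] = df['category'].str.replace('Clothing & Jewelry, ', '', regex=False)
print(df)
```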