Python 2.7: change column values to strings


For example, I have a DataFrame:

df

    category                              name
0   [['Clothing & Jewelry', 'Shoes']]     Jason
1   [['Clothing & Jewelry', 'Jewelry']]   Molly
How can I store the category column as a string, with the entries separated by commas?

The result I want is:

    category                              name
0   Clothing & Jewelry, Shoes             Jason
1   Clothing & Jewelry, Jewelry           Molly
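A minimal sketch of getting that result directly, assuming (from the display above) that each cell holds a one-element list whose only item is the list of category names. The sample frame is rebuilt by hand for illustration.

```python
import pandas as pd

# Sample data mirroring the question; the nested-list layout is an
# assumption based on how the cells are displayed.
df = pd.DataFrame({
    'category': [[['Clothing & Jewelry', 'Shoes']],
                 [['Clothing & Jewelry', 'Jewelry']]],
    'name': ['Jason', 'Molly'],
})

# Take the inner list (x[0]) and join its items with ", ".
df['category'] = df['category'].apply(lambda x: ', '.join(x[0]))
print(df)
```

If the cells were flat lists instead, the lambda would be ', '.join(x).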

You can call apply with a lambda to remove the 'Clothing & Jewelry' entry from each list:

In [21]:
# Keep every item of the inner list except 'Clothing & Jewelry'
df['category'] = df['category'].apply(lambda x: [c for c in x[0] if c != 'Clothing & Jewelry'])
df

Out[21]:
    category   name
0    [Shoes]  Jason
1  [Jewelry]  Molly
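Note that storing non-scalar values (such as lists) in a Series is problematic, because filtering and vectorized operations won't work on them. It is better to store a string with commas separating the entries.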
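To illustrate the point above, here is a small sketch with hypothetical data already stored as comma-separated strings: both filtering and removing an entry become single vectorized .str calls. The removal step assumes 'Clothing & Jewelry' is always the first entry, followed by ", ".

```python
import pandas as pd

# Hypothetical data already stored as comma-separated strings.
df = pd.DataFrame({
    'category': ['Clothing & Jewelry, Shoes', 'Clothing & Jewelry, Jewelry'],
    'name': ['Jason', 'Molly'],
})

# Vectorized filtering: rows whose category string mentions 'Shoes'.
has_shoes = df[df['category'].str.contains('Shoes', regex=False)]

# Vectorized removal; assumes the entry to drop is first and followed by ", ".
df['category'] = df['category'].str.replace('Clothing & Jewelry, ', '', regex=False)
print(has_shoes)
print(df)
```

The regex=False argument to str.replace needs pandas 0.23 or later; older versions treat the pattern as a regular expression, which is harmless here because the pattern has no special characters.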

EDIT

To answer your updated question, I would store each data element in a separate row, because that makes filtering easier:

In [79]:
df['category'].apply(lambda x: ','.join(x[0])).str.split(',',expand=True).stack().reset_index().drop('level_1', axis=1)

Out[79]:
   level_0                   0
0        0  Clothing & Jewelry
1        0               Shoes
2        1  Clothing & Jewelry
3        1             Jewelry
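To make the split/stack/reset_index chain easier to follow, here is a plain-Python sketch (no pandas) of the same reshaping. Each category item becomes its own (original row index, value) pair, which corresponds to the level_0 and 0 columns above. The names categories and to_long_rows are hypothetical, and the nested-list layout is assumed from the question's display.

```python
# Hypothetical input: one cell per person, each a one-element list
# wrapping the list of category names.
categories = [
    [['Clothing & Jewelry', 'Shoes']],
    [['Clothing & Jewelry', 'Jewelry']],
]

def to_long_rows(cells):
    """Flatten each cell's inner list into (row_index, item) pairs."""
    rows = []
    for row_index, cell in enumerate(cells):
        for item in cell[0]:  # cell[0] is the inner list, as with x[0] above
            rows.append((row_index, item))
    return rows

for row in to_long_rows(categories):
    print(row)
# (0, 'Clothing & Jewelry')
# (0, 'Shoes')
# (1, 'Clothing & Jewelry')
# (1, 'Jewelry')
```

One difference worth noting: iterating the list directly does not break if a category name itself contains a comma, whereas joining with ',' and splitting again would.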
We can then merge the stacked result back into the original df and filter on it:

In[80]:
df = df.merge(df['category'].apply(lambda x: ','.join(x[0])).str.split(',',expand=True).stack().reset_index().drop('level_1', axis=1), left_index=True, right_on='level_0', how='left')
df

Out[80]:
                          category   name  level_0                   0
0    [[Clothing & Jewelry, Shoes]]  Jason        0  Clothing & Jewelry
1    [[Clothing & Jewelry, Shoes]]  Jason        0               Shoes
2  [[Clothing & Jewelry, Jewelry]]  Molly        1  Clothing & Jewelry
3  [[Clothing & Jewelry, Jewelry]]  Molly        1             Jewelry

In [82]:
df = df.drop('level_0', axis=1)
df

Out[82]:
                          category   name                   0
0    [[Clothing & Jewelry, Shoes]]  Jason  Clothing & Jewelry
1    [[Clothing & Jewelry, Shoes]]  Jason               Shoes
2  [[Clothing & Jewelry, Jewelry]]  Molly  Clothing & Jewelry
3  [[Clothing & Jewelry, Jewelry]]  Molly             Jewelry

In [84]:    
df.rename(columns={0:'category_values'},inplace=True)
df

Out[84]:
                          category   name     category_values
0    [[Clothing & Jewelry, Shoes]]  Jason  Clothing & Jewelry
1    [[Clothing & Jewelry, Shoes]]  Jason               Shoes
2  [[Clothing & Jewelry, Jewelry]]  Molly  Clothing & Jewelry
3  [[Clothing & Jewelry, Jewelry]]  Molly             Jewelry

In [85]:
df[df['category_values']!='Clothing & Jewelry']

Out[85]:
                          category   name category_values
1    [[Clothing & Jewelry, Shoes]]  Jason           Shoes
3  [[Clothing & Jewelry, Jewelry]]  Molly         Jewelry
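On newer pandas (0.25 or later, which no longer supports Python 2.7), DataFrame.explode can replace the whole join/split/stack/merge sequence. A sketch under the same nested-list assumption as above; the column name category_values is reused from the transcript.

```python
import pandas as pd

df = pd.DataFrame({
    'category': [[['Clothing & Jewelry', 'Shoes']],
                 [['Clothing & Jewelry', 'Jewelry']]],
    'name': ['Jason', 'Molly'],
})

# Unwrap the inner list, then give each item its own row. explode repeats
# the original index, so each person's other columns stay aligned.
long_df = df.assign(category_values=df['category'].apply(lambda x: x[0]))
long_df = long_df.explode('category_values')

# Same filter as In [85].
filtered = long_df[long_df['category_values'] != 'Clothing & Jewelry']
print(filtered[['name', 'category_values']])
```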


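AttributeError: 'str' object has no attribute 'remove'
I got this error :( Are you suggesting I change the category column to strings? Did I understand correctly? (Sorry, I'm a beginner.)
Yes, then you could do df['category'].str.replace('Clothing & Jewelry', '')
Alternatively, you could duplicate the entries so that each one is on a separate row, and then filter them out. Keeping the data in a single row makes filtering difficult and doesn't scale well, so in this case each person would have two entries, with the data on separate rows.
Could you give some hints on how to do this?
Where does your data come from?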