Python 2.7: Changing column values to strings

python-2.7 pandas


For example, I have a DataFrame:

df

    category                              name
0   [['Clothing & Jewelry', 'Shoes']]     Jason
1   [['Clothing & Jewelry', 'Jewelry']]   Molly

How can I store the category column as a string, with the entries separated by commas?

The result I am hoping for is:

    category                              name
0   Clothing & Jewelry, Shoes             Jason
1   Clothing & Jewelry, Jewelry           Molly

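You can call apply with a lambda. This assumes each cell holds a flat list such as ['Clothing & Jewelry', 'Shoes']; list.remove changes each list in place and returns None, so the effect shows up in df itself rather than in the value apply returns: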

In [21]:
df['category'].apply(lambda x: x.remove('Clothing & Jewelry'))
df

Out[21]:
    category   name
0    [Shoes]  Jason
1  [Jewelry]  Molly

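Note that storing non-scalar values in a Series is problematic, because filtering and vectorized operations will not work. It is better to store a string with commas separating the entries.

Edit

To answer your updated question, I would store the data elements on separate rows, because that makes filtering easier: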
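For the comma-separated string the question asks for, a minimal sketch follows. It assumes every cell is a list wrapping a single inner list, as in the example data; the frame is rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the example frame: each cell is a list containing one inner list.
df = pd.DataFrame({
    'category': [[['Clothing & Jewelry', 'Shoes']],
                 [['Clothing & Jewelry', 'Jewelry']]],
    'name': ['Jason', 'Molly'],
})

# x[0] is the inner list; join its entries into one comma-separated string.
df['category'] = df['category'].apply(lambda x: ', '.join(x[0]))
print(df)
```

After this the column holds plain strings, so the .str methods and ordinary boolean filtering work on it.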

In [79]:
df['category'].apply(lambda x: ','.join(x[0])).str.split(',',expand=True).stack().reset_index().drop('level_1', axis=1)

Out[79]:
   level_0                   0
0        0  Clothing & Jewelry
1        0               Shoes
2        1  Clothing & Jewelry
3        1             Jewelry

We can then merge this back into the original df (assigning the result so that the later steps operate on it) and filter:

In [80]:
df = df.merge(df['category'].apply(lambda x: ','.join(x[0])).str.split(',',expand=True).stack().reset_index().drop('level_1', axis=1), left_index=True, right_on='level_0', how='left')
df

Out[80]:
                          category   name  level_0                   0
0    [[Clothing & Jewelry, Shoes]]  Jason        0  Clothing & Jewelry
1    [[Clothing & Jewelry, Shoes]]  Jason        0               Shoes
2  [[Clothing & Jewelry, Jewelry]]  Molly        1  Clothing & Jewelry
3  [[Clothing & Jewelry, Jewelry]]  Molly        1             Jewelry

In [82]:
df = df.drop('level_0', axis=1)
df

Out[82]:
                          category   name                   0
0    [[Clothing & Jewelry, Shoes]]  Jason  Clothing & Jewelry
1    [[Clothing & Jewelry, Shoes]]  Jason               Shoes
2  [[Clothing & Jewelry, Jewelry]]  Molly  Clothing & Jewelry
3  [[Clothing & Jewelry, Jewelry]]  Molly             Jewelry

In [84]:    
df.rename(columns={0:'category_values'},inplace=True)
df

Out[84]:
                          category   name     category_values
0    [[Clothing & Jewelry, Shoes]]  Jason  Clothing & Jewelry
1    [[Clothing & Jewelry, Shoes]]  Jason               Shoes
2  [[Clothing & Jewelry, Jewelry]]  Molly  Clothing & Jewelry
3  [[Clothing & Jewelry, Jewelry]]  Molly             Jewelry

In [85]:
df[df['category_values']!='Clothing & Jewelry']

Out[85]:
                          category   name category_values
1    [[Clothing & Jewelry, Shoes]]  Jason           Shoes
3  [[Clothing & Jewelry, Jewelry]]  Molly         Jewelry
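As an aside, on pandas 0.25 or later (releases that no longer run on Python 2.7), Series.explode produces the same one-entry-per-row layout more directly. The sketch below is an alternative to the split/stack/merge chain, not the method used in the answer; the names long_df and filtered are illustrative, and the frame is rebuilt so the snippet is self-contained.

```python
import pandas as pd

# Rebuild the example frame: each cell is a list containing one inner list.
df = pd.DataFrame({
    'category': [[['Clothing & Jewelry', 'Shoes']],
                 [['Clothing & Jewelry', 'Jewelry']]],
    'name': ['Jason', 'Molly'],
})

# Unwrap the inner list, then give every entry its own row.
# explode repeats the original index label for each entry.
values = (df['category']
          .apply(lambda x: x[0])
          .explode()
          .rename('category_values'))

# join aligns on the index, so each person's row is repeated once per entry.
long_df = df.join(values)

# Filtering is now an ordinary scalar comparison.
filtered = long_df[long_df['category_values'] != 'Clothing & Jewelry']
print(filtered)
```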


AttributeError: 'str' object has no attribute 'remove'
I get this error :( Are you suggesting that I change the category column to a string? Did I understand correctly? (Sorry, I'm a beginner)

Yes, then you can do df['category'].str.replace('Clothing & Jewelry', ''). Alternatively, you can duplicate the entries so that each one sits on its own row, and then filter them out. Keeping all the data in one row makes filtering difficult and does not scale well. So in this case each person would have two entries, with the data on separate rows.

Could you give some hints on how to do that?

Where does your data come from?