Pandas: if a column contains any of a dictionary's values, add the dictionary key as a label

I have a dataframe like this:

df = pd.DataFrame({'products' : ['a,b,c', 'a,c', 'b,d','a,b,c']})

    products
0   a,b,c
1   a,c
2   b,d
3   a,b,c

mydict = {'a': ['good', 'neutral'],
          'b': ['neutral'],
          'c': ['neutral'],
          'd': ['bad']}

I also created a dictionary that assigns products to categories (category -> list of products):

mydict = {'good':['a'],'bad':['d'],'neutral':['b','c','a']}

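I'm trying to create a new column, say df['quality'], that gets a dictionary key (the product category) added whenever any of the products in df['products'] is contained in that key's values. That is, the final output should look like this: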

    products quality
0   a,b,c     good, neutral   
1   a,c       good, neutral
2   b,d       neutral, bad
3   a,b,c     good, neutral

Any ideas? Am I overcomplicating this?

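You can first build a reversed dictionary that maps each product to its categories, e.g. a -> [good, neutral]. Then split the values in df on ",", explode them, and map them with this reversed dict. Finally, use groupby and set to collect the categories from the flattened lists of products, and join them with ", ":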
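import pandas as pd
from collections import defaultdict
from itertools import chain

# Invert mydict (category -> products) into product -> list of categories
reversed_dict = defaultdict(list)
for cat, prods in mydict.items():
    for prod in prods:
        reversed_dict[prod].append(cat)

# apply over the df: one row per product, map each product to its categories,
# then merge the category lists back per original row (sorted for a stable order)
df["quality"] = (df.products
                   .str.split(",")
                   .explode()
                   .map(reversed_dict)
                   .groupby(level=0)
                   .agg(lambda s: ", ".join(sorted(set(chain.from_iterable(s))))))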

This gives:
>>> df

  products        quality
0    a,b,c  good, neutral
1      a,c  good, neutral
2      b,d   bad, neutral
3    a,b,c  good, neutral

Let's try:
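# lookup: product -> comma-joined categories (renamed from help to avoid shadowing the builtin)
lookup = pd.Series(mydict).explode().reset_index().groupby(0)['index'].agg(','.join)

# sorted() makes the category order deterministic; a plain set would join in arbitrary order
df['quality'] = (df.products.replace(lookup, regex=True)
                   .str.split(',')
                   .map(lambda toks: sorted(set(toks)))
                   .str.join(','))
>>> df['quality']
0    good,neutral
1    good,neutral
2     bad,neutral
3    good,neutral
Name: quality, dtype: object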
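To see what the first line of this answer builds, here is the lookup step on its own, applied to the question's category -> products dictionary (the names pairs and lookup are only illustrative). Each product ends up mapped to a comma-joined string of its categories, which replace() then substitutes into the products column. Note that with regex=True every key is treated as an unanchored regular expression, so a product code like a would also match inside a longer code such as ab; that is worth keeping in mind if product codes can be substrings of each other.

```python
import pandas as pd

# Category -> products, as in the question
mydict = {'good': ['a'], 'bad': ['d'], 'neutral': ['b', 'c', 'a']}

# One row per (category, product) pair; reset_index() turns the category
# index into a column named 'index' and the products end up in column 0
pairs = pd.Series(mydict).explode().reset_index()

# Group by product and join its categories: product -> "cat1,cat2"
lookup = pairs.groupby(0)['index'].agg(','.join)
print(lookup)
```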

You should define mydict like this:
df = pd.DataFrame({'products' : ['a,b,c', 'a,c', 'b,d','a,b,c']})

    products
0   a,b,c
1   a,c
2   b,d
3   a,b,c

mydict = {'a': ['good', 'neutral'],
          'b': ['neutral'],
          'c': ['neutral'],
          'd': ['bad']}

Then:
This returns:
    products    quality
0   a,b,c       good,neutral
1   a,c         good,neutral
2   b,d         bad,neutral
3   a,b,c       good,neutral
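The code that followed in this answer is not preserved. As a minimal sketch, assuming the product -> categories mydict defined in this answer, one way to produce that output is to split each row and take the union of its products' categories in plain Python. This is not necessarily the original answer's code: the helper name categories_for is hypothetical, and products missing from mydict are silently skipped via dict.get.

```python
import pandas as pd

df = pd.DataFrame({'products': ['a,b,c', 'a,c', 'b,d', 'a,b,c']})

# Product -> categories, as defined in this answer
mydict = {'a': ['good', 'neutral'],
          'b': ['neutral'],
          'c': ['neutral'],
          'd': ['bad']}

def categories_for(products: str) -> str:
    """Hypothetical helper: union of the categories of every product in a
    comma-separated string, sorted and joined with ','."""
    cats = {cat for prod in products.split(',') for cat in mydict.get(prod, [])}
    return ','.join(sorted(cats))

df['quality'] = df['products'].apply(categories_for)
print(df)
```

Sorting the set gives a stable order, which is why row 2 comes out as bad,neutral.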

Another way:
d = {'a': ['good', 'neutral'],
     'b': ['neutral'],
     'c': ['neutral'],
     'd': ['bad']}

df['quality'] = df['products'].str.split(',').explode().map(d).explode().groupby(level=0).unique().str.join(',')
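The one-liner above chains several steps. Splitting it into named intermediates (the names tokens, pairs and uniq are only illustrative) makes each step easier to inspect: the first explode puts each product on its own row while keeping the original row label, the second explode flattens the category lists, and groupby(level=0).unique() collects the distinct categories per original row in order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({'products': ['a,b,c', 'a,c', 'b,d', 'a,b,c']})

# Product -> categories, as in this answer
d = {'a': ['good', 'neutral'],
     'b': ['neutral'],
     'c': ['neutral'],
     'd': ['bad']}

# 1. One row per product; the original row label is repeated
tokens = df['products'].str.split(',').explode()

# 2. Map each product to its category list, then one row per category
pairs = tokens.map(d).explode()

# 3. Distinct categories per original row, in order of first appearance
uniq = pairs.groupby(level=0).unique()

# 4. Join each array of categories into a single string
df['quality'] = uniq.str.join(',')
print(df)
```

Because unique() keeps first-appearance order rather than sorting, row 2 comes out as neutral,bad here.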

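Is it intentional that a is both "good" and "neutral"?
Yes, the same product can belong to different categories. In that case, I need to add every category.
You could have used defaultdict(set) and then just ', '.join(set.union(*s)), without the extra import. I'm not sure about the runtime, though...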
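A minimal sketch of the defaultdict(set) suggestion, assuming mydict is the question's category -> products dictionary (prod_to_cats is an illustrative name): each product maps to a set of categories, and set.union(*s) merges the sets belonging to one original row, so itertools.chain is no longer needed. sorted() is added only to make the output order deterministic.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'products': ['a,b,c', 'a,c', 'b,d', 'a,b,c']})

# Category -> products, as in the question
mydict = {'good': ['a'], 'bad': ['d'], 'neutral': ['b', 'c', 'a']}

# Invert into product -> set of categories
prod_to_cats = defaultdict(set)
for cat, prods in mydict.items():
    for prod in prods:
        prod_to_cats[prod].add(cat)

df['quality'] = (df['products']
                 .str.split(',')
                 .explode()
                 .map(prod_to_cats)                 # each product -> its set of categories
                 .groupby(level=0)                  # regroup by original row
                 .agg(lambda s: ', '.join(sorted(set.union(*s)))))
print(df)
```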