Python: adding a new column with mapped values from a dictionary that contains lists of values

python pandas dictionary mapping lookup

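I am trying to add at least one column, and possibly several, to a DataFrame from a mapping dictionary. I have a dictionary keyed by product catalog number, where each value is a list holding the standardized hierarchical nomenclature for that product number. Example below: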

import pandas as pd

dict = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})
df['catagory'] = df['product'].map(dict)
print(df)

I get the following result:

    product      catagory
0        1  [a, b, c, d]
1        2  [w, x, y, z]
2        3           NaN

I would like to get the following:

   product cat1 cat2 cat3 cat4
0        1    a    b    c    d
1        2    w    x    y    z
2        3  NaN  NaN  NaN  NaN

Or better still:

   product category
0        1        d
1        2        z
2        3      NaN

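I have been trying to pull a single item out of the list in the dictionary and append it to the DataFrame, but the only suggestions I have found, including the post I was following, cover mapping a dictionary whose values are single items rather than lists.

Thanks for your help.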

Let's use set_index, apply, add_prefix, and reset_index:

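# Expand each list into its own columns, keeping product as the index
df_out = (df.set_index('product')['catagory']
            .apply(lambda x: pd.Series(x)))

# Renumber the columns from 0..3 to 1..4
df_out.columns = df_out.columns + 1

# Prefix the column names with 'cat' and restore product as a column
df_out.add_prefix('cat').reset_index()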

Output:

   product cat1 cat2 cat3 cat4
0        1    a    b    c    d
1        2    w    x    y    z
2        3  NaN  NaN  NaN  NaN
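
The same wide layout can probably also be built straight from the dictionary, without first mapping the lists into a column and calling pd.Series once per row. This is only a sketch and not part of the original answer: the names categories and wide are hypothetical, and the dictionary is deliberately not called dict.

```python
import pandas as pd

# Hypothetical name for the mapping; avoids shadowing the built-in dict
categories = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})

# One row per dictionary key, one column per list position (0..3)
wide = pd.DataFrame.from_dict(categories, orient="index")

# Rename the positional columns to cat1..cat4
wide.columns = [f"cat{i + 1}" for i in range(wide.shape[1])]

# Left-join on the product number; products missing from the mapping get NaN
df_out = df.join(wide, on="product")
print(df_out)
```

Because the join is a left join on the product column, product 3 is kept and its category columns stay empty.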

To take it a step further, stack the result into a single category column:

(df.set_index('product')['catagory']
  .apply(lambda x:pd.Series(x))
  .stack(dropna=False)
  .rename('category')
  .reset_index()
  .drop('level_1',axis=1)
  .drop_duplicates()
)

Output:

   product category
0        1        a
1        1        b
2        1        c
3        1        d
4        2        w
5        2        x
6        2        y
7        2        z
8        3      NaN
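
The stacked result lists every category for each product, whereas the second layout asked for in the question keeps only the last item of each list. One possible way to get that is to reduce the dictionary to its last items before mapping. This is a minimal sketch under the assumption that every list is non-empty; the names categories and last_item are hypothetical.

```python
import pandas as pd

# Hypothetical name for the mapping; avoids shadowing the built-in dict
categories = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
df = pd.DataFrame({"product": [1, 2, 3]})

# Keep only the last (most specific) entry of each list; assumes no list is empty
last_item = {key: values[-1] for key, values in categories.items()}

# Products that are not in the mapping become NaN
df['category'] = df['product'].map(last_item)
print(df)
```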

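Note:

Never use the names of built-ins such as list, type, dict ... as variable names, because doing so masks the built-in functions.

So if you use:

# dict is now a variable name
dict = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
# creating a dictionary with dict() is no longer possible, because dict now refers to a dictionary object
print (dict(a=1, b=2))

you get the error:

TypeError: 'dict' object is not callable

This can be very hard to debug. (Restart the IDE after testing.)

So use another variable name, such as d or categories:

d = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
print (dict(a=1, b=2))
{'a': 1, 'b': 2}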
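
The failure can be reproduced safely by confining the shadowing to a function, so the built-in stays usable everywhere else. This is only an illustrative sketch; the function name demo_shadowing is hypothetical.

```python
def demo_shadowing():
    # Inside this function, dict is a local variable that hides the built-in
    dict = {1: ['a', 'b', 'c', 'd'], 2: ['w', 'x', 'y', 'z']}
    try:
        dict(a=1, b=2)  # tries to call a dictionary object
    except TypeError as exc:
        return str(exc)
    return None

message = demo_shadowing()
print(message)

# Outside the function the built-in is untouched and still works
restored = dict(a=1, b=2)
print(restored)
```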


I think you need:
Then it is possible to use or:

If the column category is already in df, the solution is similar; just remove the rows with NaN by:
This might help: