Python: match a column against dictionaries to look up another column's values in those dictionaries


I have a dataframe df like this:

invoice_id|customer_id|items|batch
110|425|{'a': 50, 'b': 46}|no518
994528|a863|{'a': 21, 'c': 25}|as22
24|t311|{'scissor': 6, 'rock': 6}|af10
Another dataframe, df1, is:

invoice_id|defect
110|a
994528|c
I want to look up the values of df1['defect'] in the df['items'] column. This is my expected output:

invoice_id|customer_id|items|batch|defects_in_items
110|425|{'a': 50, 'b': 46}|no518|50
994528|a863|{'a': 21, 'c': 25}|as22|25
24|t311|{'scissor': 6, 'rock': 6}|af10|0

Can anyone help? Thanks in advance.

First, use map to create a mapping:

mapping = df.invoice_id.map(df1.set_index('invoice_id').defect)

mapping
0      a
1      c
2    NaN
Name: invoice_id, dtype: object
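Series.map accepts another Series as a lookup table: each value of the calling Series is looked up in the argument's index, and values with no match become NaN. A minimal standalone sketch, hard-coding the question's invoice IDs (the names lookup and invoice_ids are just illustrative):

```python
import pandas as pd

# Lookup table: index = invoice_id, values = the defect key to look up
lookup = pd.Series(['a', 'c'], index=[110, 994528], name='defect')

invoice_ids = pd.Series([110, 994528, 24], name='invoice_id')

# map() replaces each invoice_id with lookup[invoice_id];
# ids that are not in lookup's index (here 24) become NaN
mapping = invoice_ids.map(lookup)
print(mapping.tolist())
```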
Next, iterate over df['items'] and index each dict with the mapped value for the same row:

df['defects_in_items'] = [i.get(j, 0) for i, j in zip(df['items'], mapping)]     
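The last row gets 0 because its mapped value is NaN, which is simply a key the dict does not contain, so dict.get falls back to its default. A plain-Python sketch of just that step, with NaN standing in for the unmatched invoice (the variable names are illustrative):

```python
import math

items = [{'a': 50, 'b': 46}, {'a': 21, 'c': 25}, {'scissor': 6, 'rock': 6}]
keys = ['a', 'c', math.nan]  # NaN plays the role of "no defect found for this invoice"

# dict.get(key, 0) returns 0 whenever the key is absent, and NaN is never a key here
defects = [d.get(k, 0) for d, k in zip(items, keys)]
print(defects)  # [50, 25, 0]
```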
Or, equivalently, define a function that performs the lookup and vectorize it:

import numpy as np

def mapper(i, j):
    # Look up key j in dict i, defaulting to 0 when it is missing
    return i.get(j, 0)

v = np.vectorize(mapper)
df['defects_in_items'] = v(df['items'], mapping)
This gives:

df

   invoice_id customer_id                      items  batch  defects_in_items
0         110         425         {'a': 50, 'b': 46}  no518                50
1      994528        a863         {'a': 21, 'c': 25}   as22                25
2          24        t311  {'scissor': 6, 'rock': 6}   af10                 0
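A note on the np.vectorize variant: NumPy documents it as a convenience wrapper that is essentially a for loop, so it is not faster than the list comprehension. Without otypes it infers the output dtype from a trial call on the first elements. A sketch passing otypes explicitly, with the question's sample values hard-coded as an assumption:

```python
import numpy as np
import pandas as pd

items = pd.Series([{'a': 50, 'b': 46}, {'a': 21, 'c': 25}, {'scissor': 6, 'rock': 6}])
mapping = pd.Series(['a', 'c', np.nan])  # what the map step produces for these rows

def mapper(d, key):
    # Same lookup as before: 0 when the key is missing (including NaN)
    return d.get(key, 0)

# otypes fixes the output dtype up front instead of inferring it from the first call
v = np.vectorize(mapper, otypes=[np.int64])
result = v(items, mapping)
print(result.tolist())  # [50, 25, 0]
```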
Another way:

import pandas as pd

# create sample data
df = pd.DataFrame({'invoice_id':[110,994528,24],
                   'customer_id':['425','a863','t311'],
                   'citems' :[{'a': 50, 'b': 46},{'a': 21, 'c': 25},{'scissor': 6, 'rock': 6}],
                   'batch':['no518','as22','af10']})

df2 = pd.DataFrame({'invoice_id':[110,994528], 'defect':['a','c']})

## merge data (unmatched invoices get defect 0)
df = df.merge(df2, on='invoice_id', how='left').fillna(0)

## iterate over rows and create new column
for index, row in df.iterrows():
    if row['defect'] in row['citems']:
        df.loc[index, 'defect_in_items'] = df.loc[index, 'citems'].get(df.loc[index, 'defect'],0)
    else:
        df.loc[index, 'defect_in_items'] = 0

## answer

    batch         citems               customer_id  invoice_id  defect  defect_in_items
0   no518   {'a': 50, 'b': 46}            425           110         a       50.0
1   as22    {'a': 21, 'c': 25}            a863        994528        c       25.0
2   af10    {'scissor': 6, 'rock': 6}     t311          24          0       0.0
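In the loop above the if/else is not strictly needed, since dict.get already returns 0 for a missing key (including the 0 that fillna put in the unmatched row). A sketch of the same merge without iterrows, assuming the same sample data trimmed to the relevant columns:

```python
import pandas as pd

df = pd.DataFrame({'invoice_id': [110, 994528, 24],
                   'citems': [{'a': 50, 'b': 46}, {'a': 21, 'c': 25}, {'scissor': 6, 'rock': 6}]})
df2 = pd.DataFrame({'invoice_id': [110, 994528], 'defect': ['a', 'c']})

# Left merge keeps every invoice; invoice 24 gets defect = NaN
merged = df.merge(df2, on='invoice_id', how='left')

# dict.get handles both "key absent" and "defect is NaN" with the default 0
merged['defect_in_items'] = [d.get(k, 0) for d, k in zip(merged['citems'], merged['defect'])]
print(merged['defect_in_items'].tolist())  # [50, 25, 0]
```

Building the column in a single assignment also keeps it integer; the row-by-row .loc assignment creates the column with NaN placeholders first, which is why the output above shows 50.0 and 25.0.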

Merge the two dataframes, then use apply:

import ast
df2 = df.merge(df1, on=["invoice_id"], how="left")
df2["defects_in_items"] = df2.apply(lambda x: ast.literal_eval(x["items"]).get(x["defect"],0), axis=1)
df2.iloc[:,[0,1,2,3,5]]
Result:

    invoice_id  customer_id items                      batch    defects_in_items
0   110         425         {'a': 50, 'b': 46}         no518    50
1   994528      a863        {'a': 21, 'c': 25}         as22     25
2   24          t311        {'scissor': 6, 'rock': 6}  af10     0

Also, I read both dataframes from a txt file, so my "items" column is of type str; ast.literal_eval converts each value to a dict.
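If the data comes from text files like this, one option is to parse the "items" strings once, right after loading, instead of inside apply. A sketch assuming the files use the '|' layout shown in the question; the in-memory strings below are hypothetical stand-ins for the real .txt files:

```python
import ast
import io

import pandas as pd

# Hypothetical stand-ins for the two .txt files, in the question's '|' layout
items_txt = """invoice_id|customer_id|items|batch
110|425|{'a': 50, 'b': 46}|no518
994528|a863|{'a': 21, 'c': 25}|as22
24|t311|{'scissor': 6, 'rock': 6}|af10
"""
defects_txt = """invoice_id|defect
110|a
994528|c
"""

df = pd.read_csv(io.StringIO(items_txt), sep='|')
df1 = pd.read_csv(io.StringIO(defects_txt), sep='|')

# 'items' is read as plain text; turn each string into a real dict once
df['items'] = df['items'].apply(ast.literal_eval)

merged = df.merge(df1, on='invoice_id', how='left')
merged['defects_in_items'] = [d.get(k, 0) for d, k in zip(merged['items'], merged['defect'])]
print(merged[['invoice_id', 'defects_in_items']])
```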

I think you should first look at