Python: match a column against dictionaries to look up another column's value in those dictionaries
I have a DataFrame df that looks like this:
invoice_id|customer_id|items|batch
110|425|{'a': 50, 'b': 46}|no518
994528|a863|{'a': 21, 'c': 25}|as22
24|t311|{'scissor': 6, 'rock': 6}|af10
The other DataFrame, df1, is:
invoice_id|defect
110|a
994528|c
For each invoice, I want to look up the value from df1['defect'] as a key in the dictionary stored in df['items'].
This is my expected output:
invoice_id|customer_id|items|batch|defects_in_items
110|425|{'a': 50, 'b': 46}|no518|50
994528|a863|{'a': 21, 'c': 25}|as22|25
24|t311|{'scissor': 6, 'rock': 6}|af10|0
Can anyone help? Thanks in advance.

First, use map to create the mapping:
mapping = df.invoice_id.map(df1.set_index('invoice_id').defect)
mapping
0 a
1 c
2 NaN
Name: invoice_id, dtype: object
Next, iterate over df['items'] and index each dictionary with the corresponding mapping value for that row:
df['defects_in_items'] = [i.get(j, 0) for i, j in zip(df['items'], mapping)]
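Or, equivalently, define a function that performs the lookup and vectorize it:
import numpy as np

def mapper(i, j):
    # return the count stored under key j, or 0 if the key is absent
    return i.get(j, 0)
v = np.vectorize(mapper)
df['defects_in_items'] = v(df['items'], mapping)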
This produces:
df
invoice_id customer_id items batch defects_in_items
0 110 425 {'a': 50, 'b': 46} no518 50
1 994528 a863 {'a': 21, 'c': 25} as22 25
2 24 t311 {'scissor': 6, 'rock': 6} af10 0
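For reference, the map-then-lookup steps above can be put together as one self-contained sketch. The sample frames are rebuilt here from the tables in the question (customer_id is assumed to be a string column); nothing else is added to the approach.

```python
import pandas as pd

# Sample data rebuilt from the question's tables (assumed dtypes).
df = pd.DataFrame({
    "invoice_id": [110, 994528, 24],
    "customer_id": ["425", "a863", "t311"],
    "items": [{"a": 50, "b": 46}, {"a": 21, "c": 25}, {"scissor": 6, "rock": 6}],
    "batch": ["no518", "as22", "af10"],
})
df1 = pd.DataFrame({"invoice_id": [110, 994528], "defect": ["a", "c"]})

# Look up each invoice's defect key; invoices missing from df1 become NaN.
mapping = df["invoice_id"].map(df1.set_index("invoice_id")["defect"])

# dict.get falls back to 0 when the key (including NaN) is not in the dict.
df["defects_in_items"] = [
    items.get(key, 0) for items, key in zip(df["items"], mapping)
]

print(df["defects_in_items"].tolist())  # [50, 25, 0]
```

Note that invoice 24 has no row in df1, so its mapping value is NaN; since NaN is not a key in the dictionary, get returns the default 0.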
Another way:
# create sample data
import pandas as pd

df = pd.DataFrame({'invoice_id':[110,994528,24],
                   'customer_id':['425','a863','t311'],
                   'citems' :[{'a': 50, 'b': 46},{'a': 21, 'c': 25},{'scissor': 6, 'rock': 6}],
                   'batch':['no518','as22','af10']})
df2 = pd.DataFrame({'invoice_id':[110,994528], 'defect':['a','c']})
## merge data
df = df.merge(df2, on='invoice_id', how='left').fillna(0)
## iterate over rows and create new column
for index, row in df.iterrows():
    if row['defect'] in row['citems']:
        df.loc[index, 'defect_in_items'] = df.loc[index, 'citems'].get(df.loc[index, 'defect'],0)
    else:
        df.loc[index, 'defect_in_items'] = 0
## answer
batch citems customer_id invoice_id defect defect_in_items
0 no518 {'a': 50, 'b': 46} 425 110 a 50.0
1 as22 {'a': 21, 'c': 25} a863 994528 c 25.0
2 af10 {'scissor': 6, 'rock': 6} t311 24 0 0.0
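As a variation on the merge approach above, the row loop can be replaced by a single list comprehension over the merged columns. This is only a sketch under the same sample data; the name merged is introduced here for illustration, and the fillna(0) step is left out because dict.get already supplies the default for unmatched invoices.

```python
import pandas as pd

df = pd.DataFrame({
    "invoice_id": [110, 994528, 24],
    "customer_id": ["425", "a863", "t311"],
    "citems": [{"a": 50, "b": 46}, {"a": 21, "c": 25}, {"scissor": 6, "rock": 6}],
    "batch": ["no518", "as22", "af10"],
})
df2 = pd.DataFrame({"invoice_id": [110, 994528], "defect": ["a", "c"]})

# A left merge keeps every invoice; unmatched invoices get NaN in 'defect'.
merged = df.merge(df2, on="invoice_id", how="left")

# Build the whole column in one assignment instead of writing cell by cell.
merged["defect_in_items"] = [
    items.get(defect, 0)
    for items, defect in zip(merged["citems"], merged["defect"])
]

print(merged[["invoice_id", "defect", "defect_in_items"]])
```

Because the column is assigned in one step from a list of integers, it keeps an integer dtype rather than the floats shown in the output above.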
Merge the two DataFrames, then use apply:
import ast
df2 = df.merge(df1, on=["invoice_id"], how="left")
df2["defects_in_items"] = df2.apply(lambda x: ast.literal_eval(x["items"]).get(x["defect"],0), axis=1)
df2.iloc[:,[0,1,2,3,5]]
Result:
invoice_id customer_id items batch defects_in_items
0 110 425 {'a': 50, 'b': 46} no518 50
1 994528 a863 {'a': 21, 'c': 25} as22 25
2 24 t311 {'scissor': 6, 'rock': 6} af10 0
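The apply call above passes each items cell to ast.literal_eval, which expects a string; if a cell already holds a dict, literal_eval raises an error. A minimal sketch of a guard follows, assuming a hypothetical helper named to_dict (not part of the original answer) and a deliberately mixed sample column with both strings and dicts.

```python
import ast

import pandas as pd


def to_dict(value):
    """Hypothetical helper: parse strings, pass dicts through unchanged."""
    if isinstance(value, str):
        return ast.literal_eval(value)
    return value


# Mixed sample: two cells are strings (as if read from a text file), one is a dict.
df = pd.DataFrame({
    "invoice_id": [110, 994528, 24],
    "items": ["{'a': 50, 'b': 46}", {"a": 21, "c": 25}, "{'scissor': 6, 'rock': 6}"],
})
df1 = pd.DataFrame({"invoice_id": [110, 994528], "defect": ["a", "c"]})

merged = df.merge(df1, on="invoice_id", how="left")
merged["defects_in_items"] = merged.apply(
    lambda row: to_dict(row["items"]).get(row["defect"], 0), axis=1
)

print(merged["defects_in_items"].tolist())  # [50, 25, 0]
```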
Also, I loaded both DataFrames from a txt file, so my "items" column has type str; ast.literal_eval converts each value to a dict. I think you should first check