Python: How to look up values based on ID information in two DataFrames?
The first DataFrame contains order information. A lead order can have multiple OrderIDs. Another DataFrame has a list of OrderIDs, and I want to use dataframe1 as the reference for looking up the LeadOrderID. How can I find the LeadOrderID using Python (pandas)? Thank you for your help. It is much appreciated.

You should use pd.merge with on=['OrderID'] and how='inner':
In [207]: df1 = pd.DataFrame({'OrderID':[i for i in range(10)], 'Lead Order':[1,3,5,8,6,7,7,5,2,1]}, index=[0,1,2,3,4,5,6,7,8,9])

In [208]: df1
Out[208]:
   OrderID  Lead Order
0        0           1
1        1           3
2        2           5
3        3           8
4        4           6
5        5           7
6        6           7
7        7           5
8        8           2
9        9           1

In [209]: df2 = pd.DataFrame({'OrderID':[3,8,6,2]}, index=[0,1,2,3])

In [210]: df2
Out[210]:
   OrderID
0        3
1        8
2        6
3        2

In [211]: df3 = pd.merge(df1, df2, on=['OrderID'], how='inner')

In [212]: df3
Out[212]:
   OrderID  Lead Order
0        2           5
1        3           8
2        6           7
3        8           2
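
An inner merge silently drops any OrderID in df2 that has no match in df1. As a minimal sketch (the ID 42 and the names `df_checked` and `missing` are made up for illustration), a left merge with `indicator=True` keeps every requested ID and marks the unmatched ones:

```python
import pandas as pd

df1 = pd.DataFrame({'OrderID': list(range(10)),
                    'Lead Order': [1, 3, 5, 8, 6, 7, 7, 5, 2, 1]})
# Hypothetical lookup list: 42 does not exist in df1.
df2 = pd.DataFrame({'OrderID': [3, 8, 42]})

# how='left' keeps all rows of df2; the _merge column reports
# 'both' for matched IDs and 'left_only' for unmatched ones.
df_checked = pd.merge(df2, df1, on='OrderID', how='left', indicator=True)

# Collect the requested IDs that have no lead order in df1.
missing = df_checked.loc[df_checked['_merge'] == 'left_only', 'OrderID'].tolist()
print(df_checked)
print(missing)
```

Note that Lead Order comes back as a float column here, because the unmatched row holds NaN.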
This answer also handles multiple values in a single row of the OrderID column. The full code without comments is at the end of the answer.
# imports
import pandas as pd
import numpy as np

# create sample dataframe
df_orig = pd.DataFrame({'OrderID(s)': ['0001, 0007, 0002', '0008', '0009, 0005, 0003'],
                        'Lead Order': ['00011', '00022', '00033']})

df_orig
         OrderID(s)  Lead Order
0  0001, 0007, 0002       00011
1              0008       00022
2  0009, 0005, 0003       00033

df_stack_ids, after repeating each row once per ID it contains:
         OrderID(s)  Lead Order
0  0001, 0007, 0002       00011
1  0001, 0007, 0002       00011
2  0001, 0007, 0002       00011
3              0008       00022
4  0009, 0005, 0003       00033
5  0009, 0005, 0003       00033
6  0009, 0005, 0003       00033

After adding the individual IDs as a new OrderID column:
         OrderID(s)  Lead Order  OrderID
0  0001, 0007, 0002       00011     0001
1  0001, 0007, 0002       00011     0007
2  0001, 0007, 0002       00011     0002
3              0008       00022     0008
4  0009, 0005, 0003       00033     0009
5  0009, 0005, 0003       00033     0005
6  0009, 0005, 0003       00033     0003

After keeping only the OrderID and Lead Order columns:
   OrderID  Lead Order
0     0001       00011
1     0007       00011
2     0002       00011
3     0008       00022
4     0009       00033
5     0005       00033
6     0003       00033
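
The tables above show each row being repeated once per ID it contains. On pandas 0.25 or later, `Series.str.split` followed by `DataFrame.explode` should give the same long table in fewer steps. This is a swapped-in alternative to the `np.repeat` approach used in this answer, not the answer's own method, and `df_long` is an illustrative name:

```python
import pandas as pd

df_orig = pd.DataFrame({'OrderID(s)': ['0001, 0007, 0002', '0008', '0009, 0005, 0003'],
                        'Lead Order': ['00011', '00022', '00033']})

df_long = (
    df_orig
    .assign(OrderID=df_orig['OrderID(s)'].str.split(','))   # list of IDs per row
    .explode('OrderID')                                     # one row per ID
    .assign(OrderID=lambda d: d['OrderID'].str.strip())     # drop stray spaces
    .loc[:, ['OrderID', 'Lead Order']]                      # keep the lookup columns
    .reset_index(drop=True)
)
print(df_long)
```

The result can then be merged with the list of IDs to look up, in the same way as `df_stack_ids`.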

df_find_lead
   OrderID
0     0001
1     0002
2     0005

df_found_lead
   OrderID  Lead Order
0     0001       00011
1     0002       00011
2     0005       00033

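The merge assumes that each OrderID belongs to exactly one lead order. If the source data listed the same ID under two lead orders, the merge would return two rows for it. As a hedged sketch, the `validate` argument of `pd.merge` can guard against that by raising `pandas.errors.MergeError` (the duplicated data below is invented for the demonstration, and `duplicate_detected` is an illustrative name):

```python
import pandas as pd

# Made-up data: '0002' is listed under two different lead orders.
df_stack_ids = pd.DataFrame({'OrderID': ['0001', '0002', '0002'],
                             'Lead Order': ['00011', '00011', '00033']})
df_find_lead = pd.DataFrame({'OrderID': ['0001', '0002']})

try:
    # 'many_to_one' requires the merge keys of the right frame to be unique.
    pd.merge(df_find_lead, df_stack_ids, on='OrderID', how='inner',
             validate='many_to_one')
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True

print(duplicate_detected)
```
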
Full code:

import pandas as pd
import numpy as np

df_orig = pd.DataFrame({'OrderID(s)': ['0001, 0007, 0002', '0008', '0009, 0005, 0003'],
                        'Lead Order': ['00011', '00022', '00033']})
df_orig = df_orig.astype(str)

split_col = df_orig['OrderID(s)'].str.replace(' ', '').str.split(",")
repeats = split_col.str.len().values
orderid_col = np.concatenate(split_col.values)

df_stack_ids = df_orig.iloc[np.repeat(df_orig.index.values, repeats)] \
    .reset_index(drop=True)
df_stack_ids['OrderID'] = orderid_col
df_stack_ids = df_stack_ids[['OrderID', 'Lead Order']]
df_stack_ids = df_stack_ids.sort_values(by=['OrderID'])
df_stack_ids.index = range(len(df_stack_ids))

df_find_lead = pd.DataFrame({'OrderID': ['0001', '0002', '0005']})
df_find_lead = df_find_lead.astype(str)

df_found_lead = pd.merge(df_find_lead, df_stack_ids, on=['OrderID'], how='inner')
df_found_lead.astype(int)

@jpp: This question is not just about merging. It also involves multiple values per row, which have to be handled somehow.

df_found_lead
   OrderID  Lead Order
0     0001       00011
1     0002       00011
2     0005       00033

# if all original order data is formatted as numbers,
# convert result dataframe back to integers
df_found_lead.astype(int)
   OrderID  Lead Order
0        1          11
1        2          11
2        5          33
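
Converting to integers drops the leading zeros, as the output above shows. If the zero-padded form is needed again later, `Series.str.zfill` can rebuild it. A small sketch, assuming the IDs were originally 4 characters wide and the lead orders 5 (widths read off the sample data; `df_int` and `df_padded` are illustrative names):

```python
import pandas as pd

df_found_lead = pd.DataFrame({'OrderID': ['0001', '0002', '0005'],
                              'Lead Order': ['00011', '00011', '00033']})

# Integer conversion: '0001' becomes 1, '00011' becomes 11.
df_int = df_found_lead.astype(int)

# Restore fixed-width strings. The widths 4 and 5 are assumptions
# taken from the sample data, not something pandas can infer.
df_padded = pd.DataFrame({
    'OrderID': df_int['OrderID'].astype(str).str.zfill(4),
    'Lead Order': df_int['Lead Order'].astype(str).str.zfill(5),
})
print(df_padded)
```

If the IDs are only ever used as lookup keys, keeping them as strings throughout avoids the round trip.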