Python: How do I look up values based on ID information in two dataframes?

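The first dataframe contains order information. A Lead Order can have multiple OrderIDs. Another dataframe has a list of OrderIDs, and I want to use dataframe1 as a reference to look up the LeadOrderID. How can I find the LeadOrderID using Python (Pandas)? Thanks for your help. Much appreciated.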

You should use merge with on=['OrderID'] and how='inner':

In [207]: df1 = pd.DataFrame({'OrderID':[i for i in range(10)], 'Lead Order':[1,3,5,8,6,7,7,5,2,1]}, index=[0,1,2,3,4,5,6,7,8,9])

In [208]: df1
Out[208]: 
   OrderID  Lead Order
0        0           1
1        1           3
2        2           5
3        3           8
4        4           6
5        5           7
6        6           7
7        7           5
8        8           2
9        9           1

In [209]: df2 = pd.DataFrame({'OrderID':[3,8,6,2]}, index=[0,1,2,3])

In [210]: df2
Out[210]: 
   OrderID
0        3
1        8
2        6
3        2

In [211]: df3 = pd.merge(df1, df2, on=['OrderID'], how='inner')

In [212]: df3
Out[212]: 
   OrderID  Lead Order
0        2           5
1        3           8
2        6           7
3        8           2
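
Note that the inner merge above returns rows in df1's order (2, 3, 6, 8) rather than df2's order, and it silently drops any OrderID in df2 that has no match. As a rough sketch of one alternative, a left merge starting from df2 keeps df2's order and leaves NaN for unmatched IDs; Series.map performs the same lookup and returns a single column. The ID 42 below is a made-up value, added only to show the unmatched case.

```python
import pandas as pd

df1 = pd.DataFrame({'OrderID': list(range(10)),
                    'Lead Order': [1, 3, 5, 8, 6, 7, 7, 5, 2, 1]})
# 42 is a hypothetical OrderID with no entry in df1
df2 = pd.DataFrame({'OrderID': [3, 8, 6, 2, 42]})

# Left merge: every row of df2 is kept, in df2's order;
# unmatched IDs get NaN in 'Lead Order'
looked_up = df2.merge(df1, on='OrderID', how='left')

# Equivalent lookup via a Series indexed by OrderID
lookup = df1.set_index('OrderID')['Lead Order']
mapped = df2['OrderID'].map(lookup)

print(looked_up)
```

Because of the NaN, the 'Lead Order' column becomes float in this case; drop the unmatched rows first if integer values are needed.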

This answer includes handling multiple values in a single row of the OrderID column.

The full code without comments is at the end.

# imports
import pandas as pd
import numpy as np

# create sample dataframe
df_orig = \
    pd.DataFrame({'OrderID(s)':['0001, 0007, 0002', '0008', '0009, 0005, 0003',],
                  'Lead Order': ['00011', '00022', '00033']})
df_orig

          OrderID(s)    Lead Order
0   0001, 0007, 0002    00011
1               0008    00022
2   0009, 0005, 0003    00033
df_stack_ids

Each row repeated once per ID it contains:

          OrderID(s)    Lead Order
0   0001, 0007, 0002    00011
1   0001, 0007, 0002    00011
2   0001, 0007, 0002    00011
3               0008    00022
4   0009, 0005, 0003    00033
5   0009, 0005, 0003    00033
6   0009, 0005, 0003    00033

With the individual IDs added as a new OrderID column:

          OrderID(s)    Lead Order  OrderID
0   0001, 0007, 0002         00011     0001
1   0001, 0007, 0002         00011     0007
2   0001, 0007, 0002         00011     0002
3               0008         00022     0008
4   0009, 0005, 0003         00033     0009
5   0009, 0005, 0003         00033     0005
6   0009, 0005, 0003         00033     0003

Keeping only the OrderID and Lead Order columns:

    OrderID Lead Order
0      0001      00011
1      0007      00011
2      0002      00011
3      0008      00022
4      0009      00033
5      0005      00033
6      0003      00033
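
For comparison, here is a minimal sketch of the same one-row-per-ID reshaping using DataFrame.explode instead of np.repeat and np.concatenate. This is a swapped-in technique, not the method used in this answer, and it assumes pandas 0.25 or newer.

```python
import pandas as pd

df_orig = pd.DataFrame({
    'OrderID(s)': ['0001, 0007, 0002', '0008', '0009, 0005, 0003'],
    'Lead Order': ['00011', '00022', '00033'],
})

df_stack_ids = (
    df_orig
    # turn each comma-separated string into a list of IDs
    .assign(OrderID=df_orig['OrderID(s)'].str.split(','))
    # one row per list element; the other columns are repeated
    .explode('OrderID')
    .reset_index(drop=True)
    # remove the spaces that followed the commas
    .assign(OrderID=lambda d: d['OrderID'].str.strip())
    .loc[:, ['OrderID', 'Lead Order']]
)

print(df_stack_ids)
```

The result has the same rows as the last table above, before any sorting.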
df_find_lead

    OrderID
0      0001
1      0002
2      0005
df_found_lead

    OrderID Lead Order
0      0001      00011
1      0002      00011
2      0005      00033
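
If the lookup should also confirm that each OrderID belongs to exactly one Lead Order and report IDs that were not found, the validate and indicator parameters of pd.merge may help. The sketch below is an optional addition, not part of the original answer; the ID '9999' is a hypothetical value used only to demonstrate a miss.

```python
import pandas as pd

df_stack_ids = pd.DataFrame({
    'OrderID': ['0001', '0002', '0003', '0005', '0007', '0008', '0009'],
    'Lead Order': ['00011', '00011', '00033', '00033', '00011', '00022', '00033'],
})
# '9999' is a made-up ID that does not exist in df_stack_ids
df_find_lead = pd.DataFrame({'OrderID': ['0001', '0002', '0005', '9999']})

checked = pd.merge(
    df_find_lead, df_stack_ids,
    on='OrderID', how='left',
    validate='many_to_one',  # raises MergeError if an OrderID repeats in df_stack_ids
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

# IDs that had no matching Lead Order
missing = checked.loc[checked['_merge'] == 'left_only', 'OrderID'].tolist()
print(checked)
print(missing)
```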
Full code:

import pandas as pd
import numpy as np

df_orig = \
    pd.DataFrame({'OrderID(s)':['0001, 0007, 0002', '0008', '0009, 0005, 0003',],
                  'Lead Order': ['00011', '00022', '00033']})

df_orig = df_orig.astype(str)
split_col = df_orig['OrderID(s)'].str.replace(' ', '').str.split(",")

repeats = split_col.str.len().values
orderid_col = np.concatenate(split_col.values)

df_stack_ids = df_orig.iloc[np.repeat(df_orig.index.values, repeats)]. \
    reset_index(drop=True)

df_stack_ids['OrderID'] = orderid_col
df_stack_ids = df_stack_ids[['OrderID', 'Lead Order']]
df_stack_ids = df_stack_ids.sort_values(by=['OrderID'])
df_stack_ids.index = range(len(df_stack_ids))

df_find_lead = pd.DataFrame({'OrderID': ['0001', '0002', '0005']})
df_find_lead = df_find_lead.astype(str)

df_found_lead = pd.merge(df_find_lead, df_stack_ids, on=['OrderID'], how='inner')
df_found_lead = df_found_lead.astype(int)

@jpp - This question is not only about merging; it also involves multiple values per row, which have to be handled somehow.
# if all original order data is formatted as numbers,
# convert result dataframe back to integers
df_found_lead.astype(int)

    OrderID Lead Order
0         1         11
1         2         11
2         5         33
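
One caveat with the integer conversion: leading zeros are lost ('0001' becomes 1). If the zero-padded form is needed again later, str.zfill can restore it. The sketch below assumes fixed widths of 4 and 5 characters, taken from the sample data; real data may need different widths.

```python
import pandas as pd

df_found_lead = pd.DataFrame({'OrderID': ['0001', '0002', '0005'],
                              'Lead Order': ['00011', '00011', '00033']})

# Integer conversion drops the leading zeros
as_int = df_found_lead.astype(int)

# Restore zero padding; the widths 4 and 5 are assumptions based on the sample
restored = pd.DataFrame({
    'OrderID': as_int['OrderID'].astype(str).str.zfill(4),
    'Lead Order': as_int['Lead Order'].astype(str).str.zfill(5),
})

print(as_int)
print(restored)
```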