Python: generating a column based on two DataFrames
I have a DataFrame df1 that contains:
F_Id I_Code F_Date
FT-56832 2 01/09/2019
FT-93828 1 01/09/2019
FT-13853 2 02/09/2019
FT-18858 3 02/09/2019
FT-19010 2 03/09/2019
FT-62064 5 02/09/2019
FT-94494 4 03/09/2019
FT-73594 2 03/09/2019
FT-78590 3 01/09/2019
FT-14296 4 01/09/2019
FT-82529 3 03/09/2019
FT-33266 3 04/09/2019
FT-58456 4 02/09/2019
FT-16693 4 04/09/2019
FT-69073 4 02/09/2019
FT-69649 1 05/09/2019
Each (I_Code, F_Date) pair has 5 distinct IDs associated with it.
I have another DataFrame, df2, with the following columns:
F_Date num_i_found
01/09/2019 5
01/09/2019 3
02/09/2019 5
02/09/2019 5
03/09/2019 3
02/09/2019 4
03/09/2019 4
03/09/2019 5
01/09/2019 5
01/09/2019 4
03/09/2019 3
04/09/2019 5
02/09/2019 4
04/09/2019 5
02/09/2019 4
05/09/2019 4
I want to generate a new column, ID_found, in df2 that holds an array of IDs.
For example, for 01/09/2019 with num_i_found equal to 4, ID_found would be 4 of the 5 IDs in df1 (FT-56832, FT-93828, FT-78590, …).
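Is there a way to achieve this?
Create a dictionary of lists keyed by F_Date, then slice each list by the num_i_found value.
Note: the first row may not match your expected output, because the sample data contains only 4 IDs for 01/09/2019. I assume that in the real data every date in d has 5 values, so this should work as you need: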
# Map each F_Date to the list of its F_Id values, in df1 row order
d = df1.groupby('F_Date')['F_Id'].apply(list).to_dict()
print(d)
{'01/09/2019': ['FT-56832', 'FT-93828', 'FT-78590', 'FT-14296'],
'02/09/2019': ['FT-13853', 'FT-18858', 'FT-62064', 'FT-58456', 'FT-69073'],
'03/09/2019': ['FT-19010', 'FT-94494', 'FT-73594', 'FT-82529'],
'04/09/2019': ['FT-33266', 'FT-16693'],
'05/09/2019': ['FT-69649']}
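Since the question asks for an array of IDs rather than a single string, the same dictionary can be sliced without joining. The following is only a minimal sketch under a few assumptions: it uses a small subset of the sample rows, the column name ID_found is taken from the question, and "4 of the 5 IDs" is read as "the first 4 in df1 row order".

```python
import pandas as pd

# Small subset of the sample data from the question
df1 = pd.DataFrame({
    'F_Id': ['FT-56832', 'FT-93828', 'FT-13853', 'FT-18858', 'FT-78590'],
    'I_Code': [2, 1, 2, 3, 3],
    'F_Date': ['01/09/2019', '01/09/2019', '02/09/2019', '02/09/2019', '01/09/2019'],
})
df2 = pd.DataFrame({
    'F_Date': ['01/09/2019', '02/09/2019', '05/09/2019'],
    'num_i_found': [2, 5, 4],
})

# Date -> list of IDs, in df1 row order
d = df1.groupby('F_Date')['F_Id'].apply(list).to_dict()

# Keep one Python list per row instead of joining into a string.
# Dates missing from df1 fall back to an empty list, and a slice
# longer than the list simply returns every available ID.
df2['ID_found'] = pd.Series(
    [d.get(date, [])[:n] for date, n in zip(df2['F_Date'], df2['num_i_found'])],
    index=df2.index,
)
print(df2)
```

Each cell of ID_found is then an ordinary list, so it can be passed to np.array or iterated directly if an array is really needed.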
If you need strings instead:
d = df1.groupby('F_Date')['F_Id'].apply(list).to_dict()
# For each df2 row, take the first num_i_found IDs for that date and join them
df2['new'] = df2.apply(lambda x: ', '.join(d.get(x['F_Date'], [])[:x['num_i_found']]), axis=1)
print(df2)
F_Date num_i_found new
0 01/09/2019 5 FT-56832, FT-93828, FT-78590, FT-14296
1 01/09/2019 3 FT-56832, FT-93828, FT-78590
2 02/09/2019 5 FT-13853, FT-18858, FT-62064, FT-58456, FT-69073
3 02/09/2019 5 FT-13853, FT-18858, FT-62064, FT-58456, FT-69073
4 03/09/2019 3 FT-19010, FT-94494, FT-73594
5 02/09/2019 4 FT-13853, FT-18858, FT-62064, FT-58456
6 03/09/2019 4 FT-19010, FT-94494, FT-73594, FT-82529
7 03/09/2019 5 FT-19010, FT-94494, FT-73594, FT-82529
8 01/09/2019 5 FT-56832, FT-93828, FT-78590, FT-14296
9 01/09/2019 4 FT-56832, FT-93828, FT-78590, FT-14296
10 03/09/2019 3 FT-19010, FT-94494, FT-73594
11 04/09/2019 5 FT-33266, FT-16693
12 02/09/2019 4 FT-13853, FT-18858, FT-62064, FT-58456
13 04/09/2019 5 FT-33266, FT-16693
14 02/09/2019 4 FT-13853, FT-18858, FT-62064, FT-58456
15 05/09/2019 4 FT-69649
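Several rows in this output hold fewer IDs than num_i_found requests (for example, 04/09/2019 asks for 5 but only 2 exist in the sample). One way to spot such rows is to compare the requested count with the number of IDs available per date. This is a sketch only: the column names available and short are hypothetical, and the frames are a small subset of the sample data.

```python
import pandas as pd

df1 = pd.DataFrame({
    'F_Id': ['FT-33266', 'FT-16693', 'FT-69649'],
    'F_Date': ['04/09/2019', '04/09/2019', '05/09/2019'],
})
df2 = pd.DataFrame({
    'F_Date': ['04/09/2019', '04/09/2019', '05/09/2019'],
    'num_i_found': [5, 2, 1],
})

# Number of IDs actually present in df1 for each date
available = df1.groupby('F_Date')['F_Id'].size()

# Align the counts to df2 rows; dates absent from df1 count as 0
# ('available' is a hypothetical helper column)
df2['available'] = df2['F_Date'].map(available).fillna(0).astype(int)

# True where slicing would return fewer IDs than num_i_found asks for
# ('short' is a hypothetical helper column)
df2['short'] = df2['num_i_found'] > df2['available']
print(df2)
```

Rows flagged in short are the ones where the sample df1 is missing IDs, which is the mismatch described in the note above the code.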
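I want to get the new column in df2, i.e. F_Date, num_i_found, new. If num_i_found is 5, then select all the IDs from df1, append them to an array, and add that to the new column.
@user3759616 - oops, I assigned it to df1 instead of df2, which was incorrect.
As with the first entry (01/09/2019): in the data as shared, df1 has 4 entries, but there are actually 5, so this would select 5 and then assign them to the column 'new'.
@user3759616 - I just added a note about that ;)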