Python: loop over two DataFrames by quantity

python pandas dataframe

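I have two DataFrames: one lists buys, the other lists sells. For every purchased lot I need to pull the sell date. Sometimes a purchase was sold across several different sell lots, and I need to split the shares accordingly (or, if that is not possible, skip the split and just pull the sell date). This is what I have: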

import pandas as pd

df1 = pd.DataFrame({'ID': ['AAA', 'AAA', 'AAA', 'BBB', 'CCC'],
                    'Buydate': ['2017-04-13', '2019-12-31', '2019-03-05', '2018-11-04', '2019-12-31'],
                    'Quantity': [100.00, 2000.00, 385.95, 214514.00, 63205.00]})
df2 = pd.DataFrame({'ID': ['AAA', 'AAA', 'BBB'],
                    'Selldate': ['2020-01-25', '2020-10-25', '2020-12-19'],
                    'Quantity': [500.00, 1985.95, 214714.00]})

Printed, they look like this:

df1
    ID     Buydate   Quantity
0  AAA  2017-04-13     100.00
1  AAA  2019-12-31    2000.00
2  AAA  2019-03-05     385.95
3  BBB  2018-11-04  214514.00
4  CCC  2019-12-31   63205.00

df2
    ID    Selldate   Quantity
0  AAA  2020-01-25     500.00
1  AAA  2020-10-25    1985.95
2  BBB  2020-12-19  214714.00

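First I added a cumsum column. Then I planned to loop over each ID group in df1 and look up the matching rows of df2 by ID: if the shares are fewer than the quantity of the first lot in df2, I use the original quantity from df1; once that lot is used up, I need to take the remaining quantity and continue with the second lot in df2. I think I need a concat function.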

The desired result is:

    ID     Buydate   Quantity  SplitQuantity   Selldate
0  AAA  2017-04-13     100.00         100.00 2020-01-25
1  AAA  2019-03-05     385.95         385.95 2020-01-25
2  AAA  2019-12-31    2000.00         14.05  2020-01-25
3  AAA  2019-12-31    2000.00        1985.95 2020-10-25  
4  BBB  2018-11-04  214514.00      214514.00 2020-12-19
5  CCC  2019-12-31   63205.00            NaN        NaT
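The cumsum plan described in the question can also be carried out without nested loops. A minimal sketch, assuming dates stay as ISO strings (so they sort chronologically), lots are consumed oldest-first (FIFO) within each ID, and any part of a lot that is never sold is simply not reported: each buy lot and each sale becomes a cumulative share range per ID, and the shares a sale takes from a lot are the length of the overlap of the two ranges. The helper name `match_lots` and the `tol` cut-off for floating-point noise are hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'ID': ['AAA', 'AAA', 'AAA', 'BBB', 'CCC'],
                    'Buydate': ['2017-04-13', '2019-12-31', '2019-03-05', '2018-11-04', '2019-12-31'],
                    'Quantity': [100.00, 2000.00, 385.95, 214514.00, 63205.00]})
df2 = pd.DataFrame({'ID': ['AAA', 'AAA', 'BBB'],
                    'Selldate': ['2020-01-25', '2020-10-25', '2020-12-19'],
                    'Quantity': [500.00, 1985.95, 214714.00]})


def match_lots(buys, sells, tol=1e-9):
    """Hypothetical helper: FIFO-match sales to buy lots via cumulative share ranges."""
    # Oldest lots and earliest sales first within each ID.
    buys = buys.sort_values(['ID', 'Buydate']).reset_index(drop=True)
    sells = sells.sort_values(['ID', 'Selldate']).reset_index(drop=True)
    buys['Lot'] = buys.index

    # Each lot covers shares [BuyStart, BuyEnd) of its ID's running total.
    buys['BuyEnd'] = buys.groupby('ID')['Quantity'].cumsum()
    buys['BuyStart'] = buys['BuyEnd'] - buys['Quantity']
    # Likewise, each sale covers shares [SellStart, SellEnd).
    sells['SellEnd'] = sells.groupby('ID')['Quantity'].cumsum()
    sells['SellStart'] = sells['SellEnd'] - sells['Quantity']

    # Pair every lot with every sale of the same ID; the left join keeps IDs never sold.
    pairs = buys.merge(sells[['ID', 'Selldate', 'SellStart', 'SellEnd']], on='ID', how='left')

    # Overlap length of the two ranges; np.minimum/np.maximum propagate NaN for unsold IDs.
    pairs['SplitQuantity'] = (np.minimum(pairs['BuyEnd'], pairs['SellEnd'])
                              - np.maximum(pairs['BuyStart'], pairs['SellStart']))
    matched = pairs[pairs['SplitQuantity'] > tol]

    # Lots that no sale reached keep NaN in SplitQuantity and Selldate.
    unsold = buys[~buys['Lot'].isin(matched['Lot'])]
    result = pd.concat([matched, unsold], ignore_index=True)
    result = result.sort_values(['Lot', 'SellStart'])
    return result[['ID', 'Buydate', 'Quantity', 'SplitQuantity', 'Selldate']].reset_index(drop=True)


print(match_lots(df1, df2))
```

Because every lot is paired with every sale of its ID, the merge grows as lots × sales per ID; with a handful of trades per ID that cost is negligible.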

This solution is a bit messy, but what you are asking is fairly complicated, so here is a working prototype:

import pandas as pd

# Sort values by date.
df1 = df1.sort_values(by='Buydate').reset_index(drop=True)

# id_jump will be used for ignoring items you already subtracted from.
id_jump = {}
for id_ in df1['ID']:
    id_jump[id_] = 0

new_index = ['ID', 'Buydate', 'Quantity', 'SplitQuantity', 'Selldate']
new_df = []

# For all items in df2, subtract the quantity from items in df1 with the same ID.
for index, row in df2.iterrows():
    sum_ = row['Quantity']

    for index2, row2 in df1[df1['ID'] == row['ID']].iterrows():
        if index2 < id_jump[row['ID']]:
            # Skip items already included by previous sales.
            continue
        if sum_ > row2['Quantity']:
            # The sale covers this whole lot; carry the rest over to the next lot.
            sub = row2['Quantity']
            sum_ = sum_ - row2['Quantity']
            id_jump[row['ID']] += 1
            new_df.append(
                [row2['ID'], row2['Buydate'], row2['Quantity'], sub, row['Selldate']])
        else:
            # The sale ends within this lot.
            id_jump[row['ID']] += 1
            new_df.append(
                [row2['ID'], row2['Buydate'], row2['Quantity'], sum_, row['Selldate']])
            break

df3 = pd.DataFrame(new_df, columns=new_index)

# Add the missing 'CCC' row(s): IDs that were never sold.
unsold = df1[df1['ID'].map(id_jump) == 0]
df4 = pd.concat([df3, unsold], ignore_index=True)
print(df4)

#     ID     Buydate   Quantity  SplitQuantity    Selldate
# 0  AAA  2017-04-13     100.00         100.00  2020-01-25
# 1  AAA  2019-03-05     385.95         385.95  2020-01-25
# 2  AAA  2019-12-31    2000.00          14.05  2020-01-25
# 3  AAA  2019-12-31    2000.00        1985.95  2020-10-25
# 4  BBB  2018-11-04  214514.00      214514.00  2020-12-19
# 5  CCC  2019-12-31   63205.00            NaN         NaN
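The `id_jump` counter produces the expected table on this data, but it compares a per-ID count with the row position in the whole sorted df1, and the `sum_ > row2['Quantity']` test uses a lot's original quantity rather than what is left of it after an earlier partial sale. When the lots of one ID occupy consecutive rows (for instance 0, 1 and 2), the second sale skips the partly sold third lot entirely. The following is a minimal sketch of the same nested loop that instead tracks the unsold remainder of every lot. It assumes FIFO matching within each ID, ISO date strings, and that any quantity sold beyond an ID's holdings (the extra 200 BBB shares) is dropped; `fifo_split`, `remaining`, and `tol` are hypothetical names.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': ['AAA', 'AAA', 'AAA', 'BBB', 'CCC'],
                    'Buydate': ['2017-04-13', '2019-12-31', '2019-03-05', '2018-11-04', '2019-12-31'],
                    'Quantity': [100.00, 2000.00, 385.95, 214514.00, 63205.00]})
df2 = pd.DataFrame({'ID': ['AAA', 'AAA', 'BBB'],
                    'Selldate': ['2020-01-25', '2020-10-25', '2020-12-19'],
                    'Quantity': [500.00, 1985.95, 214714.00]})


def fifo_split(buys, sells, tol=1e-9):
    """Hypothetical helper: allocate each sale to the oldest unsold lots of its ID."""
    buys = buys.sort_values('Buydate', kind='stable').reset_index(drop=True)
    sells = sells.sort_values('Selldate', kind='stable')
    # Quantity still unsold in each lot, keyed by the lot's row label in `buys`.
    remaining = buys['Quantity'].to_dict()
    touched = set()
    rows = []
    for _, sale in sells.iterrows():
        to_sell = sale['Quantity']
        for lot, buy in buys[buys['ID'] == sale['ID']].iterrows():
            if to_sell <= tol:
                break  # this sale is fully allocated
            if remaining[lot] <= tol:
                continue  # lot already used up by earlier sales
            taken = min(to_sell, remaining[lot])
            remaining[lot] -= taken
            to_sell -= taken
            touched.add(lot)
            rows.append([buy['ID'], buy['Buydate'], buy['Quantity'], taken, sale['Selldate']])
        # Any to_sell left here exceeds the ID's holdings and is ignored.
    out = pd.DataFrame(rows, columns=['ID', 'Buydate', 'Quantity', 'SplitQuantity', 'Selldate'])
    # Lots that no sale reached are appended with NaN SplitQuantity and Selldate.
    untouched = buys[~buys.index.isin(touched)]
    return pd.concat([out, untouched], ignore_index=True)


print(fifo_split(df1, df2))
```

Rows come out in sale order, followed by the untouched lots; with sales of different IDs interleaved in time, a final sort by ID may be wanted.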

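Shouldn't the second row of the output have a Buydate of 2019-03-05 (with the calculations after it adjusted accordingly)? The open quantity of 400 would be matched against the 385.95 lot before being matched with the later 2019-12-31 purchase.

You are right. I updated it; thanks for catching that.

It works! I figured it needed a nested loop but couldn't write it. You're a genius. Thank you very much.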