Python: How do I add rows to a timeseries DataFrame?


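I'm writing a program that loads a timeseries Excel file into a DataFrame and then creates several new columns using some basic calculations. My program will sometimes read Excel files in which some records are missing months. In the example below I have monthly sales data for two different stores. The stores opened in different months, so their first month-end dates differ, but both should have month-end data through 9/30/2020. In my file, store BBB has no records for 8/31/2020 and 9/30/2020 because there were no sales in those two months.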

    Store  Month Opened  State  City      Month End Date  Sales
    AAA    5/31/2020     NY     New York  5/31/2020       1000
    AAA    5/31/2020     NY     New York  6/30/2020       5000
    AAA    5/31/2020     NY     New York  7/30/2020       3000
    AAA    5/31/2020     NY     New York  8/31/2020       4000
    AAA    5/31/2020     NY     New York  9/30/2020       2000
    BBB    6/30/2020     CT     Hartford  6/30/2020       100
    BBB    6/30/2020     CT     Hartford  7/30/2020       200

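  • Just try upsampling on a datetime index. Reference:
  • Note that 7/30/2020 is not the end of July; 7/31/2020 is. That will be a problem with this approach, so convert the month-end dates to true month-end dates first.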
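
The point about 7/30/2020 can be handled before any gaps are filled. A minimal sketch, assuming the month-end column has already been parsed as datetimes: adding `pd.offsets.MonthEnd(0)` rolls each date forward to the last day of its month and leaves dates that are already a month end unchanged. The `sample` frame is illustrative only and is not the asker's file.

```python
import pandas as pd

# Illustrative data: 7/30/2020 is not a true month end
sample = pd.DataFrame({
    'Store': ['AAA', 'AAA', 'BBB'],
    'Month End Date': pd.to_datetime(['6/30/2020', '7/30/2020', '7/30/2020']),
})

# MonthEnd(0) rolls forward to the month end; dates already on it stay as they are
sample['Month End Date'] = sample['Month End Date'] + pd.offsets.MonthEnd(0)

print(sample)
```

After this step both 7/30/2020 rows read 7/31/2020, so they line up with any generated range of month-end dates.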

  • Here is a step-by-step approach. Let me know if you have questions.

    import pandas as pd
    pd.set_option('display.max_columns', None)
    c = ['Store','Month Opened','State','City','Month End Date','Sales']
    d = [['AAA','5/31/2020','NY','New York','5/31/2020',1000],
    ['AAA','5/31/2020','NY','New York','6/30/2020',5000],
    ['AAA','5/31/2020','NY','New York','7/30/2020',3000],
    ['AAA','5/31/2020','NY','New York','8/31/2020',4000],
    ['AAA','5/31/2020','NY','New York','9/30/2020',2000],
    ['BBB','6/30/2020','CT','Hartford','6/30/2020',100],
    ['BBB','6/30/2020','CT','Hartford','7/30/2020',200],
    ['CCC','3/31/2020','NJ','Cranbury','3/31/2020',1500]]
    
    df = pd.DataFrame(d,columns = c)
    df['Month Opened'] = pd.to_datetime(df['Month Opened'])
    df['Month End Date'] = pd.to_datetime(df['Month End Date'])
    
    #select last entry for each Store
    df1 = df.sort_values('Month End Date').drop_duplicates('Store', keep='last').copy()
    
    #delete all rows that have 2020-09-30. We want only ones that are less than 2020-09-30
    df1 = df1[df1['Month End Date'] != '2020-09-30']
    
    #set target end date to 2020-09-30
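    df1['Target_End_Date'] = pd.to_datetime('2020-09-30')
    
    #calculate how many rows to repeat (whole months between the two dates)
    #plain year/month arithmetic avoids casting Period values to int
    target, last = df1['Target_End_Date'].dt, df1['Month End Date'].dt
    df1['repeats'] = (target.year - last.year) * 12 + (target.month - last.month)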
    
    #add 1 month to month end so we can start repeating from here
    df1['Month End Date'] = df1['Month End Date'] + pd.DateOffset(months =1)
    
    #set sales value as 0 per requirement
    df1['Sales'] = 0
    
    #repeat each row by the value in column repeats
    df1 = df1.loc[df1.index.repeat(df1.repeats)].reset_index(drop=True)
    
    #reset repeats to start from 0 thru n using groupby cumcount
    #this will be used to calculate months to increment from month end date
    df1['repeats'] = df1.groupby('Store').cumcount()
    
    #update month end date based on value in repeats
    df1['Month End Date'] = df1.apply(lambda x: x['Month End Date'] + pd.DateOffset(months = x['repeats']), axis=1)
    
    #set end date to last day of the month
    df1['Month End Date'] = pd.to_datetime(df1['Month End Date']) + pd.offsets.MonthEnd(0)
    
    #drop columns that we don't need anymore. required before we concat dfs
    df1.drop(columns=['Target_End_Date','repeats'],inplace=True)
    
    #concat df and df1 to get the final dataframe
    df = pd.concat([df, df1], ignore_index=True)
    
    #sort values by Store and Month End Date
    df = df.sort_values(by=['Store','Month End Date'],ignore_index=True)
    
    print (df)
    
    The output of this is:

       Store Month Opened State      City Month End Date  Sales
    0    AAA   2020-05-31    NY  New York     2020-05-31   1000
    1    AAA   2020-05-31    NY  New York     2020-06-30   5000
    2    AAA   2020-05-31    NY  New York     2020-07-30   3000
    3    AAA   2020-05-31    NY  New York     2020-08-31   4000
    4    AAA   2020-05-31    NY  New York     2020-09-30   2000
    5    BBB   2020-06-30    CT  Hartford     2020-06-30    100
    6    BBB   2020-06-30    CT  Hartford     2020-07-30    200
    7    BBB   2020-06-30    CT  Hartford     2020-08-31      0
    8    BBB   2020-06-30    CT  Hartford     2020-09-30      0
    9    CCC   2020-03-31    NJ  Cranbury     2020-03-31   1500
    10   CCC   2020-03-31    NJ  Cranbury     2020-04-30      0
    11   CCC   2020-03-31    NJ  Cranbury     2020-05-31      0
    12   CCC   2020-03-31    NJ  Cranbury     2020-06-30      0
    13   CCC   2020-03-31    NJ  Cranbury     2020-07-31      0
    14   CCC   2020-03-31    NJ  Cranbury     2020-08-31      0
    15   CCC   2020-03-31    NJ  Cranbury     2020-09-30      0
    
    Note: I added one more entry, CCC, to show more variation.
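
For comparison, the upsampling idea from the comments can be sketched by reindexing each store onto a full range of month-end dates. This is only a sketch under stated assumptions: `fill_missing_months` is a hypothetical helper name, each store is assumed to have at most one row per month, and every date is first rolled to its true month end, so 7/30/2020 becomes 7/31/2020 (unlike the output above, which keeps the original 7/30 rows).

```python
import pandas as pd

def fill_missing_months(df, target_end='2020-09-30'):
    """Hypothetical helper: add zero-sales rows per store up to target_end."""
    df = df.copy()
    # Roll every date to its true month end so it matches the generated range
    df['Month End Date'] = pd.to_datetime(df['Month End Date']) + pd.offsets.MonthEnd(0)
    sales_dtype = df['Sales'].dtype
    pieces = []
    for _, g in df.groupby('Store', sort=False):
        # Every month end from the store's first record through target_end
        months = pd.date_range(g['Month End Date'].min(), target_end,
                               freq=pd.offsets.MonthEnd())
        g = g.set_index('Month End Date').reindex(months)
        # New rows get 0 sales; the other columns are carried forward
        g['Sales'] = g['Sales'].fillna(0).astype(sales_dtype)
        g = g.ffill()
        pieces.append(g.rename_axis('Month End Date').reset_index())
    return pd.concat(pieces, ignore_index=True)[df.columns.tolist()]

c = ['Store', 'Month Opened', 'State', 'City', 'Month End Date', 'Sales']
d = [['AAA', '5/31/2020', 'NY', 'New York', '5/31/2020', 1000],
     ['AAA', '5/31/2020', 'NY', 'New York', '6/30/2020', 5000],
     ['AAA', '5/31/2020', 'NY', 'New York', '7/30/2020', 3000],
     ['AAA', '5/31/2020', 'NY', 'New York', '8/31/2020', 4000],
     ['AAA', '5/31/2020', 'NY', 'New York', '9/30/2020', 2000],
     ['BBB', '6/30/2020', 'CT', 'Hartford', '6/30/2020', 100],
     ['BBB', '6/30/2020', 'CT', 'Hartford', '7/30/2020', 200],
     ['CCC', '3/31/2020', 'NJ', 'Cranbury', '3/31/2020', 1500]]
stores = pd.DataFrame(d, columns=c)
stores['Month Opened'] = pd.to_datetime(stores['Month Opened'])

filled = fill_missing_months(stores)
print(filled)
```

Unlike the repeat-and-increment steps, this version would also fill a gap in the middle of a store's history, since `reindex` adds a row for any month missing from the range, not only months after the last record.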
