Creating dummy rows in SAS from the last entered date up to a specified date

Suppose I have the following SAS dataset:

Account  Month  Balance  LastMonth   MonthDate    LastMonthDate
1        Jan    5        May         2012-01-01  2012-05-01
1        Feb    2        May         2012-02-01  2012-05-01
1        Mar    1        May         2012-03-01  2012-05-01
2        Feb    6        Apr         2012-02-01  2012-04-01
2        Mar    4        Apr         2012-03-01  2012-04-01
I need to create the following:

Account  Month  Balance  LastMonth   MonthDate    LastMonthDate
1        Jan    5        May         2012-01-01  2012-05-01
1        Feb    2        May         2012-02-01  2012-05-01
1        Mar    1        May         2012-03-01  2012-05-01
1        Apr    1        May         2012-04-01  2012-05-01
1        May    1        May         2012-05-01  2012-05-01
2        Feb    6        Apr         2012-02-01  2012-04-01
2        Mar    4        Apr         2012-03-01  2012-04-01
2        Apr    4        Apr         2012-04-01  2012-04-01
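That is, I need to add extra rows for each account so that every account has an entry for each month up to and including the month in its "LastMonth" column. For months that are not in the original dataset, the balance must stay the same as the balance of the account's last entry in the dataset. My dataset is already sorted by "Account" and "Month".

Note that these are just two example accounts. My real dataset has many accounts, each with a different "LastMonth" value. I need to generalize the process so that, for each account, the missing rows are created up to its own "LastMonth" date.

Edit: "MonthDate" and "LastMonthDate" are stored as SAS date values displayed with the yymmdd10. format (the original post showed a screenshot of the column properties here).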
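
To try the answers on the sample rows, the table above can be read into a dataset named have (the name the answers use). This is only a sketch: the variable lengths and the assumption that the two date columns are numeric SAS dates come from the edit and the comments, not from the original dataset itself.

```sas
/* Hypothetical reconstruction of the question's sample table.
   Assumes MonthDate and LastMonthDate are numeric SAS dates. */
data have;
  input Account Month $ Balance LastMonth $
        MonthDate :yymmdd10. LastMonthDate :yymmdd10.;
  format MonthDate LastMonthDate yymmdd10.;
  datalines;
1 Jan 5 May 2012-01-01 2012-05-01
1 Feb 2 May 2012-02-01 2012-05-01
1 Mar 1 May 2012-03-01 2012-05-01
2 Feb 6 Apr 2012-02-01 2012-04-01
2 Mar 4 Apr 2012-03-01 2012-04-01
;
run;
```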

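You need to check whether you are on the last row for the account (this requires the data to be sorted by account). Then convert the string months to numbers, iterate between them, and output the new month names.

Edit

Based on the comments, this data step will handle your data. The old answer is kept below for reference: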

data want;
set have;
by account;

output;

if last.account then do;
    /*Current month as a number*/
    month_n = month(MonthDate);
    /*LastMonth as a number*/
    to_month = month(LastMonthDate);

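    do i=month_n+1 to to_month;
        /*Keep the date column in step with the month name (assumes both dates fall in the same year)*/
        MonthDate = mdy(i,1,year(LastMonthDate));
        month = put(MonthDate,monname3.); /*Increment the month and write the month name*/
        output;
    end;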
end;

drop month_n to_month i;
run;
End edit
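
The data step above compares month numbers, so it implicitly assumes that the last entry and the target month fall in the same calendar year. A possible variant, sketched here as an assumption rather than as part of the original answer, counts and steps the months with INTCK and INTNX so that a range crossing a year boundary also works. The dataset names have_demo and want_demo and the helper variables are hypothetical.

```sas
/* Hypothetical single-account example that crosses a year boundary */
data have_demo;
  input Account Month $ Balance MonthDate :yymmdd10. LastMonthDate :yymmdd10.;
  format MonthDate LastMonthDate yymmdd10.;
  datalines;
9 Nov 7 2012-11-01 2013-02-01
;
run;

data want_demo;
  set have_demo;
  by Account;

  output;                                   /* always keep the original row */

  if last.Account then do;
    /* number of months between the last entry and the target month */
    n_missing  = intck('month', MonthDate, LastMonthDate);
    start_date = MonthDate;

    do i = 1 to n_missing;                  /* runs zero times if nothing is missing */
      MonthDate = intnx('month', start_date, i);  /* first day of the i-th following month */
      Month     = put(MonthDate, monname3.);      /* three-letter month name */
      output;                                     /* Balance is carried over unchanged */
    end;
  end;

  drop n_missing start_date i;
run;
```

With this input the step should write four rows (Nov, Dec, Jan, Feb), each carrying the balance of the last observed month.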

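Unfortunately, SAS does not have a simple built-in format or informat for converting months between strings and numbers. So here I build a date from the month name with input() and extract the month number with the month() function: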

data want;
set have;
by account;

output;

if last.account then do;
    /*Current month as a number*/
    month_n = month(input(catt("01",strip(month),"2000"),date9.));
    /*LastMonth as a number*/
    to_month = month(input(catt("01",lastMonth,"2000"),date9.));

    do i=month_n+1 to to_month;
        month = put(mdy(i,1,2000),monname3.); /*Increment the month and write the month name*/
        output;
    end;
end;

drop month_n to_month i;
run;
You can also create your own format and informat for the conversion. This makes the code more concise:

proc format;
value MName 1="Jan"
             2="Feb"
             3="Mar"
             4="Apr"
             5="May"
             6="Jun"
             7="Jul"
             8="Aug"
             9="Sep"
             10="Oct"
             11="Nov"
             12="Dec";

invalue MName "Jan"=1
             "Feb"=2
             "Mar"=3
             "Apr"=4
             "May"=5
             "Jun"=6
             "Jul"=7
             "Aug"=8
             "Sep"=9
             "Oct"=10
             "Nov"=11
             "Dec"=12;
run;

data want2;
set have;
by account;

output;

if last.account then do;
    /*Current month as a number*/
    month_n = input(month,MName.);
    /*LastMonth as a number*/
    to_month = input(lastMonth,MName.);

    do i=month_n+1 to to_month;
        month = put(i,MName.);
        output;
    end;
end;

drop month_n to_month i;
run;
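
As a quick sanity check of the format/informat idea, the round trip can be tried in isolation. The sketch below uses a hypothetical, shortened pair named MNameDemo so that it does not overwrite the MName definitions above.

```sas
/* Hypothetical three-month format/informat pair, for illustration only */
proc format;
  value   MNameDemo 1="Jan" 2="Feb" 3="Mar";
  invalue MNameDemo "Jan"=1 "Feb"=2 "Mar"=3;
run;

data _null_;
  n = input("Feb", MNameDemo.);   /* month name to month number */
  s = put(n + 1, MNameDemo.);     /* next month number back to a name */
  put n= s=;                      /* writes the two values to the log */
run;
```

The informat lookup is an exact string match, so values such as "feb" or "FEB" would need their own entries (or an upcase step) before they convert.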

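Here is one way to do it using the DOW loop approach. A "pre-pass" over the data is needed to assess and enumerate the monotonic months that cover each account's date range.

The key concepts are

  • Use LAG and INTCK to find month gaps within a group
  • Use INTNX to compute the loop variables
  • Maintain state in supporting variables that are subsequently dropped
Sample code

Assume that
month
lastmonth
are proper date variables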
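
As a small illustration of the two interval functions named in the list, the following sketch (with made-up dates and hypothetical variable names) shows how a gap is measured and how its first and last missing months are derived:

```sas
data _null_;
  jan = '01JAN2018'd;
  mar = '01MAR2018'd;

  gap        = intck('month', jan, mar);   /* month boundaries crossed: 2 */
  next_month = intnx('month', jan, 1);     /* first day of the month after jan */
  prev_month = intnx('month', mar, -1);    /* first day of the month before mar */

  put gap= next_month= date9. prev_month= date9.;
run;
```

A gap greater than 1 means at least one month is missing between two consecutive rows; here both derived dates are 01FEB2018, the single missing month.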

data have;
attrib account format=8. month format=yymon7. informat=date9. lastmonth format=yymon7. informat=date9.;
input
Account  Month        Balance  LastMonth; datalines;
1        01-Jan-18    5        01-May-18
1        01-Feb-18    2        01-May-18
1        01-Mar-18    1        01-May-18
2        01-Feb-18    6        01-Apr-18
2        01-Mar-18    4        01-Apr-18
3        01-Jan-18    15       01-May-18
3        01-Mar-18    11       01-May-18
run;

data want;
  do _n_ = 1 by 1 until (last.account);

    set have;
    by account;

    prior_month = lag(month);
    prior_balance = lag(balance);

    * fill-in gaps within group;
    if _n_ > 1 and intck('month', prior_month, month) > 1 then do;
      curr_month = month;
      curr_balance = balance;

      balance = prior_balance;

      gap_start = intnx('month', prior_month, 1);
      gap_end   = intnx('month', curr_month, -1);

      * repeat prior observed months data for missing months;
      * WHILE is tested before each pass so every month through gap_end is written;
      do month = gap_start by 0 while (month <= gap_end);
        OUTPUT;
        put 'NOTE: ' account= 'within-group gap data output ' month= balance=;

        month = intnx('month', month, 1);
      end;

      * restore original state;
      month = curr_month;
      balance = curr_balance;
    end;

    * unconditional output for within group data;
    OUTPUT;
  end;

  gap_start = intnx('month', month, 1);
  gap_end   = intnx('month', lastmonth, 0); * just for safety sake;

  * conditional output for post-group months using data from last row in group ;
  * WHILE is tested before the first pass so nothing is written when the group already ends at lastmonth;
  do month = gap_start by 0 while (month <= gap_end);
    OUTPUT;
    put 'NOTE: ' account= '  post-group gap data output ' month= balance=;    
    month = intnx('month', month, 1);
  end;

  drop prior_: curr_: gap_:;
run;

I also have fields in which "Month" and "LastMonth" are stored in an actual date format (yymmddn10.). Would the code be cleaner using those variables? I'll edit them in now so you can see.
Sure, just change month_n = month(input(catt("01",strip(month),"2000"),date9.)) to month_n = month(<month date variable>) if they are SAS dates, or to month_n = month(input(<month date variable>, yymmdd10.)) if they are strings.
I have now edited the question to show the date-formatted variables.
Are month_date and lastMonth_date character or date variables?
@DomPazz - they are in SAS date format (yymmdd10).
Are they stored as dates or as strings? A format is just a way of telling SAS how to display the data.
@DomPazz - see the edit for a screenshot of the column properties :)
Updated my answer for you.