Creating dummy rows in SAS from the last entered date up to a specified date
Suppose I have the following SAS dataset:
Account  Month  Balance  LastMonth  MonthDate   LastMonthDate
1        Jan    5        May        2012-01-01  2012-05-01
1        Feb    2        May        2012-02-01  2012-05-01
1        Mar    1        May        2012-03-01  2012-05-01
2        Feb    6        Apr        2012-02-01  2012-04-01
2        Mar    4        Apr        2012-03-01  2012-04-01
I need to create the following:
Account  Month  Balance  LastMonth  MonthDate   LastMonthDate
1        Jan    5        May        2012-01-01  2012-05-01
1        Feb    2        May        2012-02-01  2012-05-01
1        Mar    1        May        2012-03-01  2012-05-01
1        Apr    1        May        2012-04-01  2012-05-01
1        May    1        May        2012-05-01  2012-05-01
2        Feb    6        Apr        2012-02-01  2012-04-01
2        Mar    4        Apr        2012-03-01  2012-04-01
2        Apr    4        Apr        2012-04-01  2012-04-01
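That is, I need to add extra rows for each account so that the account has an entry for every month up to and including the month in its "LastMonth" column. For months that are not in the original dataset, the balance must stay the same as the balance of the account's last entry in the dataset. My dataset is already sorted by "Account" and "Month".
Note that these are just two example accounts; my real dataset has many accounts, each with a different "LastMonth" value. I need to generalize the process so that it creates the missing rows for each account up to that account's "LastMonth" date.
Edit: "MonthDate" and "LastMonthDate" are stored as follows: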
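You need to check whether you are on the last row for the account (this requires the data to be sorted by account). Then convert the string months to numbers, iterate between them, and output the new month names.
Edit: based on the comments, this data step will handle your data. I am keeping the old answer below for reference: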
data want;
    set have;
    by account;
    output;    /* always write the original row */
    if last.account then do;
        /* Current month as a number */
        month_n = month(MonthDate);
        /* LastMonth as a number */
        to_month = month(LastMonthDate);
        do i = month_n + 1 to to_month;
            /* Advance the date so MonthDate matches the new row, then write the month name */
            MonthDate = mdy(i, 1, year(MonthDate));
            month = put(MonthDate, monname3.);
            output;
        end;
    end;
    drop month_n to_month i;
run;
End edit
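One caveat: comparing month numbers only works when the last observed row and LastMonthDate fall in the same calendar year. As a hedged sketch (not part of the original answer), the same fill can be driven by INTCK and INTNX on the date variables, which also advances MonthDate on each generated row. It assumes MonthDate and LastMonthDate are numeric SAS dates and that Month is a character variable. The dataset name want_dates, the helper variables n_missing and i, and the account 3 row that crosses a year boundary are made up for illustration.

```sas
/* Sample data; account 3 is a hypothetical row that crosses a year boundary */
data have;
    input Account Month $ Balance LastMonth $ MonthDate :yymmdd10. LastMonthDate :yymmdd10.;
    format MonthDate LastMonthDate yymmdd10.;
    datalines;
1 Jan 5 May 2012-01-01 2012-05-01
1 Feb 2 May 2012-02-01 2012-05-01
1 Mar 1 May 2012-03-01 2012-05-01
2 Feb 6 Apr 2012-02-01 2012-04-01
2 Mar 4 Apr 2012-03-01 2012-04-01
3 Nov 7 Feb 2012-11-01 2013-02-01
;
run;

data want_dates;
    set have;
    by Account;
    output;    /* keep every original row */
    if last.Account then do;
        /* Month boundaries between the last observed row and the target month */
        n_missing = intck('month', MonthDate, LastMonthDate);
        do i = 1 to n_missing;
            MonthDate = intnx('month', MonthDate, 1);   /* first day of the next month */
            Month = put(MonthDate, monname3.);          /* three-letter month name */
            output;                                     /* Balance carries over unchanged */
        end;
    end;
    drop n_missing i;
run;
```

For the illustrative account 3, INTCK counts three month boundaries between 2012-11-01 and 2013-02-01, so the step should append rows for Dec, Jan and Feb with the balance held at 7. When an account's last row already equals its LastMonthDate, n_missing is 0 and no extra rows are written.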
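Unfortunately, SAS has no simple format or informat for converting months between strings and numbers. So here I build a date from the month name and use the month() function to extract the month number: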
data want;
    set have;
    by account;
    output;
    if last.account then do;
        /* Current month as a number */
        month_n = month(input(catt("01", strip(month), "2000"), date9.));
        /* LastMonth as a number */
        to_month = month(input(catt("01", lastMonth, "2000"), date9.));
        do i = month_n + 1 to to_month;
            /* Increment the month and write the month name */
            month = put(mdy(i, 1, 2000), monname3.);
            output;
        end;
    end;
    drop month_n to_month i;
run;
You can also create your own format and informat for the conversion. This makes the code more concise:
proc format;
    value MName
        1  = "Jan"
        2  = "Feb"
        3  = "Mar"
        4  = "Apr"
        5  = "May"
        6  = "Jun"
        7  = "Jul"
        8  = "Aug"
        9  = "Sep"
        10 = "Oct"
        11 = "Nov"
        12 = "Dec";
    invalue MName
        "Jan" = 1
        "Feb" = 2
        "Mar" = 3
        "Apr" = 4
        "May" = 5
        "Jun" = 6
        "Jul" = 7
        "Aug" = 8
        "Sep" = 9
        "Oct" = 10
        "Nov" = 11
        "Dec" = 12;
run;
data want2;
    set have;
    by account;
    output;
    if last.account then do;
        /* Current month as a number */
        month_n = input(month, MName.);
        /* LastMonth as a number */
        to_month = input(lastMonth, MName.);
        do i = month_n + 1 to to_month;
            month = put(i, MName.);
            output;
        end;
    end;
    drop month_n to_month i;
run;
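Here is an approach that uses a DOW loop. It needs no "pre-pass" over the data to evaluate and enumerate the consecutive months covering each account's date range.
Key concepts:
- Use LAG and INTCK to find month gaps within a group
- Use INTNX to compute the loop variables
- Supporting variables maintain state and are dropped afterwards
This assumes month and lastmonth are proper date variables.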
data have;
    attrib
        account   format=8.
        month     format=yymon7. informat=date9.
        lastmonth format=yymon7. informat=date9.;
    input Account Month Balance LastMonth;
    datalines;
1 01-Jan-18 5 01-May-18
1 01-Feb-18 2 01-May-18
1 01-Mar-18 1 01-May-18
2 01-Feb-18 6 01-Apr-18
2 01-Mar-18 4 01-Apr-18
3 01-Jan-18 15 01-May-18
3 01-Mar-18 11 01-May-18
run;
data want;
    do _n_ = 1 by 1 until (last.account);
        set have;
        by account;
        prior_month = lag(month);
        prior_balance = lag(balance);
        * fill in gaps within the group;
        if _n_ > 1 and intck('month', prior_month, month) > 1 then do;
            curr_month = month;
            curr_balance = balance;
            balance = prior_balance;
            gap_start = intnx('month', prior_month, 1);
            gap_end = intnx('month', curr_month, -1);
            * repeat the prior observed balance for each missing month;
            * the condition is tested at the top so every month through gap_end is written;
            month = gap_start;
            do while (month <= gap_end);
                output;
                put 'NOTE: ' account= 'within-group gap data output ' month= balance=;
                month = intnx('month', month, 1);
            end;
            * restore original state;
            month = curr_month;
            balance = curr_balance;
        end;
        * unconditional output for within-group data;
        output;
    end;
    gap_start = intnx('month', month, 1);
    gap_end = intnx('month', lastmonth, 0); * just for safety;
    * conditional output for post-group months using data from the last row in the group;
    * the condition is tested at the top so nothing is written when the group already reaches lastmonth;
    month = gap_start;
    do while (month <= gap_end);
        output;
        put 'NOTE: ' account= ' post-group gap data output ' month= balance=;
        month = intnx('month', month, 1);
    end;
    drop prior_: curr_: gap_:;
run;
I also have fields in which "Month" and "LastMonth" are stored in an actual date format (yymmddn10.). Would the code be cleaner using those variables? I'll edit them into the question now so you can see.
Sure. Just change month_n = month(input(catt("01",strip(month),"2000"),date9.)) to month_n = month(month_date_variable) if they are SAS dates, or to month_n = month(input(month_date_variable, yymmdd10.)) if they are strings.
I have now edited the question to show the date-formatted variables.
Are month_date and lastMonth_date character or date variables?
@DomPazz - They are in a SAS date format (yymmdd10.).
Are they stored as dates or as strings? A format is just a way of telling SAS how to display the data.
@DomPazz - See the edit for a screenshot of the column properties :)
Updated my answer for you.