State duration in SAS


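I have a question about SAS and analysing how long a variable stays in a specific state. I want to know how long each person in my dataset stays continuously in state a until state b occurs. If state c occurs after state a, the duration should be set to zero. Note that if pre_period is in state a, I would also set the duration to zero, but if another state a occurs afterwards, it should be counted.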

The data looks something like this:

    pre_period    week1 week2 week3 week4 week5 week6 week7 ...
id1 b             b     a     a     a     b     c     c     ...
id2 a             a     a     a     b     a     b     b     ...
id3 b             b     a     a     b     a     a     b     ...
id4 c             c     c     a     a     a     a     a     ...
id5 a             b     a     b     b     a     a     b     ...
id6 b             a     a     a     a     a     a     a     ...

The sample set as SAS code:

data work.sample_data;
input id $ pre_period $  (week1-week7) ($);
datalines;
id1 b b a a a b c c
id2 a a a a b a b b
id3 b b a a b a a b
id4 c c c a a a a a
id5 a b a b b a a b
id6 b a a a a a a a
;

So for id1 it should give me a duration of 3, for id2 1, for id3 3 and 1, for id4 5, for id5 1 and 2, and for id6 7.

So the output should look something like this:

    dur1 dur2 dur3 dur4 ...
id1 3    .    .    .    ...
id2 1    .    .    .    ...
id3 3    1    .    .    ...
id4 5    .    .    .    ...
id5 1    2    .    .    ...
id6 7    .    .    .    ...
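
I am a beginner in SAS and have not found a way to solve this. Note that the dataset contains thousands of rows and about 1000 columns, so one person may have several intervals of state a, and I want to capture all of them (hence several duration variables in the output).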


I would appreciate your advice. Thanks.

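In cases like this it is wise to think in terms of a finite state machine. That way, if your requirements change, it is very easy to extend the state machine later.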

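A duration is valid in three cases (including the unstated case that appears in the result set):

  • The length of a consecutive run of state a should be counted if
    • it ends with state b
    • it is still in state a when the data ends
    • provided that it did not start in week 1 while the pre_period state was a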
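
Before building the state machine, the rules above can be tried out in a single pass over the weeks. This is only a rough sketch under stated assumptions, not the state machine developed in the rest of this answer: it assumes work.sample_data from the question already exists, and n_weeks, locked, run_length, n_dur and run_dur are names made up for this illustration.

```sas
/* Sketch only: n_weeks, locked, run_length, n_dur and run_dur are hypothetical names. */
%let n_weeks = 7;                      /* assumed number of week columns in the input */

data work.duration_sketch;
    set work.sample_data;
    array weeks{*} week1-week&n_weeks;
    array run_dur{*} dur1-dur&n_weeks;

    locked = (pre_period = 'a');       /* 1 while a leading run of a is ignored */
    run_length = 0;
    n_dur = 0;

    do w = 1 to dim(weeks);
        if weeks{w} = 'a' then do;
            if not locked then run_length = run_length + 1;
        end;
        else do;
            /* a run that ends with b is stored, any other ending drops it */
            if weeks{w} = 'b' and run_length > 0 then do;
                n_dur = n_dur + 1;
                run_dur{n_dur} = run_length;
            end;
            run_length = 0;
            locked = 0;                /* the lock ends with the first week that is not a */
        end;
    end;

    /* a run that is still open when the data ends is stored as well */
    if run_length > 0 then do;
        n_dur = n_dur + 1;
        run_dur{n_dur} = run_length;
    end;

    keep id dur1-dur&n_weeks;
run;
```

Putting the number of weeks in one macro variable means only that line has to change for a table with about 1000 week columns. The explicit state machine takes more code, but it is easier to extend when the rules change.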
First, we have to consider the pre_period requirement. We can call this state pre_period_locked_state:

        if current_state = pre_period_locked_state then do;
            if 'a' ne pre_period and 'a' = week_state then do;
                current_state = duration_state;
            end;
            else if 'a' = pre_period and 'a' ne week_state then do;
                current_state = no_duration_state;
            end;
        end;

The next case is when the state is not a, here called no_duration_state:

        if current_state = no_duration_state then do;
            if 'a' = week_state then do;
                current_state = duration_state;
            end;
        end;

This is our idle state, which only changes when a new duration starts. The next state is called duration_state and is defined as:

        if current_state = duration_state then do;
            if 'a' = week_state then do;
                duration_count = duration_count + 1;
            end;
            if ('a' ne week_state or week = last_week) and 0 < duration_count then do;
                current_state = dispatch_state;
            end;
        end;
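
The last state, dispatch_state, takes care of indexing the output table and makes sure that only valid durations are stored. It is part of the complete code further down.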

I added id7 below, because the sample data does not contain any duration that ends in a state other than b.

data work.sample_data;
input id $ pre_period $  (week1-week7) ($);
datalines;
id1 b b a a a b c c
id2 a a a a b a b b
id3 b b a a b a a b
id4 c c c a a a a a
id5 a b a b b a a b
id6 b a a a a a a a
id7 b a a c a a a a
;

Complete SAS code for the state machine:

data work.duration_fsm;
    set work.sample_data;
    array weeks{*} week1-week7;
    array duration{*} dur1-dur7;

    * names of the states;
    initial_reset_state = 'initial_reset_state';
    pre_period_locked_state = 'pre_period_locked_state';
    duration_state = 'duration_state';
    no_duration_state = 'no_duration_state';
    dispatch_state = 'dispatch_state';
    length current_state $ 50;

    * initial values;
    current_state = initial_reset_state;
    last_week = dim(weeks);

    keep id dur1-dur7;

    do week = 1 to last_week;
        * reset the counters once per id;
        if current_state = initial_reset_state then do;
            duration_count = 0;
            duration_index = 1;
            current_state = pre_period_locked_state;
        end;
        week_state = weeks{week};
        * a leading run of a is ignored when pre_period is a;
        if current_state = pre_period_locked_state then do;
            if 'a' ne pre_period and 'a' = week_state then do;
                current_state = duration_state;
            end;
            else if 'a' = pre_period and 'a' ne week_state then do;
                current_state = no_duration_state;
            end;
        end;
        * idle until a new run of a starts;
        if current_state = no_duration_state then do;
            if 'a' = week_state then do;
                current_state = duration_state;
            end;
        end;
        * count the weeks spent in a and detect the end of the run;
        if current_state = duration_state then do;
            if 'a' = week_state then do;
                duration_count = duration_count + 1;
            end;
            if ('a' ne week_state or week = last_week) and 0 < duration_count then do;
                current_state = dispatch_state;
            end;
        end;
        * store the run only if it ended with b or with the end of the data;
        if current_state = dispatch_state then do;
            if 'b' = week_state or ('a' = week_state and week = last_week) then do;
                duration{duration_index} = duration_count;
                duration_index = duration_index + 1;
            end;
            duration_count = 0;
            current_state = no_duration_state;
        end;
    end;
run;
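
If one row per run is easier to work with than a wide table of dur variables, the result can be reshaped afterwards. The following is a minimal sketch, assuming work.duration_fsm was created by the step above. The names duration_long, run_no and weeks_in_a are hypothetical and chosen only for this example.

```sas
/* Sketch only: duration_long, run_no and weeks_in_a are hypothetical names. */
data work.duration_long;
    set work.duration_fsm;
    array duration{*} dur1-dur7;
    do run_no = 1 to dim(duration);
        /* write one output row for every stored duration */
        if not missing(duration{run_no}) then do;
            weeks_in_a = duration{run_no};
            output;
        end;
    end;
    keep id run_no weeks_in_a;
run;

proc print data=work.duration_long noobs;
run;
```

An id without any valid duration produces no rows in this layout, so the wide table is still needed if such ids must stay visible.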

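Your expected result does not seem to be consistent with your requirements. State c occurs for id1 in week 6. Doesn't that mean the duration for id1 should be 0? And for id2, pre_period has state a, so shouldn't id2 also have a duration of 0?

Maybe I was a bit unclear: if pre_period is a, the a's that follow should not be counted, but once the individual leaves that state, it should be treated like everyone else. The c case should only matter if c follows directly after a. Thanks for the hint about my description!

It is still unclear what kind of output you want if there are multiple runs of state a separated by b or c states. Do you want one row per id with multiple columns holding the length of each run, or multiple rows? Try posting what you want the output dataset to look like, in the same format as the input dataset.

I edited the post and hope it is clearer now. Thanks a lot.

You should take some time to write code that builds some sample data (a data step with datalines), as in this question.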