Stata loops: generating a separate comparison group (age bracket) for each observation in the sample


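I am trying to assign certain properties of a comparison group (for example, its average income) to each individual in my microdata sample. The comparison group is defined by other observables (gender, region) and is formed from the other individuals. So far I have coded: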

     egen com_group = group(gender region)
     bysort com_group: egen com_income = mean(income)

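This works so far, but the approach raises two problems:

1. Because the mean is computed over all individuals in a group, and the current observation is part of its own group, its own income enters the mean income of its own reference group. This may introduce a (small) bias. Compared with problem 2, this one seems minor.

2. I would prefer to assign the mean income of less rigidly defined groups. More specifically, I am thinking of comparison groups of the type (gender, region, age +/- 5 years). Such a running age bracket cannot be handled with the approach above, because observations of different ages each have a different age bracket. That information cannot be stored beforehand in a single variable such as "ref_group". My idea is to loop over all observations and generate an observation-specific reference group. However, I really do not know how to do that.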
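Problem 1 on its own (fixed groups, own income excluded) does not need a loop: the leave-one-out mean is the group total minus the own value, divided by the group count minus one. The following is only a minimal sketch of that arithmetic. The toy data and the variable names grp_total, grp_n, and com_income_loo are made up for illustration and do not come from the original post.

```stata
clear
* toy data, invented for illustration only
input gender region income
1 1 10
1 1 20
1 1 30
2 1 40
2 1 60
2 2 50
end

egen com_group = group(gender region)

* group total and number of nonmissing incomes
bysort com_group: egen grp_total = total(income)
bysort com_group: egen grp_n     = count(income)

* leave-one-out mean: remove own income from numerator and denominator;
* the result is missing for one-person groups (division by zero)
gen com_income_loo = (grp_total - income) / (grp_n - 1)

list, sepby(com_group)
```

This does not address problem 2, because a running age bracket gives each observation its own overlapping group.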

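Does this give you what you want? I have not checked the details. I will add some explanation later. In this example, the range for age is +/-1 and the grouping variable is race.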

clear all
set more off

*----- example data -----

input ///
    idcode   age   race       wage      
        45    35      1   10.18518   
        47    35      1   3.526568     
        48    35      1   5.852843     
         1    37      2   11.73913     
         2    37      2   6.400963     
         9    37      1   10.49114     
        36    37      1   4.180602     
         7    39      1    4.62963     
        15    39      1   16.79548     
        20    39      1   9.661837     
        12    40      1   17.20612     
        13    40      1   13.08374     
        14    40      1   7.745568     
        16    40      1   15.48309     
        18    40      1   5.233495     
        19    40      1   10.16103     
        97    40      2   19.92563    
        22    41      1   9.057972     
        24    41      1   11.09501     
        44    41      1   28.45666   
        98    41      2   4.098635    
         3    42      2   5.016723     
         6    42      1   8.083731     
        23    42      1    8.05153     
        25    42      1   9.581316     
        99    42      2   9.875124    
         4    43      1   9.033813     
        39    44      1   9.790657     
        46    44      1   3.051529     
end

sort age idcode
list, sepby(age)

*----- what you want -----

gen mwage = .
levelsof race, local(lrace)

forvalues i = 1/`=_N' {
    foreach j of local lrace {

        summarize wage if ///
            inrange(age, age[`i']-1, age[`i']+1) /// age condition
            & race == `j'                        /// race condition
            & _n != `i'                          /// self-exclude condition
            , meanonly

        replace mwage = r(mean) if race == `j' in `i'

    }
}

list, sepby(age)
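If more than one grouping variable is needed (the question mentions gender and region), one possible variation is to compare each variable with its value in observation i, which also removes the inner loop over levels. This is only a sketch under assumptions: it uses the nlsw88 dataset shipped with Stata, takes married as a stand-in for a second grouping variable (nlsw88 contains women only), and the name mwage2 is hypothetical.

```stata
sysuse nlsw88, clear
keep in 1/200          // small subset so the loop finishes quickly

gen mwage2 = .

forvalues i = 1/`=_N' {

    summarize wage if ///
        inrange(age, age[`i']-1, age[`i']+1) /// age bracket +/- 1
        & race == race[`i']                  /// same race as observation i
        & married == married[`i']            /// same marital status as observation i
        & _n != `i'                          /// exclude observation i itself
        , meanonly

    * leave mwage2 missing when no other observation qualifies
    if r(N) > 0 {
        quietly replace mwage2 = r(mean) in `i'
    }
}

list age race married wage mwage2 in 1/10
```

The loop still makes one pass over the data per observation, so the run time grows roughly with the square of the sample size.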

Problem #1 is at least an FAQ, and fuller (deeper) discussions of it are available.

This looks right in spirit. The OP also wants gender to be the same, so extra code similar to that for race is needed.

On the extra references: I corrected a small detail that was causing a miscalculation. Indeed, @NickCox, the original poster has to add a bit of code to handle the additional grouping variable. I think that is the research effort that is always expected but was not fully shown in the post.

Thanks for your answer and the helpful comments! Adding code to distinguish further subgroups is easy with a sub-loop. The only drawback is that my algorithm does not seem very efficient: generating the group means takes a long time in a sample with N = 14,000.

It is hard to avoid a loop over observations here. In principle, the answer could be unique for each person, because you want the mean over all other people with the same characteristics or, for age, loosely similar characteristics. The age subsets also overlap, so for that reason alone the answers cannot be determined separately for disjoint groups.

Looping is hard to avoid, but you can rely on Mata for this task. It is much faster than using Stata alone. I added an example.

Edit: If Stata is too slow for your dataset, you can use Mata. Here is my attempt (I have only just started using it):

clear all
set more off

*----- example data -----

sysuse nlsw88
expand 2

*----- what you want -----

egen gro = group(race industry) // grouping variables

* Get number of groups
summarize gro, meanonly
local numgro = r(max)

* Compute upper limits for groups
forvalues i = 1/`numgro' {
     summarize gro if gro == `i', meanonly
     local countgro `countgro' `r(N)'
}

/*
Sort by group and bracket variable. Sorting is done in Stata so that
Mata results can be posted back to Stata using only -getmata-
*/

sort gro age 

* Take statistic and bracket variables to Mata
putmata STVAR=wage BRVAR=age 

mata:

/*
Get upper limits of groups from Stata.
Not considered good style. See Mata Matters: Macros, Gould (2008)
*/

UPLIM = tokens(st_local("countgro")) 
UPLIM = runningsum(strtoreal(UPLIM)) // upper limits of groups

/*
For example, in the following observation ranges, each line 
shows lower and upper limits:

1-11 
12-23 
24-28 
29-29 
*/


ST = J(rows(STVAR), 1, .)
for (i = 1; i <= cols(UPLIM); i++) {

    if (i == 1) {
        ro = 1
    }
    else {
        ro = UPLIM[i-1]+1
    }

    co = UPLIM[i]

    STVARP = STVAR[|ro\co|]     // statistic variable
    BRVARP = BRVAR[|ro\co|]     // bracket variable

    STPART = J(rows(STVARP), 1, 0)
    for (j = 1; j <= rows(BRVARP); j++) {

        SMALLER = BRVARP :>= BRVARP[j] - 1
        LARGER = BRVARP :<= BRVARP[j] + 1

        // division by zero gives . for last group with only one observation
        STPART[j] = ( sum(STVARP :* SMALLER :* LARGER) - STVARP[j] ) / ( sum(SMALLER :* LARGER) - 1 )

    }

    ST[|ro\co|] = STPART // stack results
}

end

getmata mwage=ST

keep wage race industry gro age mwage
sort gro age wage

//list wage gro age mwage, sepby(gro)