How to find the closest (second closest, third closest, ...) value in Stata


I am facing the following problem in Stata 15 with ca. 5000 observations. The data look like this:

n  Company_ID  Revenue Industry_Class  Fiscal_Year
1  100         5000    11              2018
2  200         4000    11              2018
3  300         3000    11              2017
4  400         2500    22              2018
5  500         3500    11              2018

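I want to create new variables Peer_1 and Peer_2. The code should work as follows:

For Peer_1: among companies with the same Fiscal_Year and Industry_Class, give me the Company_ID of the company whose Revenue is closest to the Revenue in the respective row.

For Peer_2: among companies with the same Fiscal_Year and Industry_Class, give me the Company_ID of the company whose Revenue is second closest to the Revenue in the respective row.

If the code cannot find a peer in the same Fiscal_Year and Industry_Class, assign "N/A".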


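For example, Company_ID "100" has a Revenue of 5000. The closest company in the same class and the same year is company "200" with a Revenue of 4000. The second closest company in the same class and year is company "500" with a Revenue of 3500.
The output should look like this: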
 n  Company_ID  Revenue Industry_Class  Fiscal_Year  Peer_1  Peer_2
 1  100         5000    11              2018         200     500
 2  200         4000    11              2018         500     100
 3  300         3000    11              2017         N/A     N/A
 4  400         2500    22              2018         N/A     N/A
 5  500         3500    11              2018         200     100

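Is it possible to code this efficiently in Stata? I came across some functions such as psmatch or nnmatch.

I don't know the commands (in Stata they are not functions) psmatch or nnmatch. This is a crude algorithm that loops over firms and, within the same year and the same class, finds the observations (in Stata they are not called rows) with the smallest absolute difference between their revenue and that firm's revenue.

I do not assume that n corresponds to a firm identifier, even though it does in the data example.

Stata does not support N/A as a code for numeric missing. You could hold it as a string value, but Stata attaches no special meaning to it.

If two or more firms have the same smallest absolute difference, no special action is taken, and which firms are identified as peer 1 and peer 2 will be arbitrary.
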
clear 
input n  Company_ID  Revenue Industry_Class  Fiscal_Year  Peer_1  Peer_2
 1  100         5000    11              2018         200     500
 2  200         4000    11              2018         500     100
 3  300         3000    11              2017         .       .
 4  400         2500    22              2018         .       . 
 5  500         3500    11              2018         200     100
 end 

 gen wanted_1 = . 
 gen wanted_2 = . 
 gen diff = . 
 gen ref = . 
 egen id = group(Company_ID)
 su id, meanonly 

 * loop over firms 
 quietly forval i = 1/`r(max)' { 
    * look up its class (should be constant) 
    * summarize leaves r(min) and r(max) in its wake 
    * if result is constant r(min) and r(max) will be identical 
    su Industry_Class if id == `i', meanonly 
    if r(min) != r(max) { 
        di as err "id `i' is in two or more classes" 
    }
    else { 
        local class = r(min)

        * this firm's revenue is the reference revenue, different in each year 
        * reset ref for all other firms so values from earlier iterations cannot leak in 
        replace ref = cond(id == `i', Revenue, .)
        bysort Fiscal_Year (ref): replace ref = ref[1] 

        * care only about other firms, same class and same year 
        replace diff = cond(id != `i' & Industry_Class == `class', abs(Revenue - ref), .) 
        * sort on differences to get peers 
        bysort Fiscal_Year (diff) : replace wanted_1 = Company_ID[1] if id == `i' & diff[1] < . 
        by Fiscal_Year (diff) : replace wanted_2 = Company_ID[2] if id == `i' & diff[2] < .
    }
 } 

 drop id diff ref 

 sort Industry_Class Fiscal_Year 

 list, sepby(Industry_Class Fiscal_Year)

     +--------------------------------------------------------------------------------------+
     | n   Compan~D   Revenue   Indust~s   Fiscal~r   Peer_1   Peer_2   wanted_1   wanted_2 |
     |--------------------------------------------------------------------------------------|
  1. | 3        300      3000         11       2017        .        .          .          . |
     |--------------------------------------------------------------------------------------|
  2. | 2        200      4000         11       2018      500      100        500        100 |
  3. | 1        100      5000         11       2018      200      500        200        500 |
  4. | 5        500      3500         11       2018      200      100        200        100 |
     |--------------------------------------------------------------------------------------|
  5. | 4        400      2500         22       2018        .        .          .          . |
     +--------------------------------------------------------------------------------------+
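
The loop above makes one pass over the data per firm. As a possible alternative, here is a minimal sketch that forms all pairs of firms within the same class and year with joinby and then sorts each firm's candidates by absolute revenue difference. It assumes that Company_ID and Fiscal_Year together identify observations uniquely. The names peer_id, peer_rev, Peer_1_str and Peer_2_str are made up for illustration. Ties are broken by the smaller peer_id, which is a choice made here and not something the question specifies.

```stata
clear
input n Company_ID Revenue Industry_Class Fiscal_Year
1 100 5000 11 2018
2 200 4000 11 2018
3 300 3000 11 2017
4 400 2500 22 2018
5 500 3500 11 2018
end

* assumption: one observation per firm and year
isid Company_ID Fiscal_Year

* save a copy of the candidate peers under different variable names
preserve
keep Company_ID Revenue Industry_Class Fiscal_Year
rename (Company_ID Revenue) (peer_id peer_rev)
tempfile peers
save `peers'
restore

* form all pairs of firms within the same class and year
* (every firm is also paired with itself, so no observation is lost)
joinby Industry_Class Fiscal_Year using `peers'

* absolute difference to each other firm; missing for the self-pair
gen diff = abs(Revenue - peer_rev) if peer_id != Company_ID

* smallest differences sort first, missings last; peer_id breaks ties
bysort Company_ID Fiscal_Year (diff peer_id): gen wanted_1 = peer_id[1] if diff[1] < .
by Company_ID Fiscal_Year: gen wanted_2 = peer_id[2] if diff[2] < .

* go back to one observation per firm and year
by Company_ID Fiscal_Year: keep if _n == 1
drop peer_id peer_rev diff

* optional: string versions holding "N/A" where there is no peer
gen Peer_1_str = cond(missing(wanted_1), "N/A", string(wanted_1))
gen Peer_2_str = cond(missing(wanted_2), "N/A", string(wanted_2))

sort Industry_Class Fiscal_Year Company_ID
list, sepby(Industry_Class Fiscal_Year)
```

The joined dataset holds one observation per pair of firms in each class and year, so its size grows with the square of the group sizes. That should be unproblematic for about 5000 observations spread over many groups, but it could become large if a single class and year holds thousands of firms.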

