How can Stata summarize useful information from an existing dataset and merge it into a new dataset?


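I am trying to summarize useful information from a survey dataset. The dataset contains information about the parents of the surveyed individuals. Each ID is associated with four rows, which hold information about the respondent's mother, father, mother-in-law and father-in-law. However, I am only interested in the respondents themselves, not in their parents.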

* Example generated by -dataex-. To install: ssc install dataex
clear
input str12 ID byte(parentID ca001)
"010104101002" 1 2
"010104101002" 2 1
"010104101002" 3 1
"010104101002" 4 1
"010104102002" 1 2
"010104102002" 2 2
"010104102002" 3 2
"010104102002" 4 1
"010104103001" 1 2
"010104103001" 2 2
"010104103001" 3 2
"010104103001" 4 1
"010104104001" 1 2
"010104104001" 2 2
"010104104001" 3 2
"010104104001" 4 1
"010104105002" 1 2
"010104105002" 2 2
"010104105002" 3 2
"010104105002" 4 2
end
label values parentID parent
label def parent 1 "1 Father", modify
label def parent 2 "2 Mother", modify
label def parent 3 "3 Father-in-law", modify
label def parent 4 "4 Mother-in-law", modify
label values ca001 ca001
label def ca001 1 "1 Yes", modify
label def ca001 2 "2 No", modify

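For example, ca001 indicates whether each of the respondent's parents (mother/father/mother-in-law/father-in-law) is still alive. What I need is a variable that gives, for each ID, the number of parents who are still alive (0-4).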

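I also need to get rid of the duplicate IDs so that each ID appears in exactly one observation. This is because I need to merge this dataset with other datasets by matching the unique ID from one dataset to another.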

This may work for you:

bysort ID: egen alive_parents = total(-(ca001-2))
keep ID alive_parents
duplicates drop
list

     +-------------------------+
     |     ID    alive_parents |
     |-------------------------|
  1. | 010104101002          3 |
  2. | 010104102002          1 |
  3. | 010104103001          1 |
  4. | 010104104001          1 |
  5. | 010104105002          0 |
     +-------------------------+

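The idea is to subtract 2 from ca001, so that 0 == No and -1 == Yes, then take the negative so that 0 == No and 1 == Yes, and finally sum by ID to get the total number of living parents.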
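As a small sketch of the recoding described above, the block below compares the arithmetic trick with a logical expression on two of the IDs from the example data. The variable names alive_arith and alive_logic are made up for illustration and are not part of the original answer.

```stata
clear
input str12 ID byte(parentID ca001)
"010104101002" 1 2
"010104101002" 2 1
"010104101002" 3 1
"010104101002" 4 1
"010104105002" 1 2
"010104105002" 2 2
"010104105002" 3 2
"010104105002" 4 2
end

* Arithmetic recoding from the answer: -(ca001 - 2) maps 1 -> 1 and 2 -> 0
generate byte alive_arith = -(ca001 - 2)

* Logical alternative: the expression is 1 when ca001 == 1 and 0 otherwise
generate byte alive_logic = (ca001 == 1)

* The two recodings agree whenever ca001 is 1 or 2
assert alive_arith == alive_logic

* Sum the 0/1 indicator within each ID
bysort ID: egen alive_parents = total(alive_logic)
list ID parentID ca001 alive_logic alive_parents, sepby(ID)
```

The logical form may be easier to read, and it would not need changing if the codes for "No" were ever something other than 2. Either way, egen's total() treats missing values as zero, so a parent with missing ca001 is counted as not alive in both versions.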


Then we drop the extra variables. Each remaining ID-alive_parents pair appears four times, so we drop the duplicates.
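Since the goal in the question is to merge the result with other datasets, here is a rough sketch of one way the whole sequence could look, using collapse as a swapped-in alternative to keep plus duplicates drop. The second dataset and its age variable are invented purely for illustration, and the block is meant to be run as a do-file so that the temporary file survives until the merge.

```stata
clear
input str12 ID byte(parentID ca001)
"010104101002" 1 2
"010104101002" 2 1
"010104101002" 3 1
"010104101002" 4 1
"010104105002" 1 2
"010104105002" 2 2
"010104105002" 3 2
"010104105002" 4 2
end

* 0/1 indicator for a living parent, then one row per ID holding the sum
generate byte alive = (ca001 == 1)
collapse (sum) alive_parents = alive, by(ID)

* Check that ID now uniquely identifies the observations
isid ID

tempfile parents
save `parents'

* Hypothetical respondent-level dataset (age is an invented variable)
clear
input str12 ID byte age
"010104101002" 45
"010104105002" 52
end

* One-to-one match on the unique ID
merge 1:1 ID using `parents'
list
```

The isid check stops with an error if any ID is still duplicated, which is a cheap safeguard before a 1:1 merge.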

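Comments:

Use dataex (installed in Stata with ssc inst dataex) to give readable data examples. Unfortunately, images are not much use.

Oh, thanks... I have revised my question. Is it correct now? It does not look great...

Thanks for improving the question.

Thank you very much!! This is exactly what I wanted!!

Please note carefully: egen, sum() has been undocumented since Stata 9. That name was deprecated in favor of egen, total() because it was being confused with the non-egen function sum(), which produces cumulative or running sums.

I don't seem to have any trouble with egen, sum()... I just ran the command.

It will work with egen, sum(), but Nick's point is that it is best to use egen, total(), so that it is not confused with Stata's separate sum() function, which works differently.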
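To make the distinction in the comments concrete, this minimal sketch puts the sum() function and egen's total() side by side on a three-row toy dataset. The variable names x, running and grand are made up for the example.

```stata
clear
input byte x
1
2
3
end

* sum() used with generate gives a running (cumulative) sum: 1, 3, 6
generate running = sum(x)

* egen's total() gives the overall total in every observation: 6, 6, 6
egen grand = total(x)

list
```

Only the last observation of running matches grand, which is why the two are easy to mix up when the names are the same.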