Carrying along the string labels of a string variable after reshape in Stata


I have a dataset in Stata like this:

entityID    indicator    indicatordescr    indicatorvalue
1           gdp          Gross Domestic    100
1           pop          Population        15
1           area         Area              50
2           gdp          Gross Domestic    200
2           pop          Population        10
2           area         Area              300

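where there is a one-to-one mapping between the values of indicator and the values of indicatordescr.
I want to reshape it to wide, i.e.: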
entityID    gdp     pop     area
1           100     15      50
2           200     10      300

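where I would like gdp to have the variable label "Gross Domestic", pop the label "Population", and area the label "Area".
Unfortunately, as far as I know, the values of indicatordescr cannot be assigned as value labels of indicator, so reshape cannot turn such value labels into variable labels.
I have already looked at a couple of related questions, but I do not understand how to apply them to my case.
Note: the labelling of the variables after the reshape must be done programmatically, because indicator and indicatordescr have many values.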
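"String labels" is informal here; Stata does not support value labels for string variables. What is wanted, however, is for the distinct values of a string variable to become variable labels on reshaping.
Various work-arounds exist. Here is one: put the information into the variable names and then take it out again.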
clear 
input entityID  str4 indicator   str14 indicatordescr    indicatorvalue
1           gdp          "Gross Domestic"    100
1           pop          "Population"        15
1           area         "Area"              50
2           gdp          "Gross Domestic"    200
2           pop          "Population"        10
2           area         "Area"              300
end 

gen what = indicator + "_"  + subinstr(indicatordescr, " ", "_", .)  
keep entityID what indicatorvalue 
reshape wide indicatorvalue , i(entityID) j(what) string 

foreach v of var indicator* {
    local V : subinstr local v "_" " ", all
    local new : word 1 of `V' 
    rename `v' `new'
    local V = substr("`V'", strpos("`V'", " ") + 1, .)
    label var `new' "`V'"
}

renpfix indicatorvalue 
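
Because this work-around packs both the indicator and its description into the variable name, it can only succeed when every combined name is a legal Stata name of at most 32 characters. A minimal sketch of a check to run before the reshape, assuming the same example data; the variable newname and the local macro nbad are illustrative names, not part of the original answer:

```stata
clear
input entityID str4 indicator str14 indicatordescr indicatorvalue
1 gdp  "Gross Domestic" 100
1 pop  "Population"     15
1 area "Area"           50
end

gen what = indicator + "_" + subinstr(indicatordescr, " ", "_", .)

* reshape wide would create variables named indicatorvalue<what>
gen newname = "indicatorvalue" + what

* flag names that are too long (more than 32 characters) or that
* contain characters not allowed in a variable name
count if strlen(newname) > 32 | newname != strtoname(newname)
local nbad = r(N)
display "combined names that reshape could not create: `nbad'"
```

If the count is not zero, the descriptions are too long or too irregular to travel inside variable names, and a different work-around is needed.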

EDIT: If the limit on the length of variable names bites, try this different work-around:
clear 
input entityID  str4 indicator   str14 indicatordescr    indicatorvalue
1           gdp          "Gross Domestic"    100
1           pop          "Population"        15
1           area         "Area"              50
2           gdp          "Gross Domestic"    200
2           pop          "Population"        10
2           area         "Area"              300
end 

mata : sdata = uniqrows(st_sdata(., "indicator indicatordescr")) 
keep entityID indicator indicatorvalue 
reshape wide indicatorvalue , i(entityID) j(indicator) string 
renpfix indicatorvalue 
* attach each saved description as the variable label of the matching variable
mata : for(i = 1; i <= rows(sdata); i++) stata("label var " + sdata[i, 1] + "  " + char(34) + sdata[i,2] + char(34))
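
The same labels can also be carried in local macros rather than in Mata. This is only a sketch under the assumption that each indicator has exactly one description; the variable tagged and the macros names and lab_<indicator> are illustrative names, not part of the original answer. Local macro names are limited to 31 characters, so "lab_" plus the indicator must fit within that.

```stata
clear
input entityID str4 indicator str14 indicatordescr indicatorvalue
1 gdp  "Gross Domestic" 100
1 pop  "Population"     15
1 area "Area"           50
2 gdp  "Gross Domestic" 200
2 pop  "Population"     10
2 area "Area"           300
end

* mark one observation per distinct indicator
egen byte tagged = tag(indicator)

* store each description in a local macro named lab_<indicator>
local names
forvalues i = 1/`=_N' {
    if tagged[`i'] {
        local n = indicator[`i']
        local names `names' `n'
        local lab_`n' = indicatordescr[`i']
    }
}

keep entityID indicator indicatorvalue
reshape wide indicatorvalue, i(entityID) j(indicator) string
renpfix indicatorvalue

* apply the stored descriptions as variable labels
foreach n of local names {
    label variable `n' `"`lab_`n''"'
}
```

Local macros vanish when the do-file or program ends, so this has to run in one go rather than line by line from separate do-file selections.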

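Thank you. It is worth noting, however, that the first approach prevents long labels from being carried along. It needs the variable name to carry both the variable and the label information, and the reshape succeeds only if no variable name exceeds 32 characters. Is there a way to use local macros to overcome this drawback?
Quick question: once the algorithm has run, is it necessary to clear mata or sdata from memory? If so, how?
I can't see your dataset from here, but sdata is deliberately set up as the smallest possible table of names and variable labels (uniqrows() should ensure that).
In the example of my dataset, sdata would be fairly small, but that may not be true for everyone. I do think it is good practice to clear things from memory once their potential footprint is no longer needed. How do I clear sdata and mata from memory?
How many variables do you have? Even 1000 would not imply non-trivial storage needs. But, sure, fair point: a moment with help will tell you about mata clear and mata drop. Either way, Mata's own code is always there.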
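
The cleanup mentioned in the last comment can be sketched as follows. The matrix created on the first line is only a stand-in for the sdata table built in the answer, so that the snippet runs on its own.

```stata
* stand-in for the table of names and labels built in the answer
mata : sdata = ("gdp", "Gross Domestic" \ "pop", "Population")

* list the Mata objects currently in memory
mata : mata describe

* remove just this one matrix
mata : mata drop sdata

* or, more drastically, remove every Mata object in memory:
* mata : mata clear
```

mata drop is the narrower choice, since mata clear also removes any other Mata matrices or functions that happen to be in memory.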