Stata: an efficient way to categorize
I have a string variable, call it desc, that takes many different values, say 300. I want to create two new variables, desc_a and desc_b. desc contains two categories of values; I want to put the values belonging to the first category in desc_a and the rest in desc_b. I will describe a method that I came up with. However, it is very slow, and I wonder whether there is a better way to do this.
gen desc_a = ""
gen desc_b = ""
tab desc
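The resulting tab output might look like this (irrelevant information omitted):
* Manually go through the output and copy and paste each string into a statement such as: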
replace desc_a = "First Element of a" if desc=="First Element of a"
replace desc_a = "Second Element of a" if desc=="Second Element of a"
replace desc_a = "Third Element of a" if desc=="Third Element of a"
...
replace desc_b = "First Element of b" if desc=="First Element of b"
replace desc_b = "Second Element of b" if desc=="Second Element of b"
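Note that the actual data do not follow such a nice pattern, so I cannot automate this with regular expressions or anything similar. I really do need to inspect each value by hand and decide which category it belongs to. Still, I suspect that the copy-and-paste-heavy approach described above is not the best way.

This is not the best way either, but it is an improvement over the solution above: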
gen desc_a = ""
gen desc_b = ""
replace desc_a = desc if desc=="First Element of a"
replace desc_a = desc if desc=="Second Element of a"
replace desc_a = desc if desc=="Third Element of a"
...
replace desc_b = desc if desc_a==""
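If typing one replace per value is the main cost, several values can be listed in a single condition with inlist(). The following is only a sketch on placeholder data (the five desc values are made up for illustration), not a different classification rule. To my knowledge, inlist() accepts at most 10 arguments in total when they are strings, so a long list would need several calls joined with |.

```stata
* toy data standing in for the real desc variable (placeholder values)
clear
input str19 desc
"First Element of a"
"Second Element of a"
"First Element of b"
"Third Element of a"
"Second Element of b"
end

* one inlist() call covers several category-a values at once
gen desc_a = desc if inlist(desc, "First Element of a", ///
    "Second Element of a", "Third Element of a")

* everything not claimed by desc_a goes to desc_b
gen desc_b = desc if desc_a == ""
```

You still decide by hand which values go in the list; this only cuts down on the number of statements.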
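The Stata Data Editor window will help reduce your workload. Create a Stata dataset with two variables: the 300 distinct values of desc and a variable, which I'll call ab, initialized to missing. Then open the dataset in the Data Editor and work down through the observations, replacing each missing value (by typing in the cell) with an indicator of whether the description belongs to group a or group b (for example, 1 or 2). Then save that dataset, merge it with your original dataset, and use the merged values of ab to assign each description to the appropriate variable: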
generate desc_a = desc if ab==1
generate desc_b = desc if ab==2
Expanding on @William's solution:
* recreate your data example
clear
input str19 desc int n
"First Element of a" 53
"Second Element of a" 22
"First Element of b " 78
"Third Element of a" 232
"Second Element of b" 33
end
expand n
set seed 314324
gen somedata = runiform()
sort somedata
tab desc
tempfile main
save "`main'"
* reduce to one observation per value of desc
bysort desc: keep if _n == 1
keep desc
* make an effort to identify a or b, note that
* the following fails for one obs
gen ab = regexs(1) if regexm(desc,"(a|b)$")
* save and edit manually
tempfile toedit
save "`toedit'"
* this is simulated editing...
clear
input str19 desc str1 ab
"First Element of a" "a"
"First Element of b " "b"
"Second Element of a" "a"
"Second Element of b" "b"
"Third Element of a" "a"
end
* now combine with the original data
merge 1:m desc using "`main'", assert(match) nogen
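The code above stops at the merge. Assuming ab holds "a" or "b" as in the simulated edit, the final assignment to desc_a and desc_b could be sketched as follows. The short input block is only a stand-in for the merged data, so that the snippet runs on its own.

```stata
* stand-in for the merged data: desc plus the hand-edited ab flag
clear
input str19 desc str1 ab
"First Element of a" "a"
"First Element of b " "b"
"Second Element of a" "a"
end

* route each description to the variable for its group
gen desc_a = desc if ab == "a"
gen desc_b = desc if ab == "b"

* every observation should land in exactly one of the two variables
assert (desc_a != "") + (desc_b != "") == 1
```

The closing assert is a cheap check that no value of desc was left unclassified during the manual edit.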