Stata: an efficient way to classify values


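I have a string variable, call it desc, that takes many different values, say 300. I want to create two new variables, desc_a and desc_b. desc contains two categories of values; I want to put the values that belong to the first category in desc_a and the rest in desc_b. I'll describe one approach I came up with. However, it is very slow, and I would like to know whether there is a better way to do this.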

gen desc_a = ""
gen desc_b = ""
tab desc
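The resulting tab output lists each distinct value of desc (irrelevant details omitted).

* Manually go through the tab output and copy and paste each string into a statement, e.g.: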

replace desc_a = "First Element of a" if desc=="First Element of a"
replace desc_a = "Second Element of a" if desc=="Second Element of a"
replace desc_a = "Third Element of a" if desc=="Third Element of a"
...
replace desc_b = "First Element of b" if desc=="First Element of b"
replace desc_b = "Second Element of b" if desc=="Second Element of b"

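Note that the real data does not follow a nice pattern like this, so I cannot automate the task with regular expressions or anything similar. I really do need to check each value by hand and decide which category it belongs to. Still, I suspect that the copy-and-paste-heavy approach I described is not the best way to go about it.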

This is not the best possible approach, but it is an improvement on the solution above:

gen desc_a = ""
gen desc_b = ""
replace desc_a = desc if desc=="First Element of a"
replace desc_a = desc if desc=="Second Element of a"
replace desc_a = desc if desc=="Third Element of a"
...

replace desc_b = desc if desc_a==""
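
If you still end up typing the values into a do-file, one way to cut down on the number of replace statements is inlist(). The following is only a sketch on made-up toy data (the five values are placeholders, not your real ones). As far as I know, inlist() accepts at most 10 arguments when they are strings, so each call can hold at most nine values and a long list would need several calls.

```stata
* toy data; the values are placeholders for the real ones
clear
input str19 desc
"First Element of a"
"Second Element of a"
"Third Element of a"
"First Element of b"
"Second Element of b"
end

* one statement covers several category-a values
* (the /// line continuation only works inside a do-file)
generate desc_a = desc if inlist(desc, "First Element of a", ///
    "Second Element of a", "Third Element of a")

* everything not classified as a goes to b
generate desc_b = desc if desc_a == ""
```

Observations that do not match any listed value get an empty desc_a, so the last line picks them up for desc_b.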

The Stata Data Editor window will help reduce your workload.

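Create a Stata dataset with two variables: the 300 distinct values of desc and a variable, which I'll call ab, initialized to missing. Then open that dataset in the Stata Data Editor and work down through the observations, replacing each missing value (by typing in the cell) with an indicator of whether the description belongs to group a or group b (for example, 1 or 2). Then save that dataset and merge it with your original dataset, and use the merged value of ab to assign the descriptions to the appropriate variables: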

generate desc_a = desc if ab==1
generate desc_b = desc if ab==2
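
A minimal sketch of how those steps might fit together, assuming made-up toy data and using contract to get one row per distinct value. The tempfile names main and lookup are my own choice, and the two replace lines only stand in for the typing you would do in the Data Editor.

```stata
* toy data standing in for the real dataset (values are made up)
clear
input str19 desc
"First Element of a"
"First Element of b"
"Second Element of a"
"First Element of a"
end
tempfile main lookup
save "`main'"

* one row per distinct value of desc, plus an empty indicator
contract desc
drop _freq
generate byte ab = .

* in real use, type 1 or 2 into ab in the Data Editor at this point;
* the two replace lines below only simulate that manual step
replace ab = 1 if desc == "First Element of a" | desc == "Second Element of a"
replace ab = 2 if desc == "First Element of b"
save "`lookup'"

* attach the indicator to every observation and split desc
use "`main'", clear
merge m:1 desc using "`lookup'", assert(match) nogenerate
generate desc_a = desc if ab == 1
generate desc_b = desc if ab == 2
```

The assert(match) option makes the merge stop with an error if any value of desc is missing from the lookup, which is a cheap check that nothing was skipped during editing.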

Expanding on @William's solution:

* recreate your data example
clear
input str19 desc int n
"First Element of a" 53 
"Second Element of a" 22 
"First Element of b " 78 
"Third Element of a" 232 
"Second Element of b" 33 
end
expand n
set seed 314324
gen somedata = runiform()
sort somedata
tab desc
tempfile main
save "`main'"

* reduce to one observation per value of desc
bysort desc: keep if _n == 1
keep desc

* make an effort to identify a or b, note that
* the following fails for one obs
gen ab = regexs(1) if regexm(desc,"(a|b)$")

* save and edit manually
tempfile toedit
save "`toedit'"

* this is simulated editing...
clear
input str19 desc str1 ab
"First Element of a" "a" 
"First Element of b " "b" 
"Second Element of a" "a" 
"Second Element of b" "b" 
"Third Element of a" "a" 
end

* now combine with the original data
merge 1:m desc using "`main'", assert(match) nogen