Splitting a string variable in Stata and placing the values in order
Splitting a string variable in Stata is usually easy. In my case, however, I am struggling to reorganize the order of the resulting values. The variable holds a list of characteristics associated with each observation, as follows:
Variable_Name
No Phosphates
No Perfumes; No Phosphates; Private Label
No Perfumes; Private Label
Private Label
If I use the code split Variable_Name, p("; "), I get
Variable_Name1    Variable_Name2    Variable_Name3
No Phosphates
No Perfumes       No Phosphates     Private Label
No Perfumes       Private Label
Private Label
How can I rearrange the values so that they look like this?
Variable_Name1    Variable_Name2    Variable_Name3
No Phosphates
No Phosphates     No Perfumes       Private Label
                  No Perfumes       Private Label
                                    Private Label
In other words, how can I group identical characteristics under the same column?
Here is the full code:
clear
input str50 Variable_Name
"No Phosphates"
"No Perfumes; No Phosphates; Private Label"
"No Perfumes; Private Label"
"Private Label"
end
split Variable_Name, p("; ")
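The challenge I face is that I have many unknown characteristics. I cannot identify them manually and sort them into columns, nor can I search for particular string values.

This can be done with some reshape techniques. Note that the result is entirely sensitive to small differences in spelling and the like: two values that differ by a single character end up in different columns.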
clear
input str100 what
"No Phosphates"
"No Perfumes; No Phosphates; Private Label"
"No Perfumes; Private Label"
"Private Label"
end
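* split on the semicolon into what1, what2, ...
split what, p(;)
rename what original
gen id = _n
* one row per observation and characteristic
reshape long what, i(id)
* remove the blank left after each semicolon
replace what = trim(what)
* number the distinct characteristics; empty strings get a missing group
egen group = group(what)
drop if missing(group)
drop _j
* one column per distinct characteristic
reshape wide what, i(id) j(group)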
list
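Because the grouping is sensitive to spelling, it may help to normalise the values before calling egen. The following is only a sketch of that idea, not part of the original answer: the sample data with inconsistent case and spacing are hypothetical, and it assumes a Stata version that provides strtrim() and stritrim() (Stata 14 or later).

```stata
clear
* hypothetical data with inconsistent case and spacing
input str100 what
"No Phosphates"
"no perfumes;  No  Phosphates; Private Label"
"No Perfumes; Private label"
"Private Label"
end

split what, p(;)
rename what original
gen id = _n
reshape long what, i(id)

* normalise: trim both ends, collapse repeated internal blanks, lowercase
replace what = lower(stritrim(strtrim(what)))

* inspect the distinct values to spot any remaining spelling variants
tabulate what

egen group = group(what)
drop if missing(group)
drop _j
reshape wide what, i(id) j(group)
list
```

This only catches differences in case and whitespace. Genuine misspellings would still form separate groups, so the tabulate output is worth checking before reshaping back to wide.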