Stata: how to reorder the parts of a string by date
I have received data with a string variable that looks like this:
var_name
25-DEC-99: A11, B14, C89; 28-FEB-94: A27, B94, C30
01-APR-11: A25, B82, C65
04-JUL-09: A21, B55, C26; 12-MAR-03: A11, B72, C68; 08-JUN-11: A62, B47, C82
12-JUN-00: A77, B19, C73; 03-JUL-12: A99, B04, C54
27-OCT-15: A22, B95, C08
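and so on. My goal is to split these strings into separate variables. The variable names would be v1_date, v1_A, v1_B, v2_date, v2_A, v2_B, v3_date, v3_A, v3_B.
I can use split var_name, p(";"), rename the results to v1, v2, and v3, and then run split again. The problem is that I want v1, v2, and v3 to be in chronological order by date, and the data are not currently arranged that way. How can I make the date in v1 come before the date in v2, and the date in v2 come before the date in v3? For example, in the first observation I want 25-DEC-99: A11, B14, C89 to end up in v2 and 28-FEB-94: A27, B94, C30 to end up in v1.

In general, please consider using dataex (SSC) to create simple data examples.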
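You did not give all the code you used to split the variable (and it is not trivial code). As it happens, I don't think your variable names are easy to work with, so I recreated the split in my own way. If you reshape long the split data, sorting by date is easy. I did not then apply the reverse reshape wide, because I suspect the long structure is easier to work with.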
clear
input str80 data
"25-DEC-99: A11, B14, C89; 28-FEB-94: A27, B94, C30"
"01-APR-11: A25, B82, C65"
"04-JUL-09: A21, B55, C26; 12-MAR-03: A11, B72, C68; 08-JUN-11: A62, B47, C82"
"12-JUN-00: A77, B19, C73; 03-JUL-12: A99, B04, C54"
"27-OCT-15: A22, B95, C08"
end
* one variable x1, x2, ... per semicolon-separated entry
split data, p(;) gen(x)
local j = 1
gen work = ""
foreach x of var x* {
    * text before the colon is the date
    replace work = substr(`x', 1, strpos(`x', ":") - 1)
    gen date`j' = daily(work, "DMY", 2050)
    * text after the colon holds the comma-separated codes
    replace work = substr(`x', strpos(`x', ":") + 1, .)
    split work, p(,)
    rename (work1 work2 work3) (vA`j' vB`j' vC`j')
    local ++j
}
drop work
drop x*
drop data
gen id = _n
edit
reshape long date vA vB vC, i(id) j(which)
drop if missing(date)
* renumber the entries within each id in date order
bysort id (date): replace which = _n
list, sepby(id)
     +--------------------------------------+
     | id   which    date    vA    vB    vC |
     |--------------------------------------|
  1. |  1       1   12477   A27   B94   C30 |
  2. |  1       2   14603   A11   B14   C89 |
     |--------------------------------------|
  3. |  2       1   18718   A25   B82   C65 |
     |--------------------------------------|
  4. |  3       1   15776   A11   B72   C68 |
  5. |  3       2   18082   A21   B55   C26 |
  6. |  3       3   18786   A62   B47   C82 |
     |--------------------------------------|
  7. |  4       1   14773   A77   B19   C73 |
  8. |  4       2   19177   A99   B04   C54 |
     |--------------------------------------|
  9. |  5       1   20388   A22   B95   C08 |
     +--------------------------------------+
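The date column in the listing shows raw Stata daily dates, that is, days since 01jan1960. The short sketch below is not part of the original answer. It only illustrates how the third argument of daily() resolves two-digit years (to the latest year not exceeding it) and how a %td display format makes the values readable. The variable name s and the two-row dataset are made up for the illustration.

```stata
clear
input str9 s
"28-FEB-94"
"25-DEC-99"
end

* With topyear 2050, "94" and "99" resolve to 1994 and 1999
gen date = daily(s, "DMY", 2050)

* Attach a daily display format; the stored numbers are unchanged
format date %td
list
```

Sorting and bysort work on the underlying numbers, so attaching the format does not affect the ordering step.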
I believe the following will get you close. It uses both split and reshape.
clear
set more off
input ///
str100 myvar
"25-DEC-99: A11, B14, C89; 28-FEB-94: A27, B94, C30"
"01-APR-11: A25, B82, C65"
"04-JUL-09: A21, B55, C26; 12-MAR-03: A11, B72, C68; 08-JUN-11: A62, B47, C82"
"12-JUN-00: A77, B19, C73; 03-JUL-12: A99, B04, C54"
"27-OCT-15: A22, B95, C08"
end
* one variable per semicolon-separated entry, then one row per entry
split myvar, p(;)
drop myvar
gen obs = _n
reshape long myvar, i(obs)
drop if missing(myvar)
* separate the date from the comma-separated codes
split myvar, p(:)
drop myvar
gen myvar11 = date(myvar1, "DMY", 2020)
format %td myvar11
drop myvar1
rename (myvar11 myvar2) (mydate mycells)
order mydate, before(mycells)
* number the entries within each observation in date order
bysort obs (mydate) : gen neworder = _n
drop _j
reshape wide mydate mycells, i(obs) j(neworder)
list
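If you need to split further, you can loop over the mycells variables.

Comments:

This does what the OP asked for, but my prediction is that the data structure will become unwieldy.

@NickCox I agree. The original poster could keep a panel structure by skipping the last reshape.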
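The loop over the mycells variables mentioned in the second answer might look roughly like the sketch below. It is an assumption about what was intended, not code from the thread. The two-row dataset only mimics the wide layout produced above, and the stub names mycells1_, mycells2_ are chosen for illustration.

```stata
clear
input byte obs str20 mycells1 str20 mycells2
1 " A27, B94, C30" " A11, B14, C89"
2 " A25, B82, C65" ""
end

* The varlist is expanded once, so variables created inside the loop
* are not picked up by it
foreach v of varlist mycells* {
    * creates `v'_1, `v'_2, ... from the comma-separated pieces
    split `v', parse(,) generate(`v'_)
    * split leaves the new names in r(varlist); strip stray blanks
    foreach p of varlist `r(varlist)' {
        replace `p' = trim(`p')
    }
}
list
```

Observations with fewer entries simply get empty strings in the later pieces, which is one reason the long layout may be easier to manage.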