Stata: splitting string data and the corresponding data into new rows
I have some data with a string variable (US states), a corresponding integer variable (Enrollment), and another string variable.
Unfortunately, some cells in the US states variable list multiple states, separated by semicolons. I would like to split them into separate rows and then divide the corresponding enrollment equally among those states.
For example, I have:
State Enrollment Severity
CA 100 Low
MA;PA 50 Medium
WA;OR;ID 120 High
I would like to transform it into:
State Enrollment Severity
CA 100 Low
MA 25 Medium
PA 25 Medium
WA 40 High
OR 40 High
ID 40 High
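I tried using the split command to separate them and then (in a convoluted way) compute the corresponding enrollment, but I am not sure how to get the pieces onto new rows, even with reshape.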
Edit: I would also like the solution to handle repeated states. For example:
State Enrollment Severity
CA 100 Low
MA;CA 50 Medium
WA;CA;ID 120 High
should be transformed into:
State Enrollment Severity
CA 100 Low
MA 25 Medium
CA 25 Medium
WA 40 High
CA 40 High
ID 40 High
Here is one way to do what you want using the original data:
clear
input str10 State Enrollment str10 Severity
"CA" 100 "Low"
"MA;PA" 50 "Medium"
"WA;OR;ID" 120 "High"
end
generate id = _n
split State, p(;)
drop State
reshape long State, i(State?)
drop State?
keep if State != ""
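* Caution: max(id) equals the number of states per row only because row n of
* this example happens to list n states; in general, count the states per row.
bysort State (id): egen maxval = max(id)
bysort State (id): generate enrol = Enrollment / maxval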
drop Enrollment
rename enrol Enrollment
sort id
drop id _j maxval
order State Enrollment Severity
list, abbreviate(20)
+-------------------------------+
| State Enrollment Severity |
|-------------------------------|
1. | CA 100 Low |
2. | MA 25 Medium |
3. | PA 25 Medium |
4. | OR 40 High |
5. | ID 40 High |
6. | WA 40 High |
+-------------------------------+
Edit:
Here is one way to do what you want using your revised data:
clear
input str10 State Enrollment str10 Severity
"CA" 100 "Low"
"MA;CA" 50 "Medium"
"WA;CA;ID" 120 "High"
end
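generate id = _n
* one variable per listed state: State1, State2, ...
split State, p(;)
drop State
* one row per (id, state slot); id identifies the original row
reshape long State, i(id)
keep if State != ""
* number of states listed in the original row
bysort id: egen maxval = count(id)
bysort id: generate enrol = Enrollment / maxval
drop Enrollment
rename enrol Enrollment
sort id
drop id _j maxval
order State Enrollment Severity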
list, abbreviate(20)
+-------------------------------+
| State Enrollment Severity |
|-------------------------------|
1. | CA 100 Low |
2. | MA 25 Medium |
3. | CA 25 Medium |
4. | WA 40 High |
5. | CA 40 High |
6. | ID 40 High |
+-------------------------------+
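As an alternative to reshaping, the same split can be sketched with expand. This is only a minimal sketch, not the answer's method: it assumes the state codes contain no spaces (true for two-letter abbreviations), and the helper variables nstates and piece are hypothetical names introduced for illustration.

```stata
clear
input str10 State Enrollment str10 Severity
"CA"       100 "Low"
"MA;CA"     50 "Medium"
"WA;CA;ID" 120 "High"
end

generate id = _n
* states per row = number of semicolons + 1 (nstates is a hypothetical name)
generate nstates = length(State) - length(subinstr(State, ";", "", .)) + 1
* keep nstates copies of each original row
expand nstates
* within each original row, the k-th copy takes the k-th state
bysort id: generate str10 piece = word(subinstr(State, ";", " ", .), _n)
replace State = piece
* divide the enrollment equally among the states of the original row
replace Enrollment = Enrollment / nstates
drop piece nstates id
order State Enrollment Severity
list, abbreviate(20)
```

Because the rows are grouped by id rather than by State, a state that appears in several rows (such as CA here) is handled without any uniqueness problem.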
Hi Pearly Spencer, thanks for your reply. However, this does not work for me. In my original question I did not mention that states also repeat in my dataset (i.e. there are multiple instances of CA or of any other state). So when I reshape, I get an error saying that, in the current wide format, the variables State1 State2 ... do not uniquely identify the data. Is there a workaround? Thanks.
It clearly works for the example you gave. We work with what we are given; we are not mind readers. You need to provide an example of how CA is repeated. Is it on its own in another observation or part of a group, i.e. CA;MA etc.? Also, how should the enrollment be calculated in that case?
Hi, you are right. I meant that it does not work in my case; I said that I had not mentioned this earlier and apologized for it. I have revised the question. Sorry for the back and forth. Thanks.