Stata: split string data and its corresponding data into new rows


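I have some data containing a string variable (US states), a corresponding integer variable (Enrollment), and another string variable.

Unfortunately, some cells under the US states variable list multiple states, separated by semicolons. I would like to split these into separate rows and then divide the corresponding enrollment equally among those states.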

For example, I have:

State       Enrollment   Severity
CA            100          Low
MA;PA         50           Medium
WA;OR;ID      120          High

I would like to be able to turn this into:

State       Enrollment    Severity
CA             100          Low
MA             25           Medium
PA             25           Medium
WA             40           High
OR             40           High
ID             40           High
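
I tried using the split command to separate them and then (in a convoluted way) computing the corresponding enrollment, but I am not sure how to get the pieces into new rows, even using reshape.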


Edit:

I would also like the solution to handle repeated states.

For example:

State       Enrollment   Severity
CA            100          Low
MA;CA         50           Medium
WA;CA;ID      120          High

turns into:

State       Enrollment    Severity
CA             100          Low
MA             25           Medium
CA             25           Medium
WA             40           High
CA             40           High
ID             40           High

Here is a way to do what you want using your original data:

clear 
input str10 State Enrollment str10 Severity
"CA" 100 "Low"
"MA;PA" 50 "Medium"
"WA;OR;ID" 120 "High"
end

generate id = _n
split State, p(;)
drop State
reshape long State, i(State?)
drop State?

keep if State != ""
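* Note: max(id) within State is just the row number here; it matches the
* number of states per row only because row n of this example lists n states.
bysort State (id): egen maxval = max(id)
bysort State (id): generate enrol = Enrollment / maxval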
drop Enrollment
rename enrol Enrollment

sort id
drop id _j maxval
order State Enrollment Severity

list, abbreviate(20)

     +-------------------------------+
     | State   Enrollment   Severity |
     |-------------------------------|
  1. |    CA          100        Low |
  2. |    MA           25     Medium |
  3. |    PA           25     Medium |
  4. |    OR           40       High |
  5. |    ID           40       High |
  6. |    WA           40       High |
     +-------------------------------+

Edit:

Here is a way to do what you want using your revised data:

clear
input str10 State Enrollment str10 Severity
"CA"            100          "Low"
"MA;CA"         50           "Medium"
"WA;CA;ID"      120          "High"
end

generate id = _n
split State, p(;)
drop State

reshape long State, i(id)

keep if State != ""
bysort id: egen maxval = count(id)
bysort id: generate enrol = Enrollment / maxval
drop Enrollment
rename enrol Enrollment

sort id
drop id _j maxval
order State Enrollment Severity

list, abbreviate(20)

     +-------------------------------+
     | State   Enrollment   Severity |
     |-------------------------------|
  1. |    CA          100        Low |
  2. |    MA           25     Medium |
  3. |    CA           25     Medium |
  4. |    WA           40       High |
  5. |    CA           40       High |
  6. |    ID           40       High |
     +-------------------------------+
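
As an alternative to reshape, the same result can probably be reached with expand. The following is only a minimal sketch, assuming the state codes contain no spaces (it relies on word() after turning semicolons into spaces); the helper variables n and k are illustrative names and not part of the answer above.

```stata
clear
input str10 State Enrollment str10 Severity
"CA"       100 "Low"
"MA;CA"    50  "Medium"
"WA;CA;ID" 120 "High"
end

generate id = _n
* number of states in a cell = number of semicolons + 1
generate n = length(State) - length(subinstr(State, ";", "", .)) + 1
* replace each observation with n copies of itself
expand n
* k = position of each copy within its original row
bysort id: generate k = _n
* keep only the k-th state code (assumes codes contain no spaces)
replace State = word(subinstr(State, ";", " ", .), k)
* split the enrollment equally among the states of the row
replace Enrollment = Enrollment / n

sort id k
drop id n k
list, abbreviate(20)
```

Because the divisor is counted from each cell rather than from an identifier, states that repeat across rows need no special handling.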


Hi Pearly Spencer, thanks for your reply. But this does not actually work. In my original question I did not mention that states are also repeated in my dataset (i.e. there are multiple instances of CA or any other state). So when I reshape, I get an error saying that, in the current wide form, the variables State1 State2 ... do not uniquely identify the observations. Is there a workaround? Thanks.

Clearly it works for the example you gave. We work with what we are given; we are not mind readers. You need to provide an example of how CA is repeated. Is it on its own in another observation or part of a group, i.e. CA;MA etc.? Also, how should enrollment be calculated in that case?

Hi, you are right. I meant that it does not work for my case, which I admit I did not mention earlier, and I apologize for that. I have revised the question. Sorry for the back and forth. Thanks.
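
The uniqueness error mentioned in the comments can be checked before reshaping. The sketch below uses a hypothetical dataset in which CA appears on its own in two observations (the comments do not say exactly how the states repeat, so this layout is an assumption). It uses isid to show that the split variables are not a valid identifier while a row counter is.

```stata
clear
input str10 State Enrollment str10 Severity
"CA"    100 "Low"
"CA"    80  "High"
"MA;CA" 50  "Medium"
end

generate id = _n
split State, parse(;)

* State1 State2 take the same values in rows 1 and 2, so they cannot
* serve as the i() identifier; isid exits with a nonzero return code
capture isid State1 State2, missok
local rc_states = _rc
display "isid State1 State2 return code: `rc_states'"

* the row counter is unique by construction, so reshape ..., i(id) is safe
isid id
```

Using i(id) in reshape long sidesteps the problem because id is distinct for every original observation, whatever the states are.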