Stata: counting concurrent subscriptions

Tags: stata, panel-data

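I have a database in which many people subscribe to a service, possibly holding several subscriptions at the same time, with transaction data for each event over the life of a subscription. I am trying to create a variable that counts how many subscriptions a person has active at the time of a given transaction.

As an example, my data look like this: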

person | subscription | obs_date | sub_start_date | sub_end_date | num_concurrent_subs
--------------------------------------------------------------------------------------
1      | 1            | 09/01/10 | 09/01/10       | 09/01/11     | 1
1      | 1            | 10/01/10 | 09/01/10       | 09/01/11     | 2
1      | 1            | 11/01/10 | 09/01/10       | 09/01/11     | 2
1      | 2            | 10/01/10 | 10/01/10       | 09/01/11     | 2
1      | 2            | 11/01/10 | 10/01/10       | 09/01/11     | 2
1      | 3            | 11/01/14 | 09/01/14       | .            | 1
1      | 3            | 11/01/16 | 09/01/14       | .            | 1
1      | 4            | 11/01/15 | 10/01/15       | 11/01/15     | 3
1      | 5            | 11/01/15 | 10/01/15       | 11/01/15     | 3
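
and so on for each person. I would like to generate num_concurrent_subs as shown above.

That is, for each person, look at each observation and count how many of that person's subscriptions have a sub_start_date to sub_end_date range that contains the observation's obs_date.

I have read a bit about Stata's count function and believe I am close to a solution, but I don't know how to check the condition across different subscriptions.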

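You can do this by separating the subscription information from the transaction data and converting the subscription data to long form, with one observation for the start date and another for the end date. Then append the transaction data back and order everything by a single date variable. An onoff variable tracks the start (+1) and end (-1) of each subscription, so its running sum within person is the number of active subscriptions. Something like: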

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(person subscription) str8(obs_date sub_start_date sub_end_date) byte num_concurrent_subs
1 1 "09/01/10" "09/01/10" "09/01/11" 1
1 1 "10/01/10" "09/01/10" "09/01/11" 2
1 1 "11/01/10" "09/01/10" "09/01/11" 2
1 2 "10/01/10" "10/01/10" "09/01/11" 2
1 2 "11/01/10" "10/01/10" "09/01/11" 2
1 3 "11/01/14" "09/01/14" "."        1
1 3 "11/01/16" "09/01/14" "."        1
1 4 "11/01/15" "10/01/15" "11/01/15" 3
1 5 "11/01/15" "10/01/15" "11/01/15" 3
end

* should always have an observation identifier
gen obsid = _n

* convert string to Stata numeric dates
gen odate = daily(obs_date,"MD20Y")
gen substart = daily(sub_start_date,"MD20Y")
gen subend = daily(sub_end_date,"MD20Y")
format %td odate substart subend
save "main_data.dta", replace

* reduce to subscription info with one obs for the start and one obs
* for the end of each subscription. use an onoff variable to track
* start and end events
keep person subscription substart subend
bysort person subscription substart subend: keep if _n == 1
expand 2
bysort person subscription: gen adate = cond(_n == 1, substart, subend)
by person subscription: gen onoff = cond(_n == 1, 1, -1)
replace onoff = 0 if mi(adate)
format %td adate

append using "main_data.dta"

* include obs date in adate and nothing happens on the observation date
replace adate = odate if !mi(obsid)
replace onoff = 0 if !mi(obsid)

* order by person adate, put on event first, then obs events, then off events
gsort person adate -onoff
by person: gen concur = sum(onoff)

* return to original obs
keep if !mi(obsid)
sort obsid

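Here is another way to do this, using rangejoin (from SSC). To install it, type in Stata's Command window:

ssc install rangejoin

rangejoin relies on rangestat (also from SSC), so install that as well if you do not already have it.

With rangejoin, you can pair each subscription with all the transaction observations that fall between the subscription's start and end dates. It is then just a matter of counting, for each transaction observation, how many subscriptions it was paired with.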

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(person subscription) str8(obs_date sub_start_date sub_end_date) byte num_concurrent_subs
1 1 "09/01/10" "09/01/10" "09/01/11" 1
1 1 "10/01/10" "09/01/10" "09/01/11" 2
1 1 "11/01/10" "09/01/10" "09/01/11" 2
1 2 "10/01/10" "10/01/10" "09/01/11" 2
1 2 "11/01/10" "10/01/10" "09/01/11" 2
1 3 "11/01/14" "09/01/14" "."        1
1 3 "11/01/16" "09/01/14" "."        1
1 4 "11/01/15" "10/01/15" "11/01/15" 3
1 5 "11/01/15" "10/01/15" "11/01/15" 3
end

* should always have an observation identifier
gen obsid = _n

* convert string to Stata numeric dates
gen odate = daily(obs_date,"MD20Y")
gen substart = daily(sub_start_date,"MD20Y")
gen subend = daily(sub_end_date,"MD20Y")
format %td odate substart subend
save "main_data.dta", replace

* reduce to subscription start and end date per person
bysort person subscription substart subend: keep if _n == 1
keep person substart subend

* missing values will exclude obs so use a date in the future
replace subend = mdy(1,1,2099) if mi(subend)

* pair each subscription with an obs date
rangejoin odate substart subend using "main_data.dta", by(person)

* the number of current subscriptions is the number of pairings
bysort obsid: gen current = _N

* return to original obs
by obsid: keep if _n == 1
sort obsid
drop substart subend
rename (substart_U subend_U) (substart subend)
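
As a cross-check on either approach, here is a minimal brute-force sketch that uses joinby instead of the techniques in the answers above. It forms every pairing of a person's observations with that person's subscriptions and counts the pairings where the observation date falls inside the subscription's range. The names s_id, s_start, s_end, active and brute, and the tempfile names, are made up for illustration. Treating a missing end date as "still active" is an assumption that matches the expected values in the question.

```stata
clear
input byte(person subscription) str8(obs_date sub_start_date sub_end_date) byte num_concurrent_subs
1 1 "09/01/10" "09/01/10" "09/01/11" 1
1 1 "10/01/10" "09/01/10" "09/01/11" 2
1 1 "11/01/10" "09/01/10" "09/01/11" 2
1 2 "10/01/10" "10/01/10" "09/01/11" 2
1 2 "11/01/10" "10/01/10" "09/01/11" 2
1 3 "11/01/14" "09/01/14" "."        1
1 3 "11/01/16" "09/01/14" "."        1
1 4 "11/01/15" "10/01/15" "11/01/15" 3
1 5 "11/01/15" "10/01/15" "11/01/15" 3
end

gen long obsid = _n
gen odate    = daily(obs_date, "MD20Y")
gen substart = daily(sub_start_date, "MD20Y")
gen subend   = daily(sub_end_date, "MD20Y")
format %td odate substart subend

* hypothetical tempfile names; run the whole block as one do-file
tempfile obs subs
save `obs'

* one row per subscription, renamed so nothing clashes after joinby
keep person subscription substart subend
duplicates drop
rename (subscription substart subend) (s_id s_start s_end)
save `subs'

* every combination of a person's observations and subscriptions
use `obs', clear
joinby person using `subs'

* active = already started and not yet ended (end date inclusive);
* a missing end date is assumed to mean the subscription is still running
gen byte active = odate >= s_start & (odate <= s_end | mi(s_end))

* count active subscriptions for each original observation
bysort obsid: egen brute = total(active)
by obsid: keep if _n == 1
drop s_id s_start s_end active

* stops with an error if any count differs from the hand-entered column
assert brute == num_concurrent_subs
```

Because joinby creates observations times subscriptions rows per person, this is only practical as a check on modest data; the running-sum and rangejoin approaches scale better.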

Strictly speaking, count is a command, not a function. In Stata, commands and functions are different kinds of beasts.
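
To make the distinction concrete, here is a small sketch using the auto dataset that ships with Stata; the variable name running is made up for illustration. A command such as count stands on its own line and leaves its result in r(), while a function such as sum() only appears inside an expression, which is how the first answer uses it.

```stata
sysuse auto, clear

* count is a command: it runs by itself and stores its result in r(N)
count if foreign == 1
display r(N)

* sum() is a function: it is evaluated inside an expression, here to
* build a running total of foreign across observations
gen running = sum(foreign)
```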
(discussion of the expand 2 trick)