Stata issue with cumulative sum of shares: maximum value 1 not recognized
I'll say upfront that, for confidentiality reasons, I can't post the data here, and I have altered the revenue figures shown. Still, maybe someone can help me spot an error in the code below, or tell me what Stata is doing.
I am using the sum() function with the generate command to get the cumulative sum of the yearly revenue of firms belonging to a given group, expressed as a share of that group's total yearly revenue
* 1) Derive the shares
sort Group year rev_Tot
by Group year: egen rev_Tot_group = total(rev_Tot)
replace rev_Tot_group = rev_Tot / rev_Tot_group
* 2) Compute the cumulative sum of the shares by group per year
by Group year: gen Roll_sum_rev_Tot_group = sum(rev_Tot_group)
drop rev_Tot_group
So far, apparently, so good. Then I classify the observations into quintiles within group/year. I only care about two of the three groups
gen quintile = ""
replace quintile = "Group1 0.2" if Roll_sum_rev_Tot_group <= 0.2 & Group == "1"
replace quintile = "Group2 0.2" if Roll_sum_rev_Tot_group <= 0.2 & Group == "2"
replace quintile = "Group1 0.4" if Roll_sum_rev_Tot_group > 0.2 & Roll_sum_rev_Tot_group <= 0.4 & Group == "1"
replace quintile = "Group2 0.4" if Roll_sum_rev_Tot_group > 0.2 & Roll_sum_rev_Tot_group <= 0.4 & Group == "2"
replace quintile = "Group1 0.6" if Roll_sum_rev_Tot_group > 0.4 & Roll_sum_rev_Tot_group <= 0.6 & Group == "1"
replace quintile = "Group2 0.6" if Roll_sum_rev_Tot_group > 0.4 & Roll_sum_rev_Tot_group <= 0.6 & Group == "2"
replace quintile = "Group1 0.8" if Roll_sum_rev_Tot_group > 0.6 & Roll_sum_rev_Tot_group <= 0.8 & Group == "1"
replace quintile = "Group2 0.8" if Roll_sum_rev_Tot_group > 0.6 & Roll_sum_rev_Tot_group <= 0.8 & Group == "2"
replace quintile = "Group1 1" if Roll_sum_rev_Tot_group > 0.8 & Roll_sum_rev_Tot_group <= 1 & Group == "1"
replace quintile = "Group2 1" if Roll_sum_rev_Tot_group > 0.8 & Roll_sum_rev_Tot_group <= 1 & Group == "2"
replace quintile = "Whatever" if Group == "3"
A check for observations left without a quintile reports 1.
So I browse the data for that observation
br if quintile == ""
What I see is that quintile is empty where the cumulative sum is 1:
Group  year  rev_Tot  Roll_sum_rev_Tot_group  quintile
2      2018  37200    .993623                 Group2 1
2      2018  37300    .995001                 Group2 1
2      2018  43800    .996619                 Group2 1
2      2018  45000    .998288                 Group2 1
2      2018  46000    1
2      2019  0        0                       Group2 0.2
2      2019  0        0                       Group2 0.2
2      2019  0        0                       Group2 0.2
I think the problem lies in how Stata reads the "1" in the cumulative-sum variable, given that
assert(Roll_sum_rev_Tot_group == 1) if quintile == ""
yields
assertion is false
r(9);
while a second assertion (its command is not reproduced here) produces no error
However, if I click on the cell in browse mode, I see a neat 1. If I check by tabulating, I get 1 again:
tab Roll_sum_rev_Tot_group if quintile == "",m
yields
Roll_sum_re |
v_Tot_group |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          1      100.00      100.00
------------+-----------------------------------
      Total |          1      100.00
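This puzzles me. Can anyone help me understand what is going on? It is a minor issue for me, since I can carry on with the analysis, but I would have to correct this by hand, which is a bit worrying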
Thanks.

The main problem is precision. You know that logically the last value should be 1, but Stata doesn't. The problem starts with
sort Group year rev_Tot
by Group year: egen rev_Tot_group = total(rev_Tot)
replace rev_Tot_group = rev_Tot / rev_Tot_group
I would rewrite that as
bysort Group year (rev_Tot) : gen double rev_Tot_group = sum(rev_Tot)
by Group year : replace rev_Tot_group = rev_Tot_group / rev_Tot_group[_N]
because then it is, or should be, guaranteed that the last value in each block of observations is exactly 1
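To check that claim on a small scale, here is a minimal sketch with made-up numbers. The variable name cumshare and the data are hypothetical, not the asker's. It relies on the fact that a nonzero value divided by itself is exactly 1 in floating-point arithmetic.

```stata
clear
input str1 Group int year double rev_Tot
"1" 2018 100
"1" 2018 250
"1" 2018 650
"2" 2018 37200
"2" 2018 45000
"2" 2018 46000
end

* Running total within Group/year, sorted by revenue, held as double
bysort Group year (rev_Tot): gen double cumshare = sum(rev_Tot)

* Divide by the block total, which is the last running total
by Group year: replace cumshare = cumshare / cumshare[_N]

* The last value in each block is the total divided by itself
by Group year: assert cumshare[_N] == 1
list, sepby(Group year) noobs
```

One caveat: in a block whose total revenue is 0, the division is 0/0 and the result is missing, so such blocks need separate handling.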
Otherwise, if tiny differences bother you, then first try using double as the storage type
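As an illustration of why the storage type matters, the sketch below stores 0.1 once as float and once as double. It assumes the default storage type has not been changed with set type; the variable names x and y are made up.

```stata
clear
set obs 1
gen x = 0.1            // float by default (assuming -set type- is untouched)
gen double y = 0.1     // double precision

* Literals are evaluated in double, so the float copy does not match
count if x == 0.1
assert r(N) == 0

* Rounding the literal to float precision restores the match
count if x == float(0.1)
assert r(N) == 1

* The double copy matches the literal directly
count if y == 0.1
assert r(N) == 1
```

The same mechanism applies to a cumulative share stored as float: the stored value can differ slightly from the literal 1 it is compared with.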
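The second problem is display format. You need to change the display format to get a better idea of what Stata is actually holding. In extreme cases, the format %21x can be illuminating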
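A minimal sketch of that inspection follows. It uses a hypothetical float value just above 1 rather than the asker's data. An observation left unclassified by conditions covering everything up to and including 1 must hold a value above 1, however it is displayed.

```stata
clear
set obs 1
* Hypothetical stored value: the smallest float greater than 1
gen float s = 1 + 2^-23

list s                 // default display format may hide the excess
format s %21x          // hexadecimal format shows the stored bits
list s
display s - 1          // distance from 1

* The comparison that silently fails in the quintile code
assert !(s <= 1)
```

On the real data, the equivalent would be format Roll_sum_rev_Tot_group %21x followed by list Roll_sum_rev_Tot_group if quintile == "".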
Your quintile assignment code seems rather laboured. For an input between 0 and 1, you might prefer to start with
gen wanted = ceil(5 * input)
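Here input stands for whatever variable holds the cumulative share. A small worked example, with a hypothetical variable share and made-up values, shows how the classes come out:

```stata
clear
input double share
0
0.1
0.35
0.5
0.75
0.998288
1
end

* Class k covers shares in ((k-1)/5, k/5]
gen byte wanted = ceil(5 * share)

* Hypothetical tweak: put exact zeros into the first class
gen byte wanted1 = max(1, wanted)
list, noobs
```

Two things to watch: a share of exactly 0 gives class 0 unless it is adjusted as in wanted1, and a share stored as slightly more than 1 would give class 6, which is another reason to make sure the last cumulative share is exactly 1.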