Stata: creating a three-way table of percentages


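I would like a three-way table that shows column or row percentages for three categorical variables. The command below gives counts, but I cannot find how to get percentages.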

sysuse nlsw88

table married race collgrad, col

--------------------------------------------------------------------
          |                college graduate and race                
          | ---- not college grad ----    ------ college grad ------
  married | white  black  other  Total    white  black  other  Total
----------+---------------------------------------------------------
   single |   355    256      5    616      132     53      3    188
  married |   862    224     12  1,098      288     50      6    344
--------------------------------------------------------------------

How can I get percentages instead?

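This answer shows a miscellany of tricks. The downside is that I don't know an easy way to get exactly what you want. The upside is that all of these tricks are easy to understand and often useful.

Let's use your example, which is excellent for the purpose.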

. sysuse nlsw88, clear 
(NLSW, 1988 extract)

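Tip #1 You can calculate a percent variable yourself. I focus on % single. In this dataset, married is binary, so I won't show the complementary percent. Once you have calculated it, you can (a) rely on the fact that it is constant within the groups you used to define it and (b) tabulate it directly. I find that users underrate tabdisp. It is billed as a programmer's command, but it is not at all difficult to use. tabdisp lets you set a display format on the fly; assigning a format directly with the format command does no harm and may help with other commands.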

. egen pcsingle = mean(100 * (1 - married)), by(collgrad race)

. tabdisp collgrad race, c(pcsingle) format(%2.1f)

--------------------------------------
                 |        race        
college graduate | white  black  other
-----------------+--------------------
not college grad |  29.2   53.3   29.4
    college grad |  31.4   51.5   33.3
--------------------------------------

. format pcsingle %2.1f

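Tip #2 The user-written command groups offers a different kind of flexibility. groups can be installed from SSC (strictly, it must be installed before you can use it). It is a wrapper for various kinds of tables, but it uses list as its display engine.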

. * do this installation just once 
. ssc inst groups 

. groups collgrad race pcsingle 

  +-------------------------------------------------------+
  |         collgrad    race   pcsingle   Freq.   Percent |
  |-------------------------------------------------------|
  | not college grad   white       29.2    1217     54.19 |
  | not college grad   black       53.3     480     21.37 |
  | not college grad   other       29.4      17      0.76 |
  |     college grad   white       31.4     420     18.70 |
  |     college grad   black       51.5     103      4.59 |
  |-------------------------------------------------------|
  |     college grad   other       33.3       9      0.40 |
  +-------------------------------------------------------+

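We can improve on that. We can set up better header text using characteristics. (In practice, these can be less constrained than variable names, but they often need to be shorter than variable labels.) We can add separator lines by calling up standard list options.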

. char pcsingle[varname] "% single"

. char collgrad[varname] "college?"

. groups collgrad race pcsingle , subvarname sepby(collgrad) 

  +-------------------------------------------------------+
  |         college?    race   % single   Freq.   Percent |
  |-------------------------------------------------------|
  | not college grad   white       29.2    1217     54.19 |
  | not college grad   black       53.3     480     21.37 |
  | not college grad   other       29.4      17      0.76 |
  |-------------------------------------------------------|
  |     college grad   white       31.4     420     18.70 |
  |     college grad   black       51.5     103      4.59 |
  |     college grad   other       33.3       9      0.40 |
  +-------------------------------------------------------+

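Tip #3 Wire a display format into a variable by making a string equivalent. I don't illustrate this fully, but I often use it when I want tabdisp to show counts alongside numerical results with decimal places. format(%2.1f) and format(%3.2f) might suit most variables (the important detail, incidentally, is the number of decimal places), but they would lead to a count of 42 being shown as 42.0 or 42.00, which looks pretty silly. The format() option of tabdisp does not reach inside a string and change its contents; it does not even know what the string variable contains or where it came from. So strings are shown by tabdisp just as they come, which is what you want.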

. gen s_pcsingle = string(pcsingle, "%2.1f") 

. char s_pcsingle[varname] "% single"
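Since the tip above is not illustrated in full, here is a minimal sketch of how it might be used with tabdisp. The variable names count and s_count are illustrative assumptions, not part of the original answer; the idea is only that a count and a percent, both held as strings, keep their own display formats in the same table.

```stata
* a sketch, starting from a fresh copy of the data
sysuse nlsw88, clear

* % single and cell counts, both constant within collgrad x race cells
egen pcsingle = mean(100 * (1 - married)), by(collgrad race)
bysort collgrad race: gen count = _N

* string equivalents: the count has no decimals, the percent has one
gen s_count    = string(count)
gen s_pcsingle = string(pcsingle, "%2.1f")

* tabdisp shows the strings as they come, so 355 is not shown as 355.0
tabdisp collgrad race, c(s_count s_pcsingle)
```

Passing the numeric variables with a single format() option instead would force the same number of decimal places on both.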

groups also has an option to save what is tabulated as a new dataset.

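Tip #4 To get a Total category, temporarily double up the data. The clone of the original is relabelled as the Total category. You may need to do some extra calculations, but none of it is rocket science: a smart high school student could figure it out. Here a concrete example for line-by-line study beats a lengthy explanation.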

. preserve 

. local Np1 = _N + 1 

. expand 2 
(2,246 observations created)

. replace race = 4 in `Np1'/L 
(2,246 real changes made)

. label def racelbl 4 "Total", modify  

. drop pcsingle 

. egen pcsingle = mean(100 * (1 - married)), by(collgrad race)

. char pcsingle[varname] "% single"

. format pcsingle %2.1f 

. gen istotal = race == 4 

. bysort collgrad istotal: gen total = _N 

. * for percents of the global total, we need to correct for doubling up     
. scalar alltotal = _N/2 

. * the table shows percents for college & race | collgrad and for collgrad | total 
. bysort collgrad race : gen pc = 100 * cond(istotal, total/alltotal, _N/total)  
. format pc %2.1f
. char pc[varname] "Percent" 

. groups collgrad race pcsingle pc , show(f) subvarname sepby(collgrad istotal) 

  +-------------------------------------------------------+
  |         college?    race   % single   Percent   Freq. |
  |-------------------------------------------------------|
  | not college grad   white       29.2      71.0    1217 |
  | not college grad   black       53.3      28.0     480 |
  | not college grad   other       29.4       1.0      17 |
  |-------------------------------------------------------|
  | not college grad   Total       35.9      76.3    1714 |
  |-------------------------------------------------------|
  |     college grad   white       31.4      78.9     420 |
  |     college grad   black       51.5      19.4     103 |
  |     college grad   other       33.3       1.7       9 |
  |-------------------------------------------------------|
  |     college grad   Total       35.3      23.7     532 |
  +-------------------------------------------------------+

Note the extra trick of using a variable that is not explicitly shown (istotal) to add separator lines.
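Because the example began with preserve, the doubled-up data are meant to be temporary. Assuming you are working interactively, as in the transcript above, a closing step along these lines brings back the original data:

```stata
* discard the temporary doubled-up copy and return to the data as they were at -preserve-
restore
```

Within a do-file, Stata restores preserved data automatically when the do-file ends, so the explicit restore matters mainly in interactive sessions.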

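Comments:

Percentages of what: married, race, collgrad, everybody? Or two, three or four of those?

@nick Percentages given collgrad. The percentage single among white non-college graduates would be about 355*100/(355+862). The percentage single among non-college graduates (regardless of race) would be 616*100/(616+1098). That is similar to the result of bys collgrad: table married race, col, but in a single table rather than the two tables that bys gives.

Works well, and certainly no rocket science involved. It would be nice to have this wrapped in a single function call. Even better would be chi-square statistics for the association between race and marital status.

When you say function, you mean command. But your comment underlines the key difficulty. Everybody has a table in mind that they consider quite simple and straightforward to produce, but there are thousands of such table types. The asymptote here is a general command to produce any kind of table, whose syntax runs to page after page and whose documentation is an entire manual. Or you can write a program yourself that creates the table you want with one syntax. That is how every program starts, and this is a forum for professional and enthusiast programmers!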
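For readers who want to check the arithmetic in the comments, or who want the chi-square statistics that were asked for, the following is a minimal sketch using official commands only. It is not the single-table layout requested in the question: it gives one two-way table per level of collgrad, each with column percentages and its own chi-square test.

```stata
sysuse nlsw88, clear

* the two percentages quoted in the comments; compare with the Tip #4 table
display %4.1f 355 * 100 / (355 + 862)
display %4.1f 616 * 100 / (616 + 1098)

* married by race within each level of collgrad,
* with column percentages and a Pearson chi-square test
bysort collgrad: tabulate married race, column chi2
```

Adding the nofreq option to tabulate would suppress the counts and leave only the percentages.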