R: cut2 splits data into unequal buckets

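I am currently doing some data processing and have been looking for a way to create deciles with the same number of observations in each group. I came across the Hmisc package and its cut2 function, which looks like it should split the data into 10 buckets with an equal number of observations in each when I specify g=10. However, the output of the function is quite uneven. Am I using cut2 incorrectly?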

The code I am using:

library(Hmisc)
testdata <- data.frame(rating= c(8, 8,  8,  6,  8,  8,  8,  8,  8,  8,  8,  8,  8,  4,  8,  8,  8,  6,  8,  8,  8,  8,  6,  8,  6,  8,  4,  8,  8,  8,  6,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  4,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  6,  8,  8,  8,  8,  6,  6,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  6,  8,  6,  8,  8,  8,  8,  6,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  6,  8,  8,  8,  6,  8,  8,  6,  4,  8,  8,  8,  8,  8,  6,  8,  8,  8,  4,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  6,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  2,  8,  6,  8,  8,  8,  6,  8,  8,  6,  6,  8,  8,  6,  8,  8,  8,  8,  8,  8,  8,  8,  6,  8,  8,  8,  8,  8,  8,  8,  8,  8,  4,  8,  8,  8,  6,  8,  8,  6,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  4,  8,  6,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  6,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  6,  6,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  4,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  6,  8,  6,  8,  8,  8,  6,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  8,  6,  8,  8,  8,  8,  8,  6,  8,  8,  8,  6)
,age=c(0,   0,  0,  0,  3,  4,  4,  4,  4,  6,  6,  6,  6,  6,  6,  7,  7,  7,  7,  8,  8,  8,  9,  9,  9,  9,  10, 10, 11, 11, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15, 15, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 30, 30, 30, 31, 31, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 34, 34, 35, 35, 35, 35, 35, 36, 36, 36, 36, 36, 36, 36, 36, 36, 37, 37, 37, 37, 37, 38, 38, 38, 38, 38, 39, 39, 39, 40, 40, 41, 41, 41, 41, 41, 41, 41, 41, 42, 42, 42, 42, 42, 42, 42, 43, 43, 43, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 45, 46, 46, 46, 46, 47, 47, 47, 48, 48, 48, 54, 54, 54, 56, 56, 58, 59, 59, 59, 59, 60, 60, 60, 61, 66, 66, 70, 72))
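# Ask cut2 for 10 quantile groups (deciles) of age
cutcutcut <- cut2(testdata$age, g = 10)
# Count how many observations land in each bucket
testtable <- table(cutcutcut)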

The answer to your question lies in the distribution of your data:

table(testdata$age)
#  0  3  4  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 
#  4  1  4  6  4  3  4  2  2 16  9  7  5 10  6  7  7 13  4  2  9 
# 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 
# 23 10 18 17  8  5  3  2  8  2  2  5  9  5  5  3  2  8  7  3  6 
# 45 46 47 48 54 56 58 59 60 61 66 70 72 
#  5  4  3  3  3  2  1  4  3  1  2  1  1 
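We can see that some ages are shared by a large number of individuals (for example, 16 individuals are aged 12 and 23 are aged 24). Because the cutting algorithm has to put all individuals with exactly the same age into the same bucket, the buckets can end up unbalanced.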
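The effect of ties can be seen on a much smaller example. The sketch below uses a made-up vector `x` (not part of the question's data) and base R only. It assumes, as described above, that a bucket boundary can only fall between distinct values, so the possible sizes of the first bucket are exactly the cumulative counts of the sorted values.

```r
# Hypothetical toy data: 10 observations, six of them tied at the value 5.
x <- c(1, 2, 5, 5, 5, 5, 5, 5, 8, 9)

# Number of observations at each distinct value.
counts <- table(x)

# Tied values must stay together, so the first bucket can only have
# one of these sizes (cumulative counts over the sorted distinct values).
achievable <- cumsum(counts)
print(achievable)

# An even two-way split would need this many observations in the first bucket.
ideal <- length(x) / 2

# FALSE here means no cut point gives an exactly even split.
print(ideal %in% achievable)
```

With six of the ten values tied, the first bucket jumps straight from 2 observations to 8, so an even 5/5 split is out of reach whatever the cutting function does.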


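Your data contains 309 observations in total and you are asking for 10 buckets, so ideally you would want 31 observations in each of 9 buckets and 30 in the last one. As it stands, the last bucket is defined as [46,72] and contains 28 elements (too few). If you extended it to [45,72], it would contain 33 elements (too many). Because 5 elements have the value 45, there is no way to split the data so that the last bucket holds 30 or 31 observations.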
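The arithmetic for the last bucket can be checked directly. This is a minimal sketch that rebuilds the ages from the frequency table shown above instead of retyping the full vector; the names `age_values`, `age_counts` and `age` are introduced here for illustration only.

```r
# Distinct ages and their frequencies, copied from the table(testdata$age) output.
age_values <- c(0, 3, 4, 6:48, 54, 56, 58, 59, 60, 61, 66, 70, 72)
age_counts <- c(4, 1, 4, 6, 4, 3, 4, 2, 2, 16, 9, 7, 5, 10, 6, 7, 7, 13, 4, 2, 9,
                23, 10, 18, 17, 8, 5, 3, 2, 8, 2, 2, 5, 9, 5, 5, 3, 2, 8, 7, 3, 6,
                5, 4, 3, 3, 3, 2, 1, 4, 3, 1, 2, 1, 1)
stopifnot(length(age_values) == length(age_counts))

# Expand the table back into one value per observation.
age <- rep(age_values, times = age_counts)

length(age)        # total number of observations: 309
length(age) / 10   # ideal bucket size: 30.9

sum(age >= 46)     # size of the last bucket [46,72]: 28
sum(age >= 45)     # size if it is extended to [45,72]: 33
sum(age == 45)     # the tied block that cannot be split: 5
```

The last bucket can only move from 28 to 33 in a single step, because the five observations at age 45 have to move together.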
