R: subsetting my data does not produce the expected results
I have a dataset of 93 observations. There are only two variables: a factor (size, a number) and its response (percentage, also a number). The factor values range from 0 to 2000. I want to combine these 93 observations into three groups based on the factor value (0-2, 2-50, and 50-2000) and look at the total combined response for each group. Here is my data:
> data2
run size percentage
1 1 0.375 0.010
2 2 0.412 0.020
3 3 0.452 0.032
4 4 0.496 0.043
5 5 0.545 0.053
6 6 0.598 0.060
7 7 0.656 0.066
8 8 0.721 0.070
9 9 0.791 0.071
10 10 0.868 0.072
11 11 0.953 0.070
12 12 1.047 0.069
13 13 1.149 0.067
14 14 1.261 0.065
15 15 1.385 0.065
16 16 1.520 0.066
17 17 1.668 0.068
18 18 1.832 0.072
19 19 2.011 0.077
20 20 2.207 0.083
21 21 2.423 0.090
22 22 2.660 0.097
23 23 2.920 0.10
24 24 3.205 0.11
25 25 3.519 0.12
26 26 3.863 0.13
27 27 4.240 0.13
28 28 4.655 0.14
29 29 5.110 0.14
30 30 5.610 0.14
31 31 6.158 0.14
32 32 6.760 0.14
33 33 7.421 0.15
34 34 8.147 0.15
35 35 8.943 0.15
36 36 9.817 0.16
37 37 10.78 0.18
38 38 11.83 0.19
39 39 12.99 0.21
40 40 14.26 0.23
41 41 15.65 0.24
42 42 17.18 0.25
43 43 18.86 0.27
44 44 20.70 0.28
45 45 22.73 0.30
46 46 24.95 0.30
47 47 27.39 0.29
48 48 30.07 0.27
49 49 33.01 0.23
50 50 36.24 0.21
51 51 39.78 0.20
52 52 43.67 0.21
53 53 47.94 0.22
54 54 52.62 0.19
55 55 57.77 0.13
56 56 63.41 0.070
57 57 69.61 0.055
58 58 76.42 0.087
59 59 83.89 0.14
60 60 92.09 0.17
61 61 101.1 0.17
62 62 111.0 0.18
63 63 121.8 0.27
64 64 133.7 0.43
65 65 146.8 0.64
66 66 161.2 0.88
67 67 176.9 1.16
68 68 194.2 1.51
69 69 213.2 1.94
70 70 234.1 2.47
71 71 256.9 3.16
72 72 282.1 4.03
73 73 309.6 5.02
74 74 339.9 6.05
75 75 373.1 6.96
76 76 409.6 7.63
77 77 449.7 8.01
78 78 493.6 8.08
79 79 541.9 7.82
80 80 594.9 7.13
81 81 653.0 6.01
82 82 716.8 4.81
83 83 786.9 3.57
84 84 863.9 2.09
85 85 948.3 1.01
86 86 1041 0.55
87 87 1143 0.22
88 88 1255 0.018
89 89 1377 0
90 90 1512 0
91 91 1660 0
92 92 1822 0
93 2000
Here is the output of dput:
dput(data2)
structure(list(run = c("1", "2", "3", "4", "5", "6", "7", "8",
"9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
"20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30",
"31", "32", "33", "34", "35", "36", "37", "38", "39", "40", "41",
"42", "43", "44", "45", "46", "47", "48", "49", "50", "51", "52",
"53", "54", "55", "56", "57", "58", "59", "60", "61", "62", "63",
"64", "65", "66", "67", "68", "69", "70", "71", "72", "73", "74",
"75", "76", "77", "78", "79", "80", "81", "82", "83", "84", "85",
"86", "87", "88", "89", "90", "91", "92", ""), size = c("0.375",
"0.412", "0.452", "0.496", "0.545", "0.598", "0.656", "0.721",
"0.791", "0.868", "0.953", "1.047", "1.149", "1.261", "1.385",
"1.520", "1.668", "1.832", "2.011", "2.207", "2.423", "2.660",
"2.920", "3.205", "3.519", "3.863", "4.240", "4.655", "5.110",
"5.610", "6.158", "6.760", "7.421", "8.147", "8.943", "9.817",
"10.78", "11.83", "12.99", "14.26", "15.65", "17.18", "18.86",
"20.70", "22.73", "24.95", "27.39", "30.07", "33.01", "36.24",
"39.78", "43.67", "47.94", "52.62", "57.77", "63.41", "69.61",
"76.42", "83.89", "92.09", "101.1", "111.0", "121.8", "133.7",
"146.8", "161.2", "176.9", "194.2", "213.2", "234.1", "256.9",
"282.1", "309.6", "339.9", "373.1", "409.6", "449.7", "493.6",
"541.9", "594.9", "653.0", "716.8", "786.9", "863.9", "948.3",
"1041", "1143", "1255", "1377", "1512", "1660", "1822", "2000"
), percentage = c("0.013", "0.023", "0.034", "0.049", "0.061",
"0.072", "0.083", "0.093", "0.10", "0.11", "0.12", "0.12", "0.13",
"0.14", "0.14", "0.15", "0.15", "0.16", "0.17", "0.17", "0.18",
"0.19", "0.20", "0.21", "0.22", "0.24", "0.25", "0.26", "0.28",
"0.30", "0.31", "0.33", "0.35", "0.37", "0.39", "0.42", "0.45",
"0.47", "0.50", "0.53", "0.56", "0.58", "0.59", "0.59", "0.58",
"0.55", "0.52", "0.49", "0.46", "0.45", "0.45", "0.45", "0.44",
"0.42", "0.38", "0.35", "0.32", "0.31", "0.33", "0.36", "0.42",
"0.49", "0.59", "0.74", "0.94", "1.19", "1.49", "1.82", "2.18",
"2.55", "2.94", "3.34", "3.78", "4.25", "4.73", "5.20", "5.60",
"5.87", "5.93", "5.77", "5.37", "4.77", "4.03", "3.21", "2.36",
"1.55", "0.81", "0.30", "0.056", "0.0044", "0", "0", "")), class = "data.frame", row.names = c(NA,
-93L))
I tried the following code, which I thought would give me the result I want:
clay <- data2 %>% filter(size <= 2)
silt <- data2 %>% filter(size > 2 & size <= 50)
sand <- data2 %>% filter(size > 50 & size <= 2000)
sum(as.numeric(clay$percentage), na.rm=TRUE)
[1] 8.637
sum(as.numeric(silt$percentage), na.rm=TRUE)
[1] 57.217
sum(as.numeric(sand$percentage), na.rm=TRUE)
[1] 0
The issue is the type of the columns. The column types can be changed automatically with type.convert from base R or type_convert from readr:
library(dplyr)
library(readr)
data2 <- data2 %>%
type_convert
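For readers without readr, the base R route mentioned above can be sketched as follows. The data frame `toy` is a made-up miniature of data2 (character columns, one blank entry), not the OP's data, and `as.is = TRUE` is my choice to keep non-numeric strings as character rather than factor.

```r
# Hypothetical miniature of data2: every column is stored as character
toy <- data.frame(size = c("0.375", "10.78", "2000"),
                  percentage = c("0.013", "0.45", ""),
                  stringsAsFactors = FALSE)

# type.convert() guesses a type for each column; blank fields become NA
toy_num <- type.convert(toy, as.is = TRUE)

str(toy_num)  # both columns should now be numeric
```

After this conversion, comparisons such as `size <= 2` operate on numbers rather than on strings.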
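According to ?Comparison:
If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw.
After changing the types, run the OP's code: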
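A small illustration of the coercion rule quoted above, using a few values taken from the size column. Because size is character, the number on the right-hand side is converted to a string and the comparison is done on text, which is likely why the filters in the question select the wrong rows. The variable names here are only for illustration.

```r
# Character vs. numeric: the number 2 is coerced to "2" and the
# comparison is a string comparison, so "10.78" sorts before "2"
chr_cmp <- "10.78" <= 2
chr_cmp

# Likewise "1041" sorts before "50", so it fails the test size > 50
chr_cmp_large <- "1041" > 50
chr_cmp_large

# Converting to numeric first gives the intended comparison
num_cmp <- as.numeric("10.78") <= 2
num_cmp
```

This is consistent with the question's output, where the sand group (size > 50) summed to 0.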
sum(clay$percentage, na.rm=TRUE)
#[1] 1.748
sum(silt$percentage, na.rm=TRUE)
#[1] 13.5
sum(sand$percentage, na.rm=TRUE)
#[1] 84.7504
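First, you could try converting the factor variable to numeric:
data2$size
Did you check the class of the columns? Is it factor? In that case you need as.numeric(as.character(...)).
I get sum(as.numeric(sand$percentage), na.rm=TRUE) #[1] 92.66, sum(as.numeric(silt$percentage), na.rm=TRUE) [1] 6.327, and sum(as.numeric(clay$percentage), na.rm=TRUE) [1] 1.039.
@joran I just added that output for you. I'm not very familiar with the program that produced this data, but I believe they are NAs, which is why I used na.rm=TRUE.
@akrun Did you do anything different to get that output? That is what the results should look like, but I don't get those outputs when I run this code. Also, when I check their class, they are characters.
@Trev. I read those columns in as numeric. If you have factor columns, convert them to numeric via as.character rather than applying as.numeric to them directly.
Thanks for your response! When I execute the code you shared, my size factor turns into a string of NAs. Am I doing something wrong here?
It works fine here. However, you should look here for more information about this process. It is better to work with numeric or integer variables themselves. As far as I know, factor variables are best suited to categorical variables.
Thank you! I was able to use that to solve my problem.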
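The as.numeric(as.character(...)) advice from the comments can be seen on a tiny example. The factor `f` below is made up for illustration; it is not the OP's column.

```r
# A factor stores integer level codes plus a table of labels
f <- factor(c("0.375", "10.78", "2000"))

# Applying as.numeric() directly returns the level codes, not the values
codes <- as.numeric(f)
codes

# Going through as.character() first recovers the actual numbers
values <- as.numeric(as.character(f))
values
```

If the column is character rather than factor, as in the dput output in the question, as.numeric() alone is enough.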
sum(data2$percentage[data2$group=='Group 1'])
#[1] 1.75
sum(data2$percentage[data2$group=='Group 2'])
#[1] 13.5
sum(data2$percentage[data2$group=='Group 3'])
#[1] 84.8
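The code that created the `group` column does not appear in the thread as shown. One possible way to build such a column, offered as an assumption rather than the answerer's actual method, is cut() with the breakpoints from the question. The data frame `toy` and its values are invented for illustration.

```r
# Made-up numeric data; sizes chosen to include the boundary values 2 and 50
toy <- data.frame(size = c(0.375, 2, 10.78, 50, 52.62, 2000),
                  percentage = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6))

# cut() uses right-closed intervals by default: (0,2], (2,50], (50,2000],
# which matches size <= 2, size > 2 & size <= 50, size > 50 & size <= 2000
toy$group <- cut(toy$size, breaks = c(0, 2, 50, 2000),
                 labels = c("Group 1", "Group 2", "Group 3"))

# Total percentage per group in one call
group_totals <- tapply(toy$percentage, toy$group, sum)
group_totals
```

A single grouping column avoids maintaining three separate filtered data frames, though the cut() call only works once size is numeric.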