R aggregate with subset returns this error: NAs introduced by coercion
I'm having trouble finding the mean of a subset of my data. Below are the two questions I want to answer. The first seems to work fine, but the second returns the same answer as the first, only without the digits to the right of the decimal point. What is going on?
I also get this error, repeated several times:
NAs introduced by coercion
# What is the mean suspension rate for schools by farms overall?
aggregate(suspension_rate_total ~ farms, merged_data, FUN = function(suspension_rate_total)
mean(as.numeric(as.character(suspension_rate_total))))
# What is the mean suspension rate for schools with farms > 100?
aggregate(suspension_rate_total ~ farms, merged_data, FUN = function(suspension_rate_total)
mean(as.numeric(as.character(suspension_rate_total))), subset = farms< 100)
Data
merged_data <- structure(list(schid = c("1030642", "1030766", "1030774", "1030840",
"1130103", "1230150", "1530435", "1530492", "1530500", "1931047",
"1931708", "1931864", "1932623", "1933746", "1937226", "1938554",
"1938612", "1938885", "1995836", "1996016"), farms = c("132",
"116", "348", "406", "68", "130", "370", "204", "225", "2,616",
"1,106", "1,918", "1,148", "2,445", "1,123", "1,245", "1,369",
"1,073", "932", "178"), foster = c("2", "0", "1", "8", "1", "4",
"4", "0", "0", "22", "11", "12", "2", "8", "13", "13", "4", "3",
"2", "3"), homeless = c("14", "0", "8", "4", "1", "4", "5", "0",
"14", "35", "42", "116", "9", "8", "34", "54", "26", "31", "5",
"11"), migrant = c("0", "0", "0", "0", "0", "0", "18", "0", "0",
"0", "0", "0", "0", "0", "0", "1", "0", "0", "0", "0"), ell = c("18",
"12", "114", "45", "7", "4", "50", "28", "26", "274", "212",
"325", "95", "112", "232", "185", "121", "84", "24", "35"), suspension_rate_total = c("*",
"20", "0", "0", "95", "5", "*", "256", "78", "33", "20", "1",
"218", "120", "0", "0", "*", "*", "*", "0"), suspension_violent = c("*",
"9", "0", "0", "20", "2", "*", "38", "0", "6", "3", "0", "53",
"35", "0", "0", "*", "*", "*", "0"), suspension_violent_no_injury = c("*",
"6", "0", "0", "47", "1", "*", "121", "52", "7", "13", "1", "77",
"44", "0", "0", "*", "*", "*", "0"), suspension_weapon = c("*",
"0", "0", "0", "8", "0", "*", "1", "0", "1", "1", "0", "4", "3",
"0", "0", "*", "*", "*", "0"), suspension_drug = c("*", "0",
"0", "0", "9", "1", "*", "59", "12", "16", "0", "0", "6", "5",
"0", "0", "*", "*", "*", "0"), suspension_defiance = c("*", "1",
"0", "0", "9", "1", "*", "16", "12", "0", "3", "0", "69", "30",
"0", "0", "*", "*", "*", "0"), suspension_other = c("*", "4",
"0", "0", "2", "0", "*", "21", "2", "3", "0", "0", "9", "3",
"0", "0", "*", "*", "*", "0")), row.names = c(NA, 20L), class = "data.frame")
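Are you sure "NAs introduced by coercion" is an error and not a warning?
When you convert a character column to numeric with
as.numeric(as.character(suspension_rate_total))
any value that cannot be parsed as a number (a blank, or the "*" entries in your data) is coerced to NA, and R signals this with a warning.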
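To make the coercion visible, here is a minimal sketch on a made-up vector `x` that mimics the `suspension_rate_total` column (the values are illustrative, not the full data). It captures the warning instead of printing it, then shows how a single NA affects `mean`.

```r
# Hypothetical sample mimicking suspension_rate_total ("*" marks a suppressed value)
x <- c("*", "20", "0", "95")

# Convert to numeric, capturing the warning message instead of printing it
msg <- NULL
nums <- withCallingHandlers(
  as.numeric(x),
  warning = function(w) {
    msg <<- conditionMessage(w)      # store the warning text
    invokeRestart("muffleWarning")   # continue; this is a warning, not an error
  }
)

nums                      # NA 20 0 95
msg                       # the coercion warning (text depends on session language)
mean(nums)                # NA, because one element is NA
mean(nums, na.rm = TRUE)  # mean of 20, 0 and 95 only
```

The conversion still returns a result, which is what distinguishes a warning from an error. Whether dropping the NA values with `na.rm = TRUE` is appropriate depends on what "*" means in your data.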
Also, I get different answers for the two code blocks:
> aggregate(suspension_rate_total ~ farms, merged_data, FUN = function(suspension_rate_total)
+ mean(as.numeric(as.character(suspension_rate_total))))
farms suspension_rate_total
1 68 95
2 116 20
3 130 5
4 132 NA
5 178 0
6 204 256
7 225 78
8 348 0
9 370 NA
10 406 0
11 932 NA
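> aggregate(suspension_rate_total ~ farms, merged_data, FUN = function(suspension_rate_total)
+ mean(as.numeric(as.character(suspension_rate_total))), subset = farms< 100)
farms suspension_rate_total
1 68 95
In addition, the comment on your second code block mentions farms > 100, but in your code you use subset = farms< 100.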
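One further thing worth checking, assuming `farms` is still stored as character as in the posted `dput` output: comparing a character vector with a number makes R convert the number to a string and compare text, not magnitudes. The sketch below uses a made-up vector `farms_chr` to show the difference.

```r
# Hypothetical character values in the same style as the farms column
farms_chr <- c("68", "9", "132", "1,106")

# Character vs numeric comparison: 100 is converted to "100" and compared as text
"9" < 100    # FALSE, because "9" sorts after "1"

# Strip the thousands separator and convert before comparing
farms_num <- as.numeric(gsub(",", "", farms_chr))
farms_num < 100   # TRUE TRUE FALSE FALSE
farms_num > 100   # FALSE FALSE TRUE TRUE
```

If the column in your session is already numeric, the comparison behaves as expected and only the direction of the inequality needs fixing.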
Tidy up your data:
# replace * with NA
merged_data$suspension_rate_total[merged_data$suspension_rate_total == '*'] <- NA
# convert character to numeric format
merged_data$suspension_rate_total <- as.numeric(merged_data$suspension_rate_total)
# remove comma in strings and convert character to numeric format
merged_data$farms <- as.numeric(gsub(",", "", merged_data$farms))
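Once both columns are numeric, the aggregation no longer needs a conversion inside `FUN`. The following is a sketch on a small data frame `d` built from a few rows of the posted data (the name `d` and the choice of rows are only for illustration), using `farms > 100` as stated in the question's comment.

```r
# A few rows taken from the posted data, still as character
d <- data.frame(
  farms = c("132", "116", "348", "406", "68", "130", "2,616"),
  suspension_rate_total = c("*", "20", "0", "0", "95", "5", "33"),
  stringsAsFactors = FALSE
)

# Same cleaning steps as above
d$suspension_rate_total[d$suspension_rate_total == "*"] <- NA
d$suspension_rate_total <- as.numeric(d$suspension_rate_total)
d$farms <- as.numeric(gsub(",", "", d$farms))

# Mean suspension rate per farms value, restricted to farms > 100.
# The formula method drops rows with NA by default (na.action = na.omit).
res <- aggregate(suspension_rate_total ~ farms, data = d, FUN = mean,
                 subset = farms > 100)
res

# A single overall mean for schools with farms > 100, ignoring missing rates
overall <- with(d, mean(suspension_rate_total[farms > 100], na.rm = TRUE))
overall
```

Note that grouping by `farms` gives one row per distinct `farms` value, so if a single number for the whole subset is what you want, the `overall` calculation is probably closer to the intent.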
Thanks Shubham! I've added two images showing the output on the full dataset when I run the two pieces of code. They appear to produce the same output, just with or without the .0000.