R: Is there a more efficient way to recode these ranges?
Hi, I have a file that was imported into R, and I want to recode one of its columns:
Number of People
1 to 3
4 to 6
7 to 10
.
.
.
.
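The "Number of People" column has more than 30 levels in total.
What I want to do is convert them into numeric values (i.e. "1 to 3" becomes 2, "4 to 6" becomes 5).
Since I have a large amount of data to process, is there a more efficient way to recode this, or is recode() the only option?
Thanks. Sample data: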
df <- data.frame(
Number_of_ppl = c("1 to 3", "40 to 45")
)
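If you want the mean as a new column in the data frame, store the result as a new variable (str_extract_all() comes from the stringr package):
library(stringr)
df$Number_of_ppl_mean <- sapply(lapply(str_extract_all(df$Number_of_ppl, "\\d+"), as.numeric), mean)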
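Since the question stresses a large data set with only about 30 distinct levels, one possible refinement is to parse each distinct label once and then map the result back to every row. The following is a minimal base-R sketch under that assumption; the helper name range_midpoint and the extra repeated row in the sample data are hypothetical and are not part of the original answers.

```r
# Hypothetical helper (not from the original answers):
# midpoint of labels such as "1 to 3", parsed once per distinct label.
range_midpoint <- function(labels) {
  labels <- as.character(labels)                    # also accepts factor input
  lv <- unique(labels)                              # distinct labels only
  nums <- regmatches(lv, gregexpr("[0-9]+", lv))    # all digit runs per label
  mids <- vapply(nums, function(n) mean(as.numeric(n)), numeric(1))
  mids[match(labels, lv)]                           # map midpoints back to rows
}

# Sample data from the question, with one repeated row added for illustration
df <- data.frame(Number_of_ppl = c("1 to 3", "40 to 45", "1 to 3"))
df$Number_of_ppl_mean <- range_midpoint(df$Number_of_ppl)
```

With this layout the regular expression runs once per level rather than once per row, which may help when the column has many rows but few distinct values.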
Here is a dplyr-based solution with the same basic structure as Chris Ruehlemann's answer:
library(dplyr)
library(stringr)
df <- data.frame(Number_of_People = c("1 to 3",
"4 to 6",
"7 to 10"))
df %>%
  mutate(first_numb = as.numeric(str_extract(Number_of_People, "^\\d{1,}")),
         second_numb = as.numeric(str_extract(Number_of_People, "\\d{1,}$"))) %>%
  rowwise() %>%
  mutate(avg = mean(c(first_numb, second_numb)))
# A tibble: 3 x 4
#   Number_of_People first_numb second_numb   avg
#   <fct>                 <dbl>       <dbl> <dbl>
# 1 1 to 3                    1           3   2
# 2 4 to 6                    4           6   5
# 3 7 to 10                   7          10   8.5
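Because the midpoint is plain arithmetic on two numbers, the row-by-row step is not strictly required. As a rough sketch of the same idea without any packages, the two bounds can be pulled out with sub() and averaged as whole vectors. The variable names lo and hi are illustrative assumptions, and the patterns assume every label starts and ends with a number.

```r
df <- data.frame(Number_of_People = c("1 to 3", "4 to 6", "7 to 10"))

x <- as.character(df$Number_of_People)   # works for factor or character columns

# Leading digits: the lower bound of the range
lo <- as.numeric(sub("^([0-9]+).*$", "\\1", x))
# Trailing digits after the last non-digit: the upper bound of the range
hi <- as.numeric(sub("^.*[^0-9]([0-9]+)$", "\\1", x))

# Vectorized midpoint, no per-row loop needed
df$avg <- (lo + hi) / 2
```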
We can also use separate() to split the column in two and then take the mean of the two columns:
library(dplyr)
library(tidyr)
df %>%
  separate(Number_of_People, into = c("first", "second"), sep = "\\s*to\\s*",
           convert = TRUE, remove = FALSE) %>%
  mutate(avg = (first + second) / 2)
#   Number_of_People first second avg
# 1           1 to 3     1      3 2.0
# 2           4 to 6     4      6 5.0
# 3          7 to 10     7     10 8.5
Comment: What is the average for "7 to 10" (8.5)? Your post did not mention that you want the average of the two numbers.
Data used in the examples above:
df <- data.frame(Number_of_People = c("1 to 3",
                                      "4 to 6",
                                      "7 to 10"))