
R: Is there a more efficient way to recode these ranges?


Hi, I have a file imported into R, and I would like to recode one of its columns:

Number of People
1 to 3
4 to 6 
7 to 10
.
.
.
.
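The "Number of People" column has more than 30 levels in total. What I want to do is convert them to numeric values (i.e. "1 to 3" becomes "2", "4 to 6" becomes "5").

Since I have a large amount of data to process, is there a more efficient way to recode this, or is recode() the only way to do it?

Thanks

Sample data: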

df <- data.frame(
  Number_of_ppl = c("1 to 3", "40 to 45")
)
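If you want the mean as a new column in the data frame, store the result as a new variable:

library(stringr)

# Extract every run of digits from each range, convert to numeric, and average per row
df$Number_of_ppl_mean <- sapply(lapply(str_extract_all(df$Number_of_ppl, "\\d+"), as.numeric), mean)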
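Since the question mentions a large data set with only about 30 distinct levels, one possible refinement is to compute the midpoint once per distinct level and then map it back to the rows. The following is only a minimal base R sketch under that assumption; the sample rows, the helper names `lvls`, `bounds` and `midpoints`, and the new column name `Number_of_ppl_mid` are made up for illustration and are not from the original answers.

```r
# Hypothetical sample data using the question's column name
df <- data.frame(
  Number_of_ppl = c("1 to 3", "40 to 45", "1 to 3", "4 to 6"),
  stringsAsFactors = FALSE
)

# Work on the distinct levels only, so each range is parsed once
lvls <- unique(as.character(df$Number_of_ppl))

# Pull every run of digits out of each level (a list of character vectors)
bounds <- regmatches(lvls, gregexpr("[0-9]+", lvls))

# Midpoint of each level's numbers
midpoints <- vapply(bounds, function(x) mean(as.numeric(x)), numeric(1))

# Map each row back to the midpoint of its level
df$Number_of_ppl_mid <- midpoints[match(as.character(df$Number_of_ppl), lvls)]
```

Whether this is noticeably faster than parsing every row depends on the data; it should help most when the number of rows is far larger than the number of distinct levels.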

Here is a dplyr-based solution with the same basic structure as Chris Ruehlemann's answer:

library(dplyr)
library(stringr)

df <- data.frame(Number_of_People = c("1 to 3",
                                       "4 to 6",
                                       "7 to 10"))

df %>%
  mutate(first_numb = as.numeric(str_extract(Number_of_People, "^\\d{1,}")),
         second_numb = as.numeric(str_extract(Number_of_People, "\\d{1,}$"))) %>%
  rowwise() %>%
  mutate(avg = mean(c(first_numb, second_numb)))
# A tibble: 3 x 4
  Number_of_People first_numb second_numb   avg
  <fct>                 <dbl>       <dbl> <dbl>
1 1 to 3                    1           3   2  
2 4 to 6                    4           6   5  
3 7 to 10                   7          10   8.5

We can also use separate to split the column in two and then take the mean of the two resulting columns:

library(dplyr)
library(tidyr)
df %>% 
     separate(Number_of_People, into = c("first", "second"), sep="\\s*to\\s*",
           convert = TRUE, remove = FALSE) %>% 
     mutate(avg =  (first + second)/2)
#  Number_of_People first second avg
#1           1 to 3     1      3 2.0
#2           4 to 6     4      6 5.0
#3          7 to 10     7     10 8.5
Data

df <- data.frame(Number_of_People = c("1 to 3",
                                       "4 to 6",
                                       "7 to 10"))

The mean of 7 to 10 (8.5)

What is df? You did not mention in your post that you want the mean of the two numbers.