Create categories for one column based on the categories of another column in R
I have a data frame that looks like this:
x y
1 (0,4] 1
2 (0,4] 2
3 (0,4] 3
4 (0,4] 4
5 (4,5] 5
6 (5,10] 6
7 (5,10] 7
8 (5,10] 8
9 (5,10] 9
10 (5,10] 10
11 (10,20] 11
12 (10,20] 12
13 (10,20] 13
14 (10,20] 14
15 (10,20] 15
16 (10,20] 16
17 (10,20] 17
18 (10,20] 18
19 (10,20] 19
20 (10,20] 20
21 (20,40] 21
22 (20,40] 22
23 (20,40] 23
24 (20,40] 24
25 (20,40] 25
26 (20,40] 26
27 (20,40] 27
28 (20,40] 28
29 (20,40] 29
30 (20,40] 30
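I want to cut column y using the same irregular bins that column x is already classified into, without looping over the rows or hard-coding each cut point. Is there a way to do this?
Thanks in advance.
Edit: desired output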
x y
1 (0,4] (0,4]
2 (0,4] (0,4]
3 (0,4] (0,4]
4 (0,4] (0,4]
5 (4,5] (4,5]
6 (5,10] (5,10]
7 (5,10] (5,10]
8 (5,10] (5,10]
9 (5,10] (5,10]
10 (5,10] (5,10]
11 (10,20] (10,20]
12 (10,20] (10,20]
13 (10,20] (10,20]
14 (10,20] (10,20]
15 (10,20] (10,20]
16 (10,20] (10,20]
17 (10,20] (10,20]
18 (10,20] (10,20]
19 (10,20] (10,20]
20 (10,20] (10,20]
21 (20,40] (20,40]
22 (20,40] (20,40]
23 (20,40] (20,40]
24 (20,40] (20,40]
25 (20,40] (20,40]
26 (20,40] (20,40]
27 (20,40] (20,40]
28 (20,40] (20,40]
29 (20,40] (20,40]
30 (20,40] (20,40]
Extract the numbers from the existing cut points:
library(stringr)
cutpoints = sort(as.numeric(unique(unlist(str_extract_all(df$x, pattern = "\\d+")))))
Then cut y using those cut points:
df$y = cut(df$y, breaks = cutpoints)
Using this reproducible data:
df = structure(list(x = structure(c(1L, 1L, 1L, 1L, 4L, 5L, 5L, 5L,
5L, 5L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L), .Label = c("(0,4]", "(10,20]", "(20,40]",
"(4,5]", "(5,10]"), class = "factor"), y = 1:30), .Names = c("x",
"y"), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15",
"16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26",
"27", "28", "29", "30"))
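For readers who would rather not load stringr, the same two steps can be sketched in base R. This is a minimal sketch under assumptions: the small data frame is a stand-in for the data above, the `y_bin` column name is made up for illustration, and every cut point is assumed to be a non-negative integer.

```r
# Small stand-in for the data above: x holds interval labels, y the raw values.
df <- data.frame(
  x = c("(0,4]", "(0,4]", "(4,5]", "(5,10]", "(10,20]", "(20,40]"),
  y = c(1, 4, 5, 7, 15, 30)
)

# Pull every run of digits out of the labels
# (base R counterpart of str_extract_all).
labels <- as.character(df$x)
digits <- regmatches(labels, gregexpr("[0-9]+", labels))

# The unique, sorted endpoints become the breaks.
cutpoints <- sort(unique(as.numeric(unlist(digits))))

# Cut y into a new column (y_bin is an illustrative name) so the raw
# values are kept.
df$y_bin <- cut(df$y, breaks = cutpoints)

# Compare as character: the factor levels of x may be stored in a
# different order than the levels produced by cut().
all(as.character(df$y_bin) == labels)
```

Writing the result to a separate column keeps the original numeric y available, which helps if the breaks need adjusting later.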
We can extract the last numeric substring from "x", convert it to numeric, take the unique elements, and use them as the breaks in cut:
cut(df1$y, breaks= c(0,sort(unique(as.numeric(sub(".*,(\\d+)\\D+$", "\\1", df1$x))))))
#[1] (0,4] (0,4] (0,4] (0,4] (4,5] (5,10] (5,10] (5,10] (5,10]
#[10] (5,10] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20] (10,20]
#[19] (10,20] (10,20] (20,40] (20,40] (20,40] (20,40] (20,40] (20,40] (20,40]
#[28] (20,40] (20,40] (20,40]
#Levels: (0,4] (4,5] (5,10] (10,20] (20,40]
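Both patterns above match only runs of digits, so they assume non-negative integer endpoints. If the labels might contain decimals or negative numbers, a broader pattern could be used. The following is a sketch under that assumption; the labels and values are invented for illustration and it does not handle scientific notation.

```r
# Hypothetical interval labels with a negative and decimal endpoints.
x <- c("(-1.5,0]", "(0,2.5]", "(2.5,10]")
y <- c(-1, 1, 5)

# Optional minus sign, digits, optional decimal part.
pattern <- "-?[0-9]+(\\.[0-9]+)?"
endpoints <- regmatches(x, gregexpr(pattern, x))

# Unique, sorted endpoints become the breaks.
breaks <- sort(unique(as.numeric(unlist(endpoints))))

# as.integer() on the factor gives the index of the bin each value falls in.
bins <- cut(y, breaks = breaks)
as.integer(bins)
```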
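@akrun I feel like you have scolded me for this before. Apologies for forgetting to include the expected output again. I'm editing now.
Your output appears to show two identical columns: data.frame(x=df$x,y=df$x)
You could try df1$y
@Frank Because the two columns I'm working with look identical once they are categorized.
Use the same code that cut x to cut y.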
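If the breaks originally used to create x are still available, the last suggestion amounts to storing them once and reusing them. A minimal sketch, assuming hypothetical `raw` values from which x was binned and a hypothetical `breaks` vector:

```r
# Hypothetical raw values that x was originally binned from, and the y values.
raw <- c(2, 4.5, 8, 12, 35)
y <- c(1, 5, 6, 20, 21)

# Define the breaks once and reuse them for both columns.
breaks <- c(0, 4, 5, 10, 20, 40)
df <- data.frame(x = cut(raw, breaks = breaks), y = cut(y, breaks = breaks))

# Both columns now share the same set of levels.
identical(levels(df$x), levels(df$y))
```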