R: Recode columns based on two other vectors
This is my dataset:
df = structure(list(from = c(0, 0, 0, 0, 38, 43, 49, 54), to = c(43,
54, 56, 62, 62, 62, 62, 62), count = c(342, 181, 194, 386, 200,
480, 214, 176), group = c("keiner", "keiner", "keiner", "keiner",
"paid", "paid", "owned", "earned")), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -8L))
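My problem is that I need to rank the from and to columns. The ranking must be computed over both columns together, because the visualization library requires it, and the indices must start at 0.
That is why I built two vectors: one (ranking) ranks every unique value across both columns, and the other (uniquevalues) holds the original unique values of the dataset.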
ranking <- dplyr::dense_rank(unique(c(df$from, df$to))) - 1 ### Start Index at 0, "recode" variables
uniquevalues <- unique(c(df$from, df$to))
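For readers without dplyr installed, the same two vectors can be sketched in base R. This is only an illustration: it assumes that dense ranking here means "position among the sorted unique values", and it copies the from and to values from the question into standalone vectors.

```r
# Column values copied from the question's data
from <- c(0, 0, 0, 0, 38, 43, 49, 54)
to   <- c(43, 54, 56, 62, 62, 62, 62, 62)

# Original unique values, in order of first appearance
uniquevalues <- unique(c(from, to))

# Base R dense rank: position of each value among the sorted
# unique values, shifted so that the smallest value gets index 0
ranking <- match(uniquevalues, sort(unique(uniquevalues))) - 1

# Show the mapping from original value to zero-based rank
print(data.frame(uniquevalues, ranking))
```

In this dataset the unique values already appear in ascending order, so ranking is simply 0 to 6.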
The recoded data frame should look like this:
from to count group
<dbl> <dbl> <dbl> <chr>
1 0 2 342 keiner
2 0 4 181 keiner
3 0 5 194 keiner
4 0 6 386 keiner
5 1 6 200 paid
6 2 6 480 paid
7 3 6 214 owned
8 4 6 176 earned
We can unlist the values and match them against the unique values:
df[1:2] <- match(unlist(df[1:2]), uniquevalues) - 1
df
# from to count group
# <dbl> <dbl> <dbl> <chr>
#1 0 2 342 keiner
#2 0 4 181 keiner
#3 0 5 194 keiner
#4 0 6 386 keiner
#5 1 6 200 paid
#6 2 6 480 paid
#7 3 6 214 owned
#8 4 6 176 earned
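One caveat worth illustrating: match returns positions in uniquevalues, so the result is a rank only when uniquevalues is in ascending order (which happens to be true for the question's data). The sketch below uses a small made-up data frame, not the OP's data, and a plain data.frame rather than a tibble, to show the difference and a sorted variant.

```r
# Hypothetical toy data whose values do NOT first appear in ascending order
df <- data.frame(from = c(50, 0, 20), to = c(20, 50, 70))

vals     <- unlist(df[1:2], use.names = FALSE)  # 50 0 20 20 50 70
unsorted <- unique(vals)                        # 50 0 20 70
sorted   <- sort(unsorted)                      # 0 20 50 70

by_appearance <- match(vals, unsorted) - 1  # 0 1 2 2 0 3 (order of first appearance)
by_rank       <- match(vals, sorted) - 1    # 2 0 1 1 2 3 (zero-based dense rank)

# Recode column by column so each column keeps its own length
df[1:2] <- lapply(df[1:2], function(x) match(x, sorted) - 1)
print(df)
```

Sorting the lookup vector first makes the recoding independent of the row order of the input.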
I would use the mapvalues function, like this:
library(plyr)
df[ , 1:2] <- mapvalues(unlist(df[ , 1:2]),
from= uniquevalues,
to= ranking)
df
# from to count group
# <dbl> <dbl> <dbl> <chr>
#1 0 2 342 keiner
#2 0 4 181 keiner
#3 0 5 194 keiner
#4 0 6 386 keiner
#5 1 6 200 paid
#6 2 6 480 paid
#7 3 6 214 owned
#8 4 6 176 earned
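For readers who prefer not to load plyr, a similar old-value to new-value mapping can be sketched in base R with a named lookup vector. The name lookup and the short from vector are made up for illustration. Note that the lookup goes through as.character, which is fine for whole numbers like these but may be fragile for non-integer doubles.

```r
# Values as defined in the question
uniquevalues <- c(0, 38, 43, 49, 54, 56, 62)
ranking      <- 0:6

# Named vector: names are the original values, elements are the new codes
lookup <- setNames(ranking, uniquevalues)

# Hypothetical column to recode
from <- c(0, 0, 38, 54)

# Index by name, then drop the names from the result
recoded <- unname(lookup[as.character(from)])
print(recoded)
```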
Another solution: convert to a factor and back.
f <- unique(unlist(df[1:2]))
df[1:2] <- lapply(df[1:2], function(x) {
as.integer(as.character(factor(x, levels=f, labels=1:length(f) - 1)))
})
df
# # A tibble: 8 x 4
# from to count group
# <int> <int> <dbl> <chr>
# 1 0 2 342 keiner
# 2 0 4 181 keiner
# 3 0 5 194 keiner
# 4 0 6 386 keiner
# 5 1 6 200 paid
# 6 2 6 480 paid
# 7 3 6 214 owned
# 8 4 6 176 earned
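A slightly shorter variant of the factor idea is to use the factor's integer codes directly instead of going through labels and as.character. This is a sketch on shortened, made-up vectors. It assumes the levels are sorted first so that the codes correspond to ranks.

```r
# Hypothetical shortened columns
from <- c(0, 0, 38, 54)
to   <- c(43, 54, 62, 62)

# Shared, sorted levels across both columns: 0 38 43 54 62
lev <- sort(unique(c(from, to)))

# as.integer() on a factor returns its 1-based level codes;
# subtracting 1 makes them start at 0
from_code <- as.integer(factor(from, levels = lev)) - 1L
to_code   <- as.integer(factor(to,   levels = lev)) - 1L

print(data.frame(from = from_code, to = to_code))
```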
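Those posts did not really help me; I had already searched for an answer.

I think the key insight here has little to do with match. Rather, it is the fact that R will reshape an atomic vector to fit multiple data frame columns, as long as the sum of the column lengths equals the vector length. As a result, a few minor changes to the OP's original code produce the same result: df[1:2]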
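The reshaping behaviour described in this answer can be checked with a small experiment. The code in the answer itself is cut off, so the following is not a reconstruction of it, only a sketch on made-up numbers. It uses a plain data.frame; whether a tibble accepts the same assignment may depend on the tibble version.

```r
# Hypothetical 3-row, 2-column data frame
df <- data.frame(from = c(0, 0, 38), to = c(38, 62, 62))

# A single atomic vector of length nrow * ncol (3 * 2 = 6)
flat <- c(10, 20, 30, 40, 50, 60)

# Assigning it to two columns fills them column by column:
# the first 3 values go to 'from', the next 3 to 'to'
df[1:2] <- flat
print(df)
```

This is why the result of unlist(df[1:2]), which concatenates the columns in the same order, can be transformed and assigned straight back to df[1:2].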