R: Sort a data frame column by a custom vector for every repeated group in the data frame

I have the following data frame:

col1 <- 1:10
col2 <- rep(c("COL","CIP","CHL","GEN","TMP"), 2)
col3 <- rep(c("spec1", "spec2"), each = 5)
df <- data.frame(col1, col2, col3, stringsAsFactors = F)

library(dplyr)
order_vector <- c("CHL","GEN","COL","CIP","TMP")

df <- df %>%
  slice(match(order_vector, col2))

col1   col2   col3
3      CHL    spec1
4      GEN    spec1
1      COL    spec1
2      CIP    spec1
5      TMP    spec1
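Why only the spec1 rows come back: match(x, table) returns, for each element of x, the position of its first match in table. The five values of order_vector therefore map to exactly five row numbers, all of them in the spec1 block. A small base-R check on the same example data (no dplyr needed; idx is just an illustrative name):

```r
col1 <- 1:10
col2 <- rep(c("COL","CIP","CHL","GEN","TMP"), 2)
col3 <- rep(c("spec1", "spec2"), each = 5)
df <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)
order_vector <- c("CHL","GEN","COL","CIP","TMP")

# Each order_vector value is looked up once and only its FIRST row is returned
idx <- match(order_vector, df$col2)
print(idx)  # 3 4 1 2 5: every index points into the spec1 rows

# Base-R equivalent of slice(match(order_vector, col2))
df[idx, ]
```

Because the second occurrence of each value (rows 6 to 10) is never looked up, the ordering has to be expressed as a sort key rather than as a row selection.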

However, I want this ordering to be applied to every value of col3 (spec1, spec2, ...), preferably using dplyr.

If you convert col2 to a factor with order_vector as its levels, you can sort by it:
library(dplyr)
df %>% mutate_at("col2",factor,levels=order_vector) %>%
  arrange(col3,col2) %>%
  mutate_at("col2",as.character) # if you want to go back to characters, but maybe you shouldn't

#    col1 col2  col3
# 1     3  CHL spec1
# 2     4  GEN spec1
# 3     1  COL spec1
# 4     2  CIP spec1
# 5     5  TMP spec1
# 6     8  CHL spec2
# 7     9  GEN spec2
# 8     6  COL spec2
# 9     7  CIP spec2
# 10   10  TMP spec2
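For comparison, the same factor-level idea can be sketched in base R with order(), which sorts a factor by the position of its levels rather than alphabetically. This assumes the example data from the question; col2_f and res are illustrative names:

```r
col1 <- 1:10
col2 <- rep(c("COL","CIP","CHL","GEN","TMP"), 2)
col3 <- rep(c("spec1", "spec2"), each = 5)
df <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)
order_vector <- c("CHL","GEN","COL","CIP","TMP")

# Levels in the desired order: CHL = 1, GEN = 2, COL = 3, CIP = 4, TMP = 5
col2_f <- factor(df$col2, levels = order_vector)

# Sort by col3 first, then by the custom col2 order within each col3 value
res <- df[order(df$col3, col2_f), ]
rownames(res) <- NULL
res  # col1 comes out as 3 4 1 2 5 8 9 6 7 10
```

The factor is only used as a sort key here, so df$col2 stays a character column and no conversion back is needed.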

Or, more simply, inspired by CPak's answer:
df %>% arrange(col3,factor(col2,levels=order_vector))
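A factor is not strictly required: match(col2, order_vector), with the arguments in the opposite order from the question's code, gives each row its rank in order_vector, and that integer can be used directly as a sort key. A base-R sketch under the same example data (rank_in_vector is an illustrative name); values of col2 that are missing from order_vector get NA and are placed last by order()'s default:

```r
col1 <- 1:10
col2 <- rep(c("COL","CIP","CHL","GEN","TMP"), 2)
col3 <- rep(c("spec1", "spec2"), each = 5)
df <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)
order_vector <- c("CHL","GEN","COL","CIP","TMP")

# Rank of each row's col2 value within order_vector (CHL -> 1, GEN -> 2, ...)
rank_in_vector <- match(df$col2, order_vector)

# Sort by group, then by rank within the group
res <- df[order(df$col3, rank_in_vector), ]
rownames(res) <- NULL
res
```

The same key can be passed to dplyr: df %>% arrange(col3, match(col2, order_vector)).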

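You can also rely on the row order produced by a dplyr join. This depends on right_join() returning rows in the order of its second argument (y), which was the case in older dplyr releases; dplyr 1.0.0 changed this behavior, so check the result on your version:
df %>%
  right_join(data.frame(col2 = order_vector, stringsAsFactors = FALSE), by = "col2") %>%
  arrange(col3)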

#    col1 col2  col3
# 1     3  CHL spec1
# 2     4  GEN spec1
# 3     1  COL spec1
# 4     2  CIP spec1
# 5     5  TMP spec1
# 6     8  CHL spec2
# 7     9  GEN spec2
# 8     6  COL spec2
# 9     7  CIP spec2
# 10   10  TMP spec2
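A join-based variant that does not depend on the row order a join returns: store each value's position in the lookup table, join, and then sort explicitly on that position. This sketch uses base R merge(), whose documentation gives no guarantee of keeping the input row order, which is why the explicit sort is needed; lookup, pos, joined and res are illustrative names:

```r
col1 <- 1:10
col2 <- rep(c("COL","CIP","CHL","GEN","TMP"), 2)
col3 <- rep(c("spec1", "spec2"), each = 5)
df <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)
order_vector <- c("CHL","GEN","COL","CIP","TMP")

# Lookup table: each col2 value together with its position in order_vector
lookup <- data.frame(col2 = order_vector, pos = seq_along(order_vector),
                     stringsAsFactors = FALSE)

# Inner join on col2; the rows come back sorted by col2, not in df's order
joined <- merge(df, lookup, by = "col2")

# Sort explicitly by group and position, then drop the helper column
res <- joined[order(joined$col3, joined$pos), c("col1", "col2", "col3")]
rownames(res) <- NULL
res
```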

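You can use forcats::fct_relevel. Note that this sorts on col2 only, so the spec1 and spec2 rows end up interleaved, as the output shows; put col3 first in arrange() to keep each group together: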

df %>%
  arrange(forcats::fct_relevel(col2, order_vector))

#    col1 col2  col3
# 1     3  CHL spec1
# 2     8  CHL spec2
# 3     4  GEN spec1
# 4     9  GEN spec2
# 5     1  COL spec1
# 6     6  COL spec2
# 7     2  CIP spec1
# 8     7  CIP spec2
# 9     5  TMP spec1
# 10   10  TMP spec2
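The only difference between this output and the earlier answers is which keys are sorted on. A base-R sketch that shows both orderings side by side; factor(col2, levels = order_vector) stands in for fct_relevel(), which is equivalent here only because order_vector lists every value that occurs in col2:

```r
col1 <- 1:10
col2 <- rep(c("COL","CIP","CHL","GEN","TMP"), 2)
col3 <- rep(c("spec1", "spec2"), each = 5)
df <- data.frame(col1, col2, col3, stringsAsFactors = FALSE)
order_vector <- c("CHL","GEN","COL","CIP","TMP")

col2_f <- factor(df$col2, levels = order_vector)

# col2 as the only key: ties keep their original order, so groups interleave
by_col2_only <- df[order(col2_f), "col1"]
by_col2_only  # 3 8 4 9 1 6 2 7 5 10

# col3 as the first key: each group stays together, in the custom col2 order
by_col3_then_col2 <- df[order(df$col3, col2_f), "col1"]
by_col3_then_col2  # 3 4 1 2 5 8 9 6 7 10
```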

An option that does not turn col2 into a factor is to add a group_by() statement before your match() call:
library(dplyr)
col1 <- 1:10
col2 <- rep(c("COL","CIP","CHL","GEN","TMP"), 2)
col3 <- rep(c("spec1", "spec2"), each = 5)
df <- data.frame(col1, col2, col3, stringsAsFactors = F)
order_vector <- c("CHL","GEN","COL","CIP","TMP")
df <- df %>%
  group_by(col3) %>% 
  slice(match(order_vector, col2))
df

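# A tibble: 10 x 3
# Groups:   col3 [2]
    col1 col2  col3 
   <int> <chr> <chr>
 1     3 CHL   spec1
 2     4 GEN   spec1
 3     1 COL   spec1
 4     2 CIP   spec1
 5     5 TMP   spec1
 6     8 CHL   spec2
 7     9 GEN   spec2
 8     6 COL   spec2
 9     7 CIP   spec2
10    10 TMP   spec2

Somehow the group_by() solution works on the example data frame but not on mine: it still keeps only the first value of col3. The only differences are that col2 contains more values and that there are some extra columns, but that shouldn't matter, should it?

Thanks, this works. Do you think it is a fast approach? My data frame isn't very large yet, but I may have more data to try it on later.

Sorting a factor should be fast, and so should converting to a factor. To optimize for speed, convert every column containing CHL etc. to a factor from the start; then, whenever you need to sort, just use df %>% arrange(col3, col2). Most likely it will be fast either way.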
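Regarding the report that the group_by() version behaves differently on real data: one property of slice(match(order_vector, col2)) worth checking is that it selects rows rather than sorting them. Within each group it returns at most one row per order_vector value (the first match), and it drops any col2 value that is not in order_vector. Whether this explains that particular problem is an assumption. The data below is hypothetical (df2, the repeated CHL row and the AMP value are invented for illustration) and only shows the effect in base R:

```r
# Hypothetical single-group data: CHL appears twice, AMP is not in order_vector
df2 <- data.frame(
  col1 = 1:7,
  col2 = c("CHL", "AMP", "CHL", "GEN", "COL", "CIP", "TMP"),
  col3 = "spec1",
  stringsAsFactors = FALSE
)
order_vector <- c("CHL","GEN","COL","CIP","TMP")

# Selection by match(): one row per order_vector value, first hit only
sliced <- df2[match(order_vector, df2$col2), ]
nrow(sliced)  # 5: the second CHL row and the AMP row are gone

# Sorting by rank keeps every row; the unlisted AMP (rank NA) goes last
sorted <- df2[order(df2$col3, match(df2$col2, order_vector)), ]
nrow(sorted)  # 7
sorted$col1   # 1 3 4 5 6 7 2
```

If rows must not be lost, a sort key (factor levels or match() ranks, as in the other answers) is the safer choice.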