R: how to sort by two columns while keeping specific strings together

I have the data shown below, and I am trying to sort it.

    df<-structure(list(string = structure(c(4L, 4L, 4L, 9L, 9L, 6L, 6L, 
5L, 2L, 1L, 7L, 7L, 7L, 8L, 8L, 3L, 3L), .Label = c("CGSKDNIKHVPGGGSVQIVYKPVDLSK", 
"ESPLQTPTEDGSEEPGSETSDAK", "KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK", 
"SKDGTGSDDKK", "SPSSAKSRLQTAPVPMPDLKNVK", "SRLQTAPVPMPDLK", "SRLQTAPVPMPDLKNVKSK", 
"SRLQTAPVPMPDLKNVKSKIGSTENLK", "VQIINKKLDLSNVQSK"), class = "factor"), 
    key = structure(c(1L, 2L, 3L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 
    2L, 3L, 1L, 3L, 2L, 3L, 3L), .Label = c("Mys: G52: ru1", 
    "Mys: G52: ru2", "Mys: G52: ru3"), class = "factor"), val = structure(c(3L, 
    13L, 16L, 15L, 6L, 2L, 2L, 11L, 9L, 5L, 1L, 7L, 8L, 12L, 
    4L, 10L, 14L), .Label = c("1442983324", "1451319531", "1512864.443", 
    "1612410048", "16349475.63", "1784901841", "30553282.01", 
    "317403612.9", "3612004.547", "3686081.063", "39135868.44", 
    "43701608", "64223793.8", "64959501.42", "775987137.8", "9767666215"
    ), class = "factor")), .Names = c("string", "key", "val"), class = "data.frame", row.names = c(NA, 
-17L))

I split the second column on the colon:

listdf<-strsplit(as.character(df[,2]),split=":")
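
The split above leaves every key as a vector of pieces. As a minimal sketch, assuming each key has the form shown in the data (for example "Mys: G52: ru1"), the last piece, which carries the run label, could be pulled out like this. The names `key`, `parts`, and `run` are illustrative and not part of the original post.

```r
# Hypothetical example keys in the same format as df$key.
key <- c("Mys: G52: ru1", "Mys: G52: ru2", "Mys: G52: ru3")

# Split each key on the literal colon.
parts <- strsplit(key, split = ":", fixed = TRUE)

# Keep the last piece of each split; trimws() drops the leading space.
run <- vapply(parts, function(x) trimws(x[length(x)]), character(1))
run
```

Using `vapply` with `character(1)` makes the call fail loudly if a key does not yield exactly one string, which helps when real keys are less regular than the sample.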

You first need to compute the length of each string and then sort by that column. To do this, I created a new data frame, df_tmp, and then merged it into df2.


Try Hadley's tidyverse functions:

library(tidyverse)

df_sorted <- df %>% 
    # get length of string
    mutate(length_string = nchar(as.character(string))) %>%
    # arrange first by number of characters, then string, then key
    arrange(length_string, string, key) %>%
    # remove length column
    select(-length_string)

You need to use the nchar function, but first you must convert df$string from factor to character.

Here is a solution using tidyverse tools:

library("tidyverse")

df2 <- df %>%
    mutate(string = as.character(string)) %>%
    arrange(nchar(string), key)
df2

                                         string           key         val
1                                   SKDGTGSDDKK Mys: G52: ru1 1512864.443
2                                   SKDGTGSDDKK Mys: G52: ru2  64223793.8
3                                   SKDGTGSDDKK Mys: G52: ru3  9767666215
4                                SRLQTAPVPMPDLK Mys: G52: ru1  1451319531
5                                SRLQTAPVPMPDLK Mys: G52: ru1  1451319531
6                              VQIINKKLDLSNVQSK Mys: G52: ru1 775987137.8
7                              VQIINKKLDLSNVQSK Mys: G52: ru2  1784901841
8                           SRLQTAPVPMPDLKNVKSK Mys: G52: ru1 317403612.9
9                           SRLQTAPVPMPDLKNVKSK Mys: G52: ru2  1442983324
10                          SRLQTAPVPMPDLKNVKSK Mys: G52: ru3 30553282.01
11                      SPSSAKSRLQTAPVPMPDLKNVK Mys: G52: ru1 39135868.44
12                      ESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru1 3612004.547
13                  CGSKDNIKHVPGGGSVQIVYKPVDLSK Mys: G52: ru1 16349475.63
14                  SRLQTAPVPMPDLKNVKSKIGSTENLK Mys: G52: ru2  1612410048
15                  SRLQTAPVPMPDLKNVKSKIGSTENLK Mys: G52: ru3    43701608
16 KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru3 3686081.063
17 KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru3 64959501.42

Here is a solution using base R tools, as you used in your example:

df[order(nchar(as.character(df$string)), df$key), ]

                                         string           key         val
1                                   SKDGTGSDDKK Mys: G52: ru1 1512864.443
2                                   SKDGTGSDDKK Mys: G52: ru2  64223793.8
3                                   SKDGTGSDDKK Mys: G52: ru3  9767666215
6                                SRLQTAPVPMPDLK Mys: G52: ru1  1451319531
7                                SRLQTAPVPMPDLK Mys: G52: ru1  1451319531
4                              VQIINKKLDLSNVQSK Mys: G52: ru1 775987137.8
5                              VQIINKKLDLSNVQSK Mys: G52: ru2  1784901841
13                          SRLQTAPVPMPDLKNVKSK Mys: G52: ru1 317403612.9
11                          SRLQTAPVPMPDLKNVKSK Mys: G52: ru2  1442983324
12                          SRLQTAPVPMPDLKNVKSK Mys: G52: ru3 30553282.01
8                       SPSSAKSRLQTAPVPMPDLKNVK Mys: G52: ru1 39135868.44
9                       ESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru1 3612004.547
10                  CGSKDNIKHVPGGGSVQIVYKPVDLSK Mys: G52: ru1 16349475.63
15                  SRLQTAPVPMPDLKNVKSKIGSTENLK Mys: G52: ru2  1612410048
14                  SRLQTAPVPMPDLKNVKSKIGSTENLK Mys: G52: ru3    43701608
16 KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru3 3686081.063
17 KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru3 64959501.42

df[order(nchar(as.character(df$string)), df$key), ]

This gets you one step ahead. But what happens if two consecutive groups of strings both contain ru1 and ru3? How do you handle that?

In my real data I get something like this in listdf, for example `sam1\,\Area G10` or, in another case, `sam3\,\n\Area G73`. Do you know how to solve this?

Not sure. It is normal to have three separate values at each position of the list. That is what you are seeing; the next line does the magic: it extracts the last element.

Unfortunately, I do not see anything like ru1, ru2, and so on. I still like your answer. Thank you very much.
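
If two different strings of the same length both have ru1 and ru3 rows, ordering by length and key alone can interleave them. The following is a minimal sketch, not taken from the original answers, that uses a hypothetical toy data frame (`toy`) to compare that ordering with one that adds the string itself as a tie-breaker between length and key.

```r
# Hypothetical toy data: two different 4-character strings,
# each with an ru1 and an ru3 row, plus one shorter string.
toy <- data.frame(
  string = c("AAAA", "BBBB", "AAAA", "BBBB", "CC"),
  key    = c("ru1", "ru1", "ru3", "ru3", "ru1"),
  stringsAsFactors = FALSE
)
len <- nchar(toy$string)

# Length, then key: the two 4-character strings interleave.
by_len_key <- toy[order(len, toy$key), ]

# Length, then string, then key: rows of one string stay adjacent.
by_len_string_key <- toy[order(len, toy$string, toy$key), ]

by_len_key$string
by_len_string_key$string
```

The first ordering gives CC, AAAA, BBBB, AAAA, BBBB, while the second gives CC, AAAA, AAAA, BBBB, BBBB. This is the same idea as `arrange(length_string, string, key)` in the tidyverse answer.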
Code for the df_tmp approach (compute the string lengths, merge them into df2, then sort):

library(dplyr)
# One row per distinct string with its character count; unique() keeps the join from duplicating rows.
df_tmp <- unique(data.frame(string = df$string, chr = nchar(as.character(df$string))))
# Attach the length to every row, then sort by length, string, and key.
df2 <- inner_join(df, df_tmp, by = "string")
df2 <- df2[order(df2$chr, df2$string, df2$key), ]
head(df2, 5)

          string           key         val chr
1    SKDGTGSDDKK Mys: G52: ru1 1512864.443  11
2    SKDGTGSDDKK Mys: G52: ru2  64223793.8  11
3    SKDGTGSDDKK Mys: G52: ru3  9767666215  11
6 SRLQTAPVPMPDLK Mys: G52: ru1  1451319531  14
7 SRLQTAPVPMPDLK Mys: G52: ru1  1451319531  14