R: How to sort by two columns while keeping identical strings together
I have the data shown below, and I am trying to sort it.
df<-structure(list(string = structure(c(4L, 4L, 4L, 9L, 9L, 6L, 6L,
5L, 2L, 1L, 7L, 7L, 7L, 8L, 8L, 3L, 3L), .Label = c("CGSKDNIKHVPGGGSVQIVYKPVDLSK",
"ESPLQTPTEDGSEEPGSETSDAK", "KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK",
"SKDGTGSDDKK", "SPSSAKSRLQTAPVPMPDLKNVK", "SRLQTAPVPMPDLK", "SRLQTAPVPMPDLKNVKSK",
"SRLQTAPVPMPDLKNVKSKIGSTENLK", "VQIINKKLDLSNVQSK"), class = "factor"),
key = structure(c(1L, 2L, 3L, 1L, 2L, 1L, 1L, 1L, 1L, 1L,
2L, 3L, 1L, 3L, 2L, 3L, 3L), .Label = c("Mys: G52: ru1",
"Mys: G52: ru2", "Mys: G52: ru3"), class = "factor"), val = structure(c(3L,
13L, 16L, 15L, 6L, 2L, 2L, 11L, 9L, 5L, 1L, 7L, 8L, 12L,
4L, 10L, 14L), .Label = c("1442983324", "1451319531", "1512864.443",
"1612410048", "16349475.63", "1784901841", "30553282.01",
"317403612.9", "3612004.547", "3686081.063", "39135868.44",
"43701608", "64223793.8", "64959501.42", "775987137.8", "9767666215"
), class = "factor")), .Names = c("string", "key", "val"), class = "data.frame", row.names = c(NA,
-17L))
Splitting the second column on the colon:
listdf <- strsplit(as.character(df[, 2]), split = ":")
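The split above returns a list with one character vector per row. A minimal sketch of how the last piece (the run label) could be pulled out of each element, using three made-up keys in the same shape as the key column; the names `parts` and `run` are illustrative and not part of the original post:

```r
# Toy keys in the same "a: b: c" shape as the key column of the question
key <- c("Mys: G52: ru1", "Mys: G52: ru2", "Mys: G52: ru3")

# strsplit() returns a list with one character vector per input element
parts <- strsplit(key, split = ":", fixed = TRUE)

# Take the last piece of each vector and strip the leading space
run <- trimws(vapply(parts, function(p) p[length(p)], character(1)))
print(run)
```

Using `vapply()` with `character(1)` makes the result a plain character vector and fails loudly if an element does not yield exactly one string.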
You first need to compute the length of each string and then sort on that column. To do this, I first created a new data frame, df_tmp, and then merged it into df2.
Try Hadley's tidyverse functions:
library(tidyverse)
df_sorted <- df %>%
  # get length of string
  mutate(length_string = map_dbl(as.character(string), nchar)) %>%
  # arrange first by number of characters, then string, then key
  arrange(length_string, string, key) %>%
  # remove length column
  select(-length_string)
You need to use the nchar function, but first you have to convert df$string from factor to character.
Here is a solution using tidyverse tools:
library("tidyverse")
df2 <- df %>%
  mutate(string = as.character(string)) %>%
  arrange(nchar(string), key)
df2
#> string key val
#> 1 SKDGTGSDDKK Mys: G52: ru1 1512864.443
#> 2 SKDGTGSDDKK Mys: G52: ru2 64223793.8
#> 3 SKDGTGSDDKK Mys: G52: ru3 9767666215
#> 4 SRLQTAPVPMPDLK Mys: G52: ru1 1451319531
#> 5 SRLQTAPVPMPDLK Mys: G52: ru1 1451319531
#> 6 VQIINKKLDLSNVQSK Mys: G52: ru1 775987137.8
#> 7 VQIINKKLDLSNVQSK Mys: G52: ru2 1784901841
#> 8 SRLQTAPVPMPDLKNVKSK Mys: G52: ru1 317403612.9
#> 9 SRLQTAPVPMPDLKNVKSK Mys: G52: ru2 1442983324
#> 10 SRLQTAPVPMPDLKNVKSK Mys: G52: ru3 30553282.01
#> 11 SPSSAKSRLQTAPVPMPDLKNVK Mys: G52: ru1 39135868.44
#> 12 ESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru1 3612004.547
#> 13 CGSKDNIKHVPGGGSVQIVYKPVDLSK Mys: G52: ru1 16349475.63
#> 14 SRLQTAPVPMPDLKNVKSKIGSTENLK Mys: G52: ru2 1612410048
#> 15 SRLQTAPVPMPDLKNVKSKIGSTENLK Mys: G52: ru3 43701608
#> 16 KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru3 3686081.063
#> 17 KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru3 64959501.42
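The factor-to-character conversion mentioned in this answer matters because nchar() raises an error when it is given a factor. A minimal sketch with a made-up two-level factor standing in for df$string (the values and the names `f`, `res`, and `len` are only illustrative):

```r
# A small factor standing in for df$string (hypothetical values)
f <- factor(c("SKDGTGSDDKK", "SRLQTAPVPMPDLK"))

# nchar() refuses factors, so capture the error instead of stopping
res <- tryCatch(nchar(f), error = function(e) e)
print(inherits(res, "error"))

# Converting to character first gives the string lengths
len <- nchar(as.character(f))
print(len)
```

Because nchar() is vectorised over a character vector, no explicit loop or map call is needed once the column has been converted.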
Here is a solution using base R tools, as you used in your example:
df[order(nchar(as.character(df$string)), df$key), ]
#> string key val
#> 1 SKDGTGSDDKK Mys: G52: ru1 1512864.443
#> 2 SKDGTGSDDKK Mys: G52: ru2 64223793.8
#> 3 SKDGTGSDDKK Mys: G52: ru3 9767666215
#> 6 SRLQTAPVPMPDLK Mys: G52: ru1 1451319531
#> 7 SRLQTAPVPMPDLK Mys: G52: ru1 1451319531
#> 4 VQIINKKLDLSNVQSK Mys: G52: ru1 775987137.8
#> 5 VQIINKKLDLSNVQSK Mys: G52: ru2 1784901841
#> 13 SRLQTAPVPMPDLKNVKSK Mys: G52: ru1 317403612.9
#> 11 SRLQTAPVPMPDLKNVKSK Mys: G52: ru2 1442983324
#> 12 SRLQTAPVPMPDLKNVKSK Mys: G52: ru3 30553282.01
#> 8 SPSSAKSRLQTAPVPMPDLKNVK Mys: G52: ru1 39135868.44
#> 9 ESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru1 3612004.547
#> 10 CGSKDNIKHVPGGGSVQIVYKPVDLSK Mys: G52: ru1 16349475.63
#> 15 SRLQTAPVPMPDLKNVKSKIGSTENLK Mys: G52: ru2 1612410048
#> 14 SRLQTAPVPMPDLKNVKSKIGSTENLK Mys: G52: ru3 43701608
#> 16 KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru3 3686081.063
#> 17 KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru3 64959501.42
df[order(nchar(as.character(df$string)), df$key), ] gets you there in one step.
But what happens if two consecutive groups of strings both contain ru1 and ru3? How do you handle that?
In my real data I get something like this, for example one element of listdf is `sam1\,\Area G10` and another is `sam3\,\n\Area G73`. Do you know how to solve this?
Not sure. It is normal to have three separate values at each position of the list; that is what you are seeing. The next line then extracts the last element.
Unfortunately, I do not see anything like ru1, ru2, etc. I still like your answer. Thank you very much.
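The concern about two groups that both contain ru1 and ru3 only arises when different strings share the same length. A minimal sketch on made-up data (the data frame `toy` and its values are hypothetical) suggests that adding the string itself as a sort key between the length and the key keeps each string in one block:

```r
# Hypothetical toy data: two different strings of the same length (3)
toy <- data.frame(
  string = c("AAA", "BBB", "AAA", "BBB"),
  key    = c("ru1", "ru1", "ru3", "ru3"),
  stringsAsFactors = FALSE
)

# Length then key: the two strings end up interleaved
by_len_key <- toy[order(nchar(toy$string), toy$key), ]

# Length, then string, then key: each string stays in one block
by_len_str_key <- toy[order(nchar(toy$string), toy$string, toy$key), ]

print(by_len_key$string)
print(by_len_str_key$string)
```

This is the same ordering that arrange(length_string, string, key) expresses in the tidyverse answer.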
string key val
1 SKDGTGSDDKK Mys: G52: ru1 1512864.443
2 SKDGTGSDDKK Mys: G52: ru2 64223793.8
3 SKDGTGSDDKK Mys: G52: ru3 9767666215
6 SRLQTAPVPMPDLK Mys: G52: ru1 1451319531
7 SRLQTAPVPMPDLK Mys: G52: ru1 1451319531
4 VQIINKKLDLSNVQSK Mys: G52: ru1 775987137.8
5 VQIINKKLDLSNVQSK Mys: G52: ru2 1784901841
13 SRLQTAPVPMPDLKNVKSK Mys: G52: ru1 317403612.9
11 SRLQTAPVPMPDLKNVKSK Mys: G52: ru2 1442983324
12 SRLQTAPVPMPDLKNVKSK Mys: G52: ru3 30553282.01
8 SPSSAKSRLQTAPVPMPDLKNVK Mys: G52: ru1 39135868.44
9 ESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru1 3612004.547
10 CGSKDNIKHVPGGGSVQIVYKPVDLSK Mys: G52: ru1 16349475.63
15 SRLQTAPVPMPDLKNVKSKIGSTENLK Mys: G52: ru2 1612410048
14 SRLQTAPVPMPDLKNVKSKIGSTENLK Mys: G52: ru3 43701608
16 KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru3 3686081.063
17 KDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAK Mys: G52: ru3 64959501.42
library(dplyr)
# one row per distinct string, with its length in chr
df_tmp <- unique(data.frame(string = df$string, chr = nchar(as.character(df$string))))
# df_tmp is unique per string, so the join does not duplicate rows of df
df2 <- inner_join(df, df_tmp, by = "string")
df2 <- df2[order(df2$chr, df2$key), ]
First rows of df2:
string key val chr
SKDGTGSDDKK Mys: G52: ru1 1512864.443 11
SKDGTGSDDKK Mys: G52: ru2 64223793.8 11
SKDGTGSDDKK Mys: G52: ru3 9767666215 11
SRLQTAPVPMPDLK Mys: G52: ru1 1451319531 14
SRLQTAPVPMPDLK Mys: G52: ru1 1451319531 14
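A join on a column with repeated values can multiply rows, which is why a lookup table of string lengths needs exactly one row per distinct string. A minimal sketch of that effect, assuming base R merge() as a stand-in for inner_join() and using made-up data (`toy`, `lookup_dup`, `joined_dup`, and `joined_ok` are hypothetical names):

```r
# Hypothetical toy data: the string "AAA" appears twice
toy <- data.frame(string = c("AAA", "AAA", "BB"),
                  key    = c("ru1", "ru2", "ru1"),
                  stringsAsFactors = FALSE)

# Lookup built row by row: "AAA" is listed twice here as well
lookup_dup <- data.frame(string = toy$string, chr = nchar(toy$string),
                         stringsAsFactors = FALSE)

# Each "AAA" row matches both "AAA" lookup rows: 2 * 2 + 1 = 5 rows
joined_dup <- merge(toy, lookup_dup, by = "string")
print(nrow(joined_dup))

# With one lookup row per distinct string the row count is preserved
joined_ok <- merge(toy, unique(lookup_dup), by = "string")
print(nrow(joined_ok))
```

Adding the length directly as a column, for example with nchar(as.character(df$string)), avoids the join altogether.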