R: Extracting data from one column to create new columns

r dataframe

I have data with sample names that need to be unpacked into new columns.

sample
P10.1
P11.2
S1.1
S3.3

Using the sample ID data, I need to create three new columns: tissue, plant, stage

sample tissue plant stage
P10.1  P      10    1
P11.2  P      11    2
S1.1   S      1     1
S3.3   S      3     3

Is there a way to extract the data from the sample column to populate the three new columns?

Using dplyr and tidyr: first we insert a "." into the sample ID right after the tissue letter, then we separate the sample into 3 columns.

library(dplyr)
library(tidyr)

# Insert a "." after the first character, then split the ID on the dots
df %>% 
  mutate(sample = paste0(substring(sample, 1, 1), ".", substring(sample, 2))) %>% 
  separate(sample, into = c("tissue", "plant", "stage"), remove = FALSE)

  sample tissue plant stage
1 P.10.1      P    10     1
2 P.11.2      P    11     2
3  S.1.1      S     1     1
4  S.3.3      S     3     3
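
The mechanism behind this answer can be sketched in base R. This is only an illustration under assumptions, not the answerer's code: the data frame df is rebuilt here from the IDs in the question, and the names dotted, parts and result are made up for the example. It inserts the dot with sub(), then splits on runs of non-alphanumeric characters, which is the kind of separator separate() falls back to when no sep is given.

```r
# Hypothetical input mirroring the sample IDs in the question.
df <- data.frame(sample = c("P10.1", "P11.2", "S1.1", "S3.3"),
                 stringsAsFactors = FALSE)

# Step 1: insert a "." after the first character, e.g. "P10.1" -> "P.10.1".
dotted <- sub("^(.)", "\\1.", df$sample)

# Step 2: split each ID on runs of non-alphanumeric characters and
# stack the pieces into a 4 x 3 character matrix.
parts <- do.call(rbind, strsplit(dotted, "[^[:alnum:]]+"))

# Step 3: attach the pieces as new columns, converting the numeric ones.
result <- data.frame(sample = df$sample,
                     tissue = parts[, 1],
                     plant  = as.integer(parts[, 2]),
                     stage  = as.integer(parts[, 3]),
                     stringsAsFactors = FALSE)
result
```

Unlike the piped version above, this sketch keeps the original sample values and stores plant and stage as integers; with separate() the same conversion would need convert = TRUE.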

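Data: df is a data frame with a single sample column holding the four sample IDs from the question.

Similar to @phiver's answer, but using a regular expression. In the pattern:

The first set of parentheses captures any single uppercase letter (for tissue)
The second set of parentheses captures any one or two digits (for plant)
The third set of parentheses captures any one or two digits (for stage)

The sub() function extracts each of those capture groups, which are then put into the new variables.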
library(magrittr)
pattern <- "^([A-Z])(\\d{1,2})\\.(\\d{1,2})$"
df %>% 
  dplyr::mutate(
    tissue   = sub(pattern, "\\1", sample),
    plant    = as.integer(sub(pattern, "\\2", sample)),
    stage    = as.integer(sub(pattern, "\\3", sample))
  )
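
As a possible variation on the same idea, the three capture groups can be pulled out in a single call with strcapture() from the utils package, which takes a prototype data frame describing the name and type of each group. This is a sketch under assumptions: df is rebuilt here from the question's IDs, and proto, parsed and result are illustrative names that do not appear in the answer.

```r
# Hypothetical input mirroring the sample IDs in the question.
df <- data.frame(sample = c("P10.1", "P11.2", "S1.1", "S3.3"),
                 stringsAsFactors = FALSE)

# Same pattern as in the answer: letter, 1-2 digits, a dot, 1-2 digits.
pattern <- "^([A-Z])(\\d{1,2})\\.(\\d{1,2})$"

# The prototype gives one column per capture group, with the target type.
proto <- data.frame(tissue = character(),
                    plant  = integer(),
                    stage  = integer(),
                    stringsAsFactors = FALSE)

# strcapture() returns a data frame with one row per input string.
parsed <- utils::strcapture(pattern, df$sample, proto)

# Bind the parsed columns next to the original sample column.
result <- cbind(df, parsed)
result
```

The trade-off is that the pattern is matched once per string instead of three times, at the cost of stepping outside the mutate() pipeline.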

This is similar to phiver's answer, but uses separate twice. Note that we can specify a position index in the sep argument.
library(tidyr)

dat2 <- dat %>%
  separate(sample, into = c("tissue", "number"), sep = 1, remove = FALSE) %>%
  separate(number, into = c("plant", "stage"), sep = "\\.", remove = TRUE, convert = TRUE)
dat2
#   sample tissue plant stage
# 1  P10.1      P    10     1
# 2  P11.2      P    11     2
# 3   S1.1      S     1     1
# 4   S3.3      S     3     3
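
To make the two kinds of split explicit, here is a rough base R equivalent: a positional cut after the first character (the role of sep = 1) followed by a split on a literal dot (the role of sep = "\\."). It is a sketch only; dat is redefined so the block runs on its own, and pieces is an illustrative helper name.

```r
# Hypothetical input mirroring the sample IDs in the question.
dat <- data.frame(sample = c("P10.1", "P11.2", "S1.1", "S3.3"),
                  stringsAsFactors = FALSE)

# Positional split: character 1 is the tissue, the rest is "plant.stage".
tissue <- substr(dat$sample, 1, 1)
number <- substring(dat$sample, 2)

# Separator split: fixed = TRUE treats "." as a literal dot, not a regex.
pieces <- strsplit(number, ".", fixed = TRUE)
plant  <- as.integer(vapply(pieces, `[`, character(1), 1))
stage  <- as.integer(vapply(pieces, `[`, character(1), 2))

dat2 <- data.frame(sample = dat$sample, tissue, plant, stage,
                   stringsAsFactors = FALSE)
dat2
```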


Data
dat <- read.table(text = "sample
P10.1
P11.2
S1.1
S3.3",
                  header = TRUE, stringsAsFactors = FALSE)
