R: dynamically generate data frame column names from values (r, dplyr, purrr)

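I am trying to take part of the value in each column and use it as that column's name. The characters before the colon should become the column name.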

df = cbind.data.frame(
    id = c(1, 2 ,3, 4, 5),
    characteristics_ch1 = c("gender: Female", "gender: Male", "gender: Female", "gender: Male", "gender: Female"),
    characteristics_ch1.1 = c("Thing One: a", "Thing One: a", "Thing One: a", "Thing One: b", "Thing One: b"),
    characteristics_ch1.2 = c("age: 60", "age: 45", "age: 63", "age: 56", "age: 65"))

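For columns 2-4, I want to strip "gender: ", "Thing One: " and "age: " from the values and use them as the names of their respective columns.

The resulting data frame would be: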

Result = cbind.data.frame(
        id = c(1, 2 ,3, 4, 5),
        gender = c("Female", "Male", "Female", "Male", "Female"),
        `Thing One` = c("a", "a", "a", "b", "b"),
        age = c("60", "45", "63", "56", "65")
)

To do this, I wrote the following function:

re_col = function(i){
        new_name = str_split_fixed(i, ": ", 2)[1]
        return(assign(new_name, str_split_fixed(i, ": ", 2)[,2]))
}

and applied it with each of the following:

plyr::colwise(re_col)(df)

#and

purrr::map(df, re_col)

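without success.

Is there a better way? I originally tried to write a function that could be used as a %>% step with dplyr for data cleaning, but that did not work either.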

One workaround, using stringi, is to split the data values with a regular expression pattern supplied for the specified columns:

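library(stringi)

# Split each selected column on a regex: the left part becomes the column
# name and the right part becomes the column's values.
rename.df_cols <- function(df, rgx_pattern = NULL, col_idx = NULL, ...) {
    # Clamp the requested indices to the columns that actually exist
    if (max(col_idx) > ncol(df)) {
        col_idx <- min(col_idx):ncol(df)
    }
    for (i in col_idx) {
        parts <- stri_split_regex(df[[i]], rgx_pattern, simplify = TRUE)
        # Assumes every row of a column shares the same prefix
        colnames(df)[i] <- unique(parts[, 1])
        df[[i]] <- parts[, 2]
    }
    return(df)
}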

> df
  id characteristics_ch1 characteristics_ch1.1 characteristics_ch1.2
1  1      gender: Female          Thing One: a               age: 60
2  2        gender: Male          Thing One: a               age: 45
3  3      gender: Female          Thing One: a               age: 63
4  4        gender: Male          Thing One: b               age: 56
5  5      gender: Female          Thing One: b               age: 65
> rename.df_cols(df = df, col_idx = 2:4, rgx_pattern = "(\\s+)?\\:(\\s+)?")
  id gender Thing One age
1  1 Female         a  60
2  2   Male         a  45
3  3 Female         a  63
4  4   Male         b  56
5  5 Female         b  65
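The same idea can be written without the stringi dependency. The following is only a minimal base R sketch: the helper name `split_prefix_cols` and its arguments are hypothetical, and it assumes that every value in a selected column contains the separator, that each column has a single shared prefix, and that there are no missing values.

```r
# Hypothetical helper: for each selected column, use the text before the
# first separator as the column name and the text after it as the value.
split_prefix_cols <- function(data, cols, sep = ": ") {
    # Accept either column positions or column names
    if (is.numeric(cols)) {
        cols <- names(data)[cols]
    }
    for (col in cols) {
        values <- as.character(data[[col]])
        # Position of the first separator in each value (-1 if absent)
        pos <- as.integer(regexpr(sep, values, fixed = TRUE))
        if (any(pos < 0)) {
            stop("separator not found in column ", col)
        }
        prefix <- unique(substr(values, 1, pos - 1))
        # Assumption: one shared prefix per column
        if (length(prefix) != 1) {
            stop("more than one prefix in column ", col)
        }
        data[[col]] <- substr(values, pos + nchar(sep), nchar(values))
        names(data)[names(data) == col] <- prefix
    }
    data
}

# Small example modelled on the question's data
example_df <- data.frame(
    id = 1:3,
    characteristics_ch1 = c("gender: Female", "gender: Male", "gender: Female"),
    characteristics_ch1.1 = c("Thing One: a", "Thing One: a", "Thing One: b"),
    stringsAsFactors = FALSE
)
result <- split_prefix_cols(example_df, 2:3)
```

Because the data frame is the first argument, a helper of this shape can also be used as a pipeline step, for example `example_df %>% split_prefix_cols(2:3)` once magrittr or dplyr is loaded.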

We can gather the data frame into long format, separate the value column on ": ", and then spread the data frame back into wide format.

library(tidyverse)

df2 <- df %>%
  gather(Column, Value, -id) %>%
  separate(Value, into = c("New_Column", "Value"), sep = ": ") %>%
  select(-Column) %>%
  spread(New_Column, Value, convert = TRUE)
df2
#   id age gender Thing One
# 1  1  60 Female         a
# 2  2  45   Male         a
# 3  3  63 Female         a
# 4  4  56   Male         b
# 5  5  65 Female         b
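In tidyr 1.0.0 and later, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. The sketch below shows what the same reshaping might look like with the newer functions. The name `df3` is made up for illustration, and the explicit `as.integer()` conversion is an assumption standing in for `convert = TRUE`, which `pivot_wider()` does not offer.

```r
library(dplyr)
library(tidyr)

# The data frame from the question
df <- data.frame(
    id = c(1, 2, 3, 4, 5),
    characteristics_ch1 = c("gender: Female", "gender: Male", "gender: Female",
                            "gender: Male", "gender: Female"),
    characteristics_ch1.1 = c("Thing One: a", "Thing One: a", "Thing One: a",
                              "Thing One: b", "Thing One: b"),
    characteristics_ch1.2 = c("age: 60", "age: 45", "age: 63", "age: 56", "age: 65"),
    stringsAsFactors = FALSE
)

df3 <- df %>%
    # Long format: one row per id and original column
    pivot_longer(-id, names_to = "Column", values_to = "Value") %>%
    # Split "name: value" into the new column name and the value
    separate(Value, into = c("New_Column", "Value"), sep = ": ") %>%
    select(-Column) %>%
    # Wide format again, now named by the extracted prefixes
    pivot_wider(names_from = New_Column, values_from = Value) %>%
    # Values come back as character, so convert age by hand
    mutate(age = as.integer(age))
```

Unlike `spread()`, which sorts the new columns alphabetically, `pivot_wider()` should keep them in order of first appearance, so the result is expected to follow the original column order (gender, Thing One, age).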

Awesome! This is such a simple solution, I wish I had thought of it!