R: Adding missing indices to a data frame

r dataframe

Hi, I have a messy data frame that looks like this:

df <- data.frame(age.band = c("0-5","5-10"), beg.code = c("A1","B1"), end.code=c("A5","B3"),value = c(10,5))

age.band beg.code end.code  value
   0-5      A1      A5        10
   5-10     B1      B3         5

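Can someone help me find a way to add all the missing indices to this data frame? Thanks.

A solution using dplyr and tidyr. Note that I added stringsAsFactors = FALSE to avoid creating factor columns when building the example data frame (in R 4.0.0 and later this is already the default). If you run the code on the original data frame, you will get warning messages caused by the factor columns, but the final result is not affected.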

library(dplyr)
library(tidyr)

df2 <- df %>%
  gather(Code, Value, ends_with("code")) %>%
  extract(Value, into = c("Group", "Index"), regex = "([A-Za-z]+)(\\d+)$",
          convert = TRUE) %>%
  select(-Code) %>%
  group_by(Group) %>%
  complete(Index = full_seq(Index, period = 1)) %>%
  unite(Index, c("Group", "Index"), sep = "") %>%
  fill(-Index)
df2
# # A tibble: 8 x 3
#   Index age.band value
# * <chr>    <chr> <dbl>
# 1    A1      0-5    10
# 2    A2      0-5    10
# 3    A3      0-5    10
# 4    A4      0-5    10
# 5    A5      0-5    10
# 6    B1     5-10     5
# 7    B2     5-10     5
# 8    B3     5-10     5
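
The gap-filling step in the pipeline above rests on three tidyr functions: full_seq(), complete() and fill(). The sketch below isolates them on a hypothetical toy data frame (`toy` and its columns are made up for illustration, not part of the original question), assuming a single group whose first and last index are the only rows present.

```r
library(tidyr)

# Hypothetical toy input: only the first and last index of one group are known.
toy <- data.frame(Index = c(1, 5), value = c(10, 10))

# full_seq() returns every value from min(Index) to max(Index) in steps of `period`.
all_idx <- full_seq(toy$Index, period = 1)

# complete() adds a row for each Index value that is missing;
# the other columns are NA in the newly added rows.
filled <- complete(toy, Index = full_seq(Index, period = 1))

# fill() carries the last non-missing `value` down into the NA rows.
filled <- fill(filled, value)
filled
```

In the full solution the same steps run once per Group, which is why group_by(Group) comes before complete().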

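Here is a base R option. The idea is to remove the non-numeric characters from the "code" columns, convert the results to numeric, and store the resulting sequences in a list. Then paste the non-numeric characters back on. Finally, expand the rows of the original dataset with rep, based on the lengths of the list elements, and create the new "index" column by unlisting the list.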

df <- data.frame(age.band = c("0-5","5-10"), beg.code = c("A1","B1"), end.code=c("A5","B3"),value = c(10,5),
                 stringsAsFactors = FALSE)

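# Strip the letters from both code columns and build one integer sequence per row
lst <- do.call(Map, c(f = `:`, lapply(df[2:3], function(x) as.numeric(sub("\\D+", "", x)))))
# Paste the first character of beg.code back onto each sequence (assumes a one-letter prefix)
lst1 <- Map(paste0, substr(df[,2], 1, 1), lst)
# Repeat each original row once per generated index and drop the two code columns
data.frame(index = unlist(lst1), df[rep(seq_len(nrow(df)), lengths(lst1)), -(2:3)])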
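
The base R approach above takes the prefix from the first character only. As a rough variation, the sketch below wraps the same idea in a hypothetical helper, `expand_codes()` (not part of the original answer), so that prefixes longer than one letter also work. It assumes every code is a non-digit prefix followed by digits, and that the begin and end codes of a row share the same prefix.

```r
# Hypothetical helper: expand one begin/end pair such as "A1" .. "A5".
# Assumes a non-digit prefix followed by digits, shared by both codes.
expand_codes <- function(beg, end) {
  prefix <- sub("\\d+$", "", beg)               # "A1" -> "A"
  from   <- as.integer(sub("^\\D+", "", beg))   # "A1" -> 1
  to     <- as.integer(sub("^\\D+", "", end))   # "A5" -> 5
  paste0(prefix, seq(from, to))
}

df <- data.frame(age.band = c("0-5", "5-10"),
                 beg.code = c("A1", "B1"),
                 end.code = c("A5", "B3"),
                 value    = c(10, 5),
                 stringsAsFactors = FALSE)

# One character vector of codes per original row
codes <- Map(expand_codes, df$beg.code, df$end.code)
n     <- lengths(codes)

# Repeat the remaining columns once per generated code
result <- data.frame(index    = unlist(codes, use.names = FALSE),
                     age.band = rep(df$age.band, n),
                     value    = rep(df$value, n),
                     stringsAsFactors = FALSE)
result
```

Building the output columns explicitly with rep() avoids indexing the data frame by position, at the cost of naming each column to keep.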