R: gather or pivot data across multiple columns?

This is the data I have:

id<-c(1:7)
emp1<-c('ft','ft','pt','pt','ft','no','no')
emp2<-c('ft','ft','ft','ft','no','pt','ft')
marstat1<-c('married','married','divorced','single','single','single','single')
marstat2<-c('divorced','married','divorced','single','single','married','single')
df<-data.frame(id,emp1,emp2,marstat1,marstat2) 
I want to reshape it from wide to long, like this:

id<-c(1,1,2,2,3,3,4,4,5,5,6,6,7,7)
year<-c(rep(1:2,7))
emp<-c('ft','ft','ft','ft','pt','ft','pt','ft','ft','no','no','pt','no','ft')
marstat<-c('married','divorced','married','married','divorced','divorced','single','single','single','single','single','married','single','single')
df2<-data.frame(id,year,emp,marstat)
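I tried to use tidyr::gather, but it gave me 4 rows per id instead of 2. I'm not sure how to handle the year dimension: I don't need two year columns, only one.

library(tidyr)

df2 <- df %>% 
  gather(key = 'year', value = 'emp', emp1, emp2) %>% 
  gather(key = 'year2', value = 'marstat', marstat1, marstat2)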
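The 4 rows per id come from the two `gather()` calls acting independently. The first doubles each id into an emp1/emp2 pair, and the second doubles each of those again into marstat1/marstat2, so every combination appears (2 × 2 = 4). The sketch below shows one way to stay with the gather idiom: gather all non-id columns at once, split names like `emp1` into a variable part and a year part with `extract()`, and then `spread()` the variable names back into columns. It assumes your installed tidyr still ships the superseded `gather()`/`spread()` functions. It also rebuilds `df` with `stringsAsFactors = FALSE` so every value is character. The object names `crossed` and `fixed` are purely illustrative.

```r
library(tidyr)

# Rebuild the question's data (assumption: character rather than factor columns)
df <- data.frame(
  id       = 1:7,
  emp1     = c('ft','ft','pt','pt','ft','no','no'),
  emp2     = c('ft','ft','ft','ft','no','pt','ft'),
  marstat1 = c('married','married','divorced','single','single','single','single'),
  marstat2 = c('divorced','married','divorced','single','single','married','single'),
  stringsAsFactors = FALSE
)

# The original attempt: two independent gathers form a cross product
crossed <- df %>%
  gather(key = "year", value = "emp", emp1, emp2) %>%
  gather(key = "year2", value = "marstat", marstat1, marstat2)
table(crossed$id)  # 4 rows for every id

# Fix: gather every non-id column into one key/value pair,
# split "emp1" into var = "emp" and year = "1",
# then spread the var values back out into emp and marstat columns
fixed <- df %>%
  gather(key = "key", value = "value", -id) %>%
  extract(key, into = c("var", "year"), regex = "(\\D+)(\\d+)") %>%
  spread(var, value)
fixed
```

Because `year` is extracted from the column names, it is a character column here, just as in the answers below.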

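Maybe this will work for you:

library(tidyverse)
# Pivot the emp columns longer and strip the "emp" prefix to get the year
newdf <- df %>% select(c('id', starts_with('emp'))) %>%
  pivot_longer(-id) %>%
  mutate(name = gsub('emp', '', name)) %>%
  rename(emp = value, year = name) %>%
  # Do the same for the marstat columns, then join both halves on id and year
  left_join(
    df %>% select(c('id', starts_with('marstat'))) %>%
      pivot_longer(-id) %>%
      mutate(name = gsub('marstat', '', name)) %>%
      rename(marstat = value, year = name),
    by = c('id', 'year')
  )
Output: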

# A tibble: 14 x 4
      id year  emp   marstat 
   <int> <chr> <fct> <fct>   
 1     1 1     ft    married 
 2     1 2     ft    divorced
 3     2 1     ft    married 
 4     2 2     ft    married 
 5     3 1     pt    divorced
 6     3 2     ft    divorced
 7     4 1     pt    single  
 8     4 2     ft    single  
 9     5 1     ft    single  
10     5 2     no    single  
11     6 1     no    single  
12     6 2     pt    married 
13     7 1     no    single  
14     7 2     ft    single  

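This isn't the most elegant solution, but it works:

library(tidyr)
library(dplyr)
pivot_longer(df, cols = starts_with("marstat"), names_to = "year", names_prefix = "marstat", values_to = "marstat") %>% 
  pivot_longer(cols = starts_with("emp"), names_to = "empyear", names_prefix = "emp", values_to = "emp") %>% 
  filter(year == empyear) %>%  # keep only rows whose two year suffixes agree
  select(-empyear)
The idea is to do two consecutive pivots, then drop the rows where the two year numbers differ, and finally drop the redundant year column. Note that names_prefix strips the text part of marstat1/2 and emp1/2, but you still end up with a chr vector for the year, which you'll need to convert to integer if you want that.

This gives you: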

# A tibble: 14 x 4
      id year  marstat  emp  
   <int> <chr> <fct>    <fct>
 1     1 1     married  ft   
 2     1 2     divorced ft   
 3     2 1     married  ft   
 4     2 2     married  ft   
 5     3 1     divorced pt   
 6     3 2     divorced ft   
 7     4 1     single   pt   
 8     4 2     single   ft   
 9     5 1     single   ft   
10     5 2     single   no   
11     6 1     single   no   
12     6 2     married  pt   
13     7 1     single   no   
14     7 2     single   ft 
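To get an integer `year` directly instead of converting afterwards, you can use `pivot_longer()`'s `names_transform` argument, which applies a function to the new name column during the pivot. This argument only exists in newer tidyr releases, so check your installed version. The sketch below repeats the two-pivot approach above with that one change. It assumes `df` is rebuilt from the question's vectors with `stringsAsFactors = FALSE`, and the name `long` is illustrative.

```r
library(tidyr)
library(dplyr)

# Rebuild the question's data (assumption: character rather than factor columns)
df <- data.frame(
  id       = 1:7,
  emp1     = c('ft','ft','pt','pt','ft','no','no'),
  emp2     = c('ft','ft','ft','ft','no','pt','ft'),
  marstat1 = c('married','married','divorced','single','single','single','single'),
  marstat2 = c('divorced','married','divorced','single','single','married','single'),
  stringsAsFactors = FALSE
)

long <- df %>%
  # "marstat1" -> year 1L: the prefix is stripped, then the rest becomes an integer
  pivot_longer(starts_with("marstat"), names_to = "year",
               names_prefix = "marstat", values_to = "marstat",
               names_transform = list(year = as.integer)) %>%
  pivot_longer(starts_with("emp"), names_to = "empyear",
               names_prefix = "emp", values_to = "emp",
               names_transform = list(empyear = as.integer)) %>%
  filter(year == empyear) %>%  # drop the mismatched-year combinations
  select(-empyear)
long
```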

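Consider this approach:

library(tidyr)
df %>% 
  pivot_longer(matches("\\d$"), names_to = c("name", "year"), names_pattern = "([^\\d]+)(\\d+)$") %>% 
  pivot_wider()
First, pivot the data frame longer so that every value from the numbered columns lands in a single value column. In the same step, split each original column name (e.g. emp1) into name (emp) and year (1). Then pivot wider again, taking the new column names from name and the cell values from value, which are pivot_wider()'s defaults.

Output: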

# A tibble: 14 x 4
      id year  emp   marstat 
   <int> <chr> <chr> <chr>   
 1     1 1     ft    married 
 2     1 2     ft    divorced
 3     2 1     ft    married 
 4     2 2     ft    married 
 5     3 1     pt    divorced
 6     3 2     ft    divorced
 7     4 1     pt    single  
 8     4 2     ft    single  
 9     5 1     ft    single  
10     5 2     no    single  
11     6 1     no    single  
12     6 2     pt    married 
13     7 1     no    single  
14     7 2     ft    single
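tidyr can also do both steps in one call. The special `.value` entry in `names_to` tells `pivot_longer()` that this part of each column name should become an output column name rather than a value. The sketch below assumes a tidyr version with `pivot_longer()` (1.0.0 or later) and rebuilds `df` with `stringsAsFactors = FALSE`. The pattern `(\\D+)(\\d+)` is a simplified equivalent of the one above, not the answerer's exact code.

```r
library(tidyr)

# Rebuild the question's data (assumption: character rather than factor columns)
df <- data.frame(
  id       = 1:7,
  emp1     = c('ft','ft','pt','pt','ft','no','no'),
  emp2     = c('ft','ft','ft','ft','no','pt','ft'),
  marstat1 = c('married','married','divorced','single','single','single','single'),
  marstat2 = c('divorced','married','divorced','single','single','married','single'),
  stringsAsFactors = FALSE
)

# "emp1" -> .value = "emp" (becomes a column), year = "1" (becomes a value)
long <- df %>%
  pivot_longer(-id,
               names_to = c(".value", "year"),
               names_pattern = "(\\D+)(\\d+)")
long
```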

Beautiful! I'll make a note of this. Very useful! Thanks.
Thank you very much, it worked! I don't understand these parts: \\d$, [^\\d]+, \\d+$… but I guess that's basic R syntax for finding digits or strings?
No, those are regular expressions. For more information, see @kna123
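To see what each piece of those regular expressions does, you can try them on the column names with base R's `grepl()` and `regexec()`. This sketch passes `perl = TRUE` so that `\\d` inside a bracket expression is reliably treated as "digit". The vector `nms` and the object `parts` are illustrative names.

```r
# Column names of the question's data frame
nms <- c("id", "emp1", "emp2", "marstat1", "marstat2")

# "\\d$": a digit (\\d) at the end of the string ($); this is what matches() selects
ends_in_digit <- grepl("\\d$", nms, perl = TRUE)
nms[ends_in_digit]  # "emp1" "emp2" "marstat1" "marstat2"

# "([^\\d]+)(\\d+)$":
#   ([^\\d]+) group 1: one or more characters that are NOT digits -> "name"
#   (\\d+)$   group 2: one or more digits at the end of the string -> "year"
parts <- regmatches(nms[ends_in_digit],
                    regexec("([^\\d]+)(\\d+)$", nms[ends_in_digit], perl = TRUE))
# One row per column name: full match, group 1, group 2
parts <- do.call(rbind, parts)
parts
```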