R: gather or pivot data on multiple columns?
This is the data I have:
id<-c(1:7)
emp1<-c('ft','ft','pt','pt','ft','no','no')
emp2<-c('ft','ft','ft','ft','no','pt','ft')
marstat1<-c('married','married','divorced','single','single','single','single')
marstat2<-c('divorced','married','divorced','single','single','married','single')
df<-data.frame(id,emp1,emp2,marstat1,marstat2)
I want to reshape it from wide to long, like this:
id<-c(1,1,2,2,3,3,4,4,5,5,6,6,7,7)
year<-c(rep(1:2,7))
emp<-c('ft','ft','ft','ft','pt','ft','pt','ft','ft','no','no','pt','no','ft')
marstat<-c('married','divorced','married','married','divorced','divorced','single','single','single','single','single','married','single','single')
df2<-data.frame(id,year,emp,marstat)
I tried using tidyr::gather, but it gives me 4 rows per id instead of 2. I'm not sure how to handle the year dimension: I don't need two year columns, only one.
df2<-df %>%
gather(key='year',value='emp',emp1,emp2) %>%
gather(key='year2',value='marstat',marstat1,marstat2)
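To see where the 4 rows per id come from: each gather() call doubles the number of rows, so chaining two of them pairs every emp year with every marstat year. A minimal sketch, assuming the data from the question (rebuilt here so the snippet runs on its own; the name `crossed` is only for illustration):

```r
library(tidyr)
library(dplyr)

# Same data as in the question
df <- data.frame(
  id = 1:7,
  emp1 = c('ft','ft','pt','pt','ft','no','no'),
  emp2 = c('ft','ft','ft','ft','no','pt','ft'),
  marstat1 = c('married','married','divorced','single','single','single','single'),
  marstat2 = c('divorced','married','divorced','single','single','married','single')
)

crossed <- df %>%
  gather(key = 'year', value = 'emp', emp1, emp2) %>%           # 7 ids x 2 = 14 rows
  gather(key = 'year2', value = 'marstat', marstat1, marstat2)  # 14 x 2 = 28 rows

nrow(crossed)       # total number of rows after both gathers
count(crossed, id)  # rows per id
```

Only the rows where the two year keys refer to the same year are wanted, which is why the answers below either join on year or filter on it.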
Maybe this works for you:
library(tidyverse)
#Code
newdf <- df %>% select(c('id',starts_with('emp'))) %>%
pivot_longer(-id) %>%
mutate(name=gsub('emp','',name)) %>%
rename(emp=value,year=name) %>%
left_join(
df %>% select(c('id',starts_with('marstat'))) %>%
pivot_longer(-id) %>%
mutate(name=gsub('marstat','',name)) %>%
rename(marstat=value,year=name)
)
Output:
# A tibble: 14 x 4
id year emp marstat
<int> <chr> <fct> <fct>
1 1 1 ft married
2 1 2 ft divorced
3 2 1 ft married
4 2 2 ft married
5 3 1 pt divorced
6 3 2 ft divorced
7 4 1 pt single
8 4 2 ft single
9 5 1 ft single
10 5 2 no single
11 6 1 no single
12 6 2 pt married
13 7 1 no single
14 7 2 ft single
This isn't the most elegant solution, but it works:
pivot_longer(df, cols=starts_with("marstat"), names_to="year", names_prefix="marstat", values_to="marstat") %>%
pivot_longer(cols=starts_with("emp"), names_to="empyear", names_prefix="emp", values_to="emp") %>%
filter(year==empyear) %>%
select(-empyear)
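This does two consecutive pivots, then removes the rows where the two year numbers don't match, and finally drops the redundant year column. Note that names_prefix lets you strip the text part of marstat1/2 and emp1/2, but you still get a chr vector, which you will need to convert to integer if you want one.
This gives you:
# A tibble: 14 x 4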
id year marstat emp
<int> <chr> <fct> <fct>
1 1 1 married ft
2 1 2 divorced ft
3 2 1 married ft
4 2 2 married ft
5 3 1 divorced pt
6 3 2 divorced ft
7 4 1 single pt
8 4 2 single ft
9 5 1 single ft
10 5 2 single no
11 6 1 single no
12 6 2 married pt
13 7 1 single no
14 7 2 single ft
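As noted above, year comes back as a chr column. One way to get an integer is to append a mutate() step at the end of the pipeline. This is a sketch that assumes the data from the question (rebuilt here so it is self-contained); the name `long` is only for illustration:

```r
library(tidyr)
library(dplyr)

# Same data as in the question
df <- data.frame(
  id = 1:7,
  emp1 = c('ft','ft','pt','pt','ft','no','no'),
  emp2 = c('ft','ft','ft','ft','no','pt','ft'),
  marstat1 = c('married','married','divorced','single','single','single','single'),
  marstat2 = c('divorced','married','divorced','single','single','married','single')
)

long <- df %>%
  pivot_longer(cols = starts_with("marstat"), names_to = "year",
               names_prefix = "marstat", values_to = "marstat") %>%
  pivot_longer(cols = starts_with("emp"), names_to = "empyear",
               names_prefix = "emp", values_to = "emp") %>%
  filter(year == empyear) %>%       # keep only rows where both years agree
  select(-empyear) %>%              # drop the redundant year column
  mutate(year = as.integer(year))   # "1"/"2" -> 1L/2L

str(long$year)
```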
Consider this approach:
df %>%
pivot_longer(matches("\\d$"), names_to = c("name", "year"), names_pattern = "([^\\d]+)(\\d+)$") %>%
pivot_wider()
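First, the data frame is pivoted longer into one with only three kinds of column: id, the combined name-and-year column, and value; in the same step, names_pattern splits the combined column into name and year. Then pivot_wider(), which by default takes its column names from name and its values from value, spreads emp and marstat back out into their own columns.
Output: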
# A tibble: 14 x 4
id year emp marstat
<int> <chr> <chr> <chr>
1 1 1 ft married
2 1 2 ft divorced
3 2 1 ft married
4 2 2 ft married
5 3 1 pt divorced
6 3 2 ft divorced
7 4 1 pt single
8 4 2 ft single
9 5 1 ft single
10 5 2 no single
11 6 1 no single
12 6 2 pt married
13 7 1 no single
14 7 2 ft single
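The same reshape can also be written as a single pivot_longer() call by using the special ".value" entry in names_to, which tells tidyr to use that part of each column name as the name of an output column. This is an alternative sketch, not the answerer's code; it assumes the data from the question (rebuilt here so it runs on its own) and column names made of lowercase letters followed by digits:

```r
library(tidyr)

# Same data as in the question
df <- data.frame(
  id = 1:7,
  emp1 = c('ft','ft','pt','pt','ft','no','no'),
  emp2 = c('ft','ft','ft','ft','no','pt','ft'),
  marstat1 = c('married','married','divorced','single','single','single','single'),
  marstat2 = c('divorced','married','divorced','single','single','married','single')
)

long <- pivot_longer(
  df,
  cols = -id,
  names_to = c(".value", "year"),     # first group -> output column name, second -> year
  names_pattern = "([a-z]+)(\\d+)$"   # e.g. "marstat2" -> "marstat", "2"
)

long
```

Here too, year is a character column and can be converted with as.integer() if needed.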
Beautiful! I'll make a note of it. Very useful, thanks!
Thank you so much. It worked! I don't understand these parts: \\d$, [^\\d]+\\d+$ ... but I guess this is base R syntax for finding numbers or strings?
No, these are regular expressions. For more information, see @kna123
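To make the regular expressions from the comments concrete, here is a small base R sketch applied to the column names from the question. The variable names are only for illustration, and perl = TRUE is passed so that \\d inside the bracket expression is read as "digit":

```r
cols <- c("id", "emp1", "emp2", "marstat1", "marstat2")

# "\\d$": a digit (\\d) at the end of the string ($)
ends_in_digit <- grepl("\\d$", cols, perl = TRUE)
wide_cols <- cols[ends_in_digit]   # the columns that matches("\\d$") would select

# "([^\\d]+)(\\d+)$":
#   ([^\\d]+)  group 1: one or more characters that are NOT digits
#   (\\d+)$    group 2: one or more digits at the end of the string
pattern <- "([^\\d]+)(\\d+)$"
name_part <- sub(pattern, "\\1", wide_cols, perl = TRUE)
year_part <- sub(pattern, "\\2", wide_cols, perl = TRUE)

data.frame(column = wide_cols, name = name_part, year = year_part)
```

The two capture groups correspond, in order, to the two entries given in names_to.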