R: How to define a new column based on 2 grouping columns and 1 other column
I have 3 columns:
household persons activity
1 1 home
1 1 shopping
1 1 home
1 1 eating
1 1 work
1 1 shopping
1 1 home
1 2 home
1 2 shopping
1 2 home
2 1 home
2 1 eating
2 1 home
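The first column is the household index and the second is the household member. Every person's activities start at home. For each person in each household, I want to create a column loop that starts at 1 and increases by 1 for the activities that come after a home or work activity. For example, in the data below the third row is home, so loop = 2 from the fourth row; the fifth row is work, so loop = 3 after work.
Output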
household persons activity loop
1 1 home 1
1 1 shopping 1
1 1 home 1
1 1 eating 2
1 1 work 2
1 1 shopping 3
1 1 home 3
1 2 home 1
1 2 shopping 1
1 2 home 1
2 1 home 1
2 1 eating 1
2 1 home 1
Here is an idea. We can use the rleid, fill, and lead functions to create the loop column.
library(dplyr)       # mutate, group_by, lead, select
library(tidyr)       # fill
library(data.table)  # rleid
dat2 <- dat %>%
  mutate(activity2 = replace(activity, !activity %in% c("home", "work"), NA)) %>%
  group_by(household, persons) %>%
  fill(activity2) %>%
  mutate(loop = lead(rleid(activity2))) %>%
  fill(loop) %>%
  ungroup() %>%
  select(-activity2)
dat2
# # A tibble: 13 x 4
# household persons activity loop
# <int> <int> <chr> <int>
# 1 1 1 home 1
# 2 1 1 shopping 1
# 3 1 1 home 1
# 4 1 1 eating 2
# 5 1 1 work 2
# 6 1 1 shopping 3
# 7 1 1 home 3
# 8 1 2 home 1
# 9 1 2 shopping 1
# 10 1 2 home 1
# 11 2 1 home 1
# 12 2 1 eating 1
# 13 2 1 home 1
Data
dat <- read.table(text = "household persons activity
1 1 home
1 1 shopping
1 1 home
1 1 eating
1 1 work
1 1 shopping
1 1 home
1 2 home
1 2 shopping
1 2 home
2 1 home
2 1 eating
2 1 home",
stringsAsFactors = FALSE, header = TRUE)
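To make the rleid / fill / lead pipeline easier to follow, here is a minimal sketch that keeps the intermediate columns for one person only. The names one, steps, run and shifted are introduced here for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(data.table)  # only used for rleid()

# Rows for household 1, person 1, copied from the question
one <- data.frame(
  household = 1L, persons = 1L,
  activity = c("home", "shopping", "home", "eating", "work", "shopping", "home"),
  stringsAsFactors = FALSE
)

steps <- one %>%
  # keep only the home/work labels, everything else becomes NA
  mutate(activity2 = replace(activity, !activity %in% c("home", "work"), NA)) %>%
  group_by(household, persons) %>%
  fill(activity2) %>%                 # carry the last home/work label forward
  mutate(run = rleid(activity2),      # id of each run of identical labels
         shifted = lead(run)) %>%     # look one row ahead; last row becomes NA
  ungroup()

steps[, c("activity", "activity2", "run", "shifted")]
```

The final fill(loop) in the answer then replaces the trailing NA with the previous value. In this sketch rows 1 to 4 share one run, because the carried-forward label is home throughout, so the step to 2 on row 4 comes from looking ahead to work on row 5. Data in which two home visits are separated only by other activities may therefore behave differently from a cumsum-based approach.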
Another option, assuming the first activity is always home or work:
DT[, loop := shift(cumsum(activity %chin% c('home','work')), fill=1L),
.(household, persons)]
Output:
household persons activity loop
1: 1 1 home 1
2: 1 1 shopping 1
3: 1 1 home 1
4: 1 1 eating 2
5: 1 1 work 2
6: 1 1 shopping 3
7: 1 1 home 3
8: 1 2 home 1
9: 1 2 shopping 1
10: 1 2 home 1
11: 2 1 home 1
12: 2 1 eating 1
13: 2 1 home 1
Data:
library(data.table)
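For readers who want to run the data.table option end to end, the following is a sketch that splits the one-liner into two steps. Building DT with fread(text = ...) and the helper column hits are assumptions made here for illustration; the original definition of DT is not shown above.

```r
library(data.table)

# Same 13 rows as in the question; creating DT this way is an assumption
DT <- fread(text = "household persons activity
1 1 home
1 1 shopping
1 1 home
1 1 eating
1 1 work
1 1 shopping
1 1 home
1 2 home
1 2 shopping
1 2 home
2 1 home
2 1 eating
2 1 home")

# hits: running count of home/work rows seen so far for each person
DT[, hits := cumsum(activity %chin% c("home", "work")),
   by = .(household, persons)]

# loop: the same count delayed by one row, so the increment lands on the
# row after each home/work; fill = 1L covers each person's first row
DT[, loop := shift(hits, fill = 1L), by = .(household, persons)]

DT[]
```

Because the count is shifted by one row, a home or work row keeps the loop value of the rows before it, and only the following row moves to the next loop. This relies on the stated assumption that each person's first activity is home or work.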
Comments:
You said "change when the activity is home or work." Row 6 is shopping, not home or work, so by your rule it should not change.
It changes at work and then stays the same until the next work or home, or until the end of that person's rows. In row 5 we have work, so after work we have loop + 1 until the next home or work.
I see that I made a mistake in my code. In the replace function, eating should be work. I have updated my answer.
Now I understand what you are saying. Please see my update.