R: Reshape a data frame based on multiple conditions
I want to identify the activities that people with the same id perform at the same place during time t. The variable t indicates the time step, and wher records where the activity took place at time t. The with variable records with whom the activity was performed at time t. For each time t, I want to find the common activities, by gender, that were performed at the same place and with the same person. Uncommon activities, and activities performed at different places or with different people, are replaced with 0.
Input
Output:
id t1 t2 t3 t4
12 0 0 12 0
Sample data with 18 time steps:
structure(list(serial = c(11011202, 11011202), DMSex = c(1, 2
), act1_1 = c(110, 110), act1_2 = c(110, 110), act1_3 = c(110,
110), act1_4 = c(110, 110), act1_5 = c(110, 110), act1_6 = c(110,
110), act1_7 = c(110, 110), act1_8 = c(110, 110), act1_9 = c(110,
110), act1_10 = c(110, 110), act1_11 = c(110, 110), act1_12 = c(8219,
110), act1_13 = c(310, 110), act1_14 = c(3210, 110), act1_15 = c(3110,
110), act1_16 = c(7241, 110), act1_17 = c(210, 110), act1_18 = c(3819,
110), wher_1 = c(11, 11), wher_2 = c(11, 11), wher_3 = c(11,
11), wher_4 = c(11, 11), wher_5 = c(11, 11), wher_6 = c(11, 11
), wher_7 = c(11, 11), wher_8 = c(11, 11), wher_9 = c(11, 11),
wher_10 = c(11, 11), wher_11 = c(11, 11), wher_12 = c(11,
11), wher_13 = c(11, 11), wher_14 = c(11, 11), wher_15 = c(11,
11), wher_16 = c(11, 11), wher_17 = c(11, 11), wher_18 = c(11,
11), wit4_1 = c(0, 0), wit4_2 = c(0, 0), wit4_3 = c(0, 0),
wit4_4 = c(0, 0), wit4_5 = c(0, 0), wit4_6 = c(0, 0), wit4_7 = c(0,
0), wit4_8 = c(0, 0), wit4_9 = c(0, 0), wit4_10 = c(0, 0),
wit4_11 = c(0, 0), wit4_12 = c(0, 0), wit4_13 = c(0, 0),
wit4_14 = c(0, 0), wit4_15 = c(0, 0), wit4_16 = c(0, 0),
wit4_17 = c(0, 0), wit4_18 = c(0, 0)), row.names = 1:2, class = "data.frame")
where act1 is t, wit4 is wit, and wher is wher.
One solution combining dplyr and purrr could be:
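library(dplyr)
library(purrr)

# Process one time step at a time, selecting columns by their suffix
map(.x = as.character(1:4),
    ~ df %>%
      select(id, ends_with(.x)) %>%
      group_by(id) %>%
      # TRUE when the place / companion is identical for everyone with this id
      mutate_at(vars(matches("^wher|^wit")), ~ all(. == first(.))) %>%
      ungroup() %>%
      # both conditions (same place and same companion) must hold
      mutate(cond = rowSums(select(., matches("^wher|^wit"))) == 2) %>%
      group_by(id) %>%
      # keep the activity only if it is shared and cond holds, otherwise 0
      mutate_at(vars(starts_with("t")), ~ all(. == first(.)) * cond * .) %>%
      ungroup() %>%
      select(starts_with("t"))) %>%
  bind_cols(df %>%
              select(id)) %>%
  group_by(id) %>%
  summarise_all(first)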
id t1 t2 t3 t4
<int> <int> <int> <int> <int>
1 12 0 0 12 0
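To make the rule behind the pipeline easier to follow, here is a minimal base R sketch of what happens for each time step. It is not the answer's code: the data frame is hypothetical toy data for a single id, and the helper names `all_same` and `common_activity` are made up for illustration. The activity is kept only when the activity, the place (wher) and the companion (wit) are identical for everyone; otherwise it becomes 0.

```r
# Hypothetical toy input (an assumption, not the asker's data):
# two people sharing id 12, four time steps.
df <- data.frame(
  id    = c(12, 12),
  t1    = c(12, 12), t2    = c(12, 13), t3    = c(12, 12), t4    = c(12, 12),
  wher1 = c(1, 2),   wher2 = c(1, 1),   wher3 = c(1, 1),   wher4 = c(1, 1),
  wit1  = c(0, 0),   wit2  = c(0, 0),   wit3  = c(0, 0),   wit4  = c(0, 1)
)

# TRUE when every value in x equals the first one
all_same <- function(x) all(x == x[1])

# Return the shared activity at time step k, or 0 if any condition fails
common_activity <- function(d, k) {
  act  <- d[[paste0("t", k)]]
  wher <- d[[paste0("wher", k)]]
  wit  <- d[[paste0("wit", k)]]
  if (all_same(act) && all_same(wher) && all_same(wit)) act[1] else 0
}

steps  <- 1:4
result <- sapply(steps, function(k) common_activity(df, k))
names(result) <- paste0("t", steps)
print(result)
```

With this toy input, t1 fails on the place, t2 on the activity and t4 on the companion, so only t3 keeps its value.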
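Thanks. If you have time, could you explain how your code works?
I have added some explanation :)
Do you mean that your output is not ordered from t1 to t100?
Thank you very much for your time. What I mean is that if I use the exponentiation operator, I do not get 100 time steps; I get more, and I do not know how to reduce this.
So your problem is that you do not know how many pairs there will be in the end? Are you looking for a dynamic solution that works from 1 up to the actual number of pairs (say, 99)?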
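library(dplyr)
library(purrr)
library(stringr)

# For the sample data: derive the time step suffixes ("_1", "_2", ...)
# from the act columns instead of hard-coding them
map(.x = str_extract(names(df)[grepl("^act", names(df))], "_.*+$"),
    ~ df %>%
      select(serial, ends_with(.x)) %>%
      group_by(serial) %>%
      # TRUE when the place / companion is identical within the serial
      mutate_at(vars(matches("^wher|^wit")), ~ all(. == first(.))) %>%
      ungroup() %>%
      # both conditions (same place and same companion) must hold
      mutate(cond = rowSums(select(., matches("^wher|^wit"))) == 2) %>%
      group_by(serial) %>%
      # keep the activity only if it is shared and cond holds, otherwise 0
      mutate_at(vars(starts_with("act")), ~ all(. == first(.)) * cond * .) %>%
      ungroup() %>%
      select(starts_with("act"))) %>%
  bind_cols(df %>%
              select(serial)) %>%
  group_by(serial) %>%
  summarise_all(first)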
serial act1_1 act1_2 act1_3 act1_4 act1_5 act1_6 act1_7 act1_8 act1_9 act1_10 act1_11 act1_12
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1.10e7 110 110 110 110 110 110 110 110 110 110 110 0
# … with 6 more variables: act1_13 <dbl>, act1_14 <dbl>, act1_15 <dbl>, act1_16 <dbl>,
# act1_17 <dbl>, act1_18 <dbl>
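As a cross-check that works for any number of time steps, the same rule can be sketched in base R, with the suffixes read from the column names. This is only an illustration under some assumptions: the data frame is rebuilt compactly from the sample values (so the column order differs from the dput output), it holds a single serial, and `all_same` is a made-up helper.

```r
# Rebuild the sample data compactly (same values as the dput output above)
steps       <- 1:18
act_person1 <- c(rep(110, 11), 8219, 310, 3210, 3110, 7241, 210, 3819)
df <- data.frame(serial = c(11011202, 11011202), DMSex = c(1, 2))
for (k in steps) {
  df[[paste0("act1_", k)]] <- c(act_person1[k], 110)
  df[[paste0("wher_", k)]] <- c(11, 11)
  df[[paste0("wit4_", k)]] <- c(0, 0)
}

# Derive the suffixes "_1" ... "_18" from the act1 columns
suffixes <- sub("^act1", "", grep("^act1_", names(df), value = TRUE))

# TRUE when every value in x equals the first one
all_same <- function(x) all(x == x[1])

# Keep the shared activity per time step, otherwise 0
res <- sapply(suffixes, function(s) {
  act <- df[[paste0("act1", s)]]
  ok  <- all_same(act) &&
    all_same(df[[paste0("wher", s)]]) &&
    all_same(df[[paste0("wit4", s)]])
  if (ok) act[1] else 0
})
names(res) <- paste0("act1", suffixes)
print(res)
```

Because the suffixes come from `names(df)`, the same sketch should apply unchanged to data with 100 time steps, provided the act1, wher and wit4 columns share the same suffixes.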