R: merge 2 datasets in long format based on a condition
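I have two data frames that I want to merge. The datasets differ in the number of observations and in how they were collected. In df1, observations are recorded on two different days. Each record has an index, id1 (the person identifier) and id2, which refers to the day the recording was made; the two days must be different. There is also a Day variable that records the weekday on which the recording was made.

In df2, observations are recorded only by serial number (index) and id1 (the person identifier). Each person has only one observation. Again, there is a Day variable, which records the day the recording started.

I want to identify the observations from df2 that were recorded on the same days as those in df1. I tried creating a new index grouping index and id1, going to long format and merging on the days.

df1: Day indicates when the observation was made. For example, for index 12, id1 refers to a single person (1) and id2 refers to 2 days: Wednesday (id2 = 1) and Sunday (id2 = 2).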
index id1 id2 Day obs1 obs2 obs3
12 1 1 Wednesday 1 11 12
12 1 2 Sunday 2 0 0
123 1 1 Tuesday 1 0 1
123 1 2 Saturday 3 0 3
123 2 1 Monday 2 2 4
123 2 2 Saturday 1 0 8
df2: here the Day variable indicates the day on which the observations started, e.g. for id 12 day2 and for id 123 day1.
index id1 Day day1 day2 day3 day4 day5 day6 day7
12 1 Tuesday 2 1 2 1 1 3 1
123 1 Friday 0 3 0 3 3 0 3
Result:
index id1 id2 obs1 obs2 obs3
12 1 1 1 11 12
12 1 2 2 0 0
123 1 2 3 0 3
123 2 2 1 0 8
Sample data
df1:
df2:
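The post shows df1 only as a printed table (the `dput` for df2 appears further down, but the one for df1 is missing). Below is a minimal sketch that rebuilds both data frames from the tables above so the answer code can be tried on them. It assumes `Day` can be stored as plain character. The posted `dput` stores it as a factor, and `match()` treats both the same way.

```r
# df1: one row per recording day (id2) per person (id1) within each index
df1 <- data.frame(
  index = c(12, 12, 123, 123, 123, 123),
  id1   = c(1, 1, 1, 1, 2, 2),
  id2   = c(1, 2, 1, 2, 1, 2),
  Day   = c("Wednesday", "Sunday", "Tuesday", "Saturday", "Monday", "Saturday"),
  obs1  = c(1, 2, 1, 3, 2, 1),
  obs2  = c(11, 0, 0, 0, 2, 0),
  obs3  = c(12, 0, 1, 3, 4, 8)
)

# df2: one row per index, with the weekday on which the recording started
df2 <- data.frame(
  index = c(12, 123),
  id1   = c(1, 1),
  Day   = c("Tuesday", "Friday"),
  day1  = c(2, 0), day2 = c(1, 3), day3 = c(2, 0), day4 = c(1, 3),
  day5  = c(1, 3), day6 = c(3, 0), day7 = c(1, 3)
)
```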
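We can get df2 into long format, group by index, keep the rows from the observation start day onwards, and join the result with df1 based on index and Day.
Then `select` can be used to keep only the required columns.
An option with `melt` from data.table.
Or, with the tidyverse. It is better to return a list column in `summarise` and then `unnest` in cases where the length does not match the number of rows.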
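library(dplyr)
library(tidyr)

# weekday names in order, so match() maps a name to 1 (Monday) .. 7 (Sunday)
weekday <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
             "Saturday", "Sunday")

df2 %>%
  pivot_longer(cols = day1:day7) %>%
  group_by(index) %>%
  # keep the rows from the start day onwards
  slice(match(Day, weekday)[1L]:n()) %>%
  # one row per index holding the numeric start day
  summarise(Day = match(Day, weekday)[1]) %>%
  inner_join(df1 %>%
               mutate(Day = match(Day, weekday)), by = 'index') %>%
  # Day.x = df2 start day, Day.y = df1 recording day
  filter(Day.y >= Day.x)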
# A tibble: 4 x 8
# index Day.x id1 id2 Day.y obs1 obs2 obs3
# <dbl> <int> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#1 12 2 1 1 3 1 11 12
#2 12 2 1 2 7 2 0 0
#3 123 5 1 2 6 3 0 3
#4 123 5 2 2 6 1 0 8
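@user11964604 Hi, yes, it can. Please check my update with tidyverse.
@user11964604 It looks like your data uses a class with attributes different from those shown in the `dput` output.
@user11964604 I didn't get the logic from the comment. Could you post it as a new question?
@user11964604 I mean you could post it as a new topic/question, because others have already answered based on the previous version.
@user11964604 You could create a named vector and match on it, specifying that one range starts from day61-day696 and the other from day71-day796. That is the order of the 7 days, right?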
# df2 (sample data)
df2 <- structure(list(index = c(12, 123), id1 = c(1, 1), Day = structure(2:1, .Label = c("Friday",
"Tuesday"), class = "factor"), day1 = c(2, 0), day2 = c(1, 3),
    day3 = c(2, 0), day4 = c(1, 3), day5 = c(1, 3), day6 = c(3,
    0), day7 = c(1, 3)), class = "data.frame", row.names = c(NA,
-2L))
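library(dplyr)

# weekday names in order, so match() maps a name to 1 (Monday) .. 7 (Sunday)
weekday <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
             "Saturday", "Sunday")

df2 %>%
  # make sure day1..day7 are plain numeric columns before reshaping
  mutate(across(matches('day\\d+'), as.numeric)) %>%
  tidyr::pivot_longer(cols = matches('day\\d+')) %>%
  group_by(index) %>%
  # keep the rows from the start day onwards
  filter(row_number() >= match(Day, weekday)[1L]) %>%
  summarise(Day = match(Day, weekday)[1]) %>%
  inner_join(df1 %>% mutate(Day = match(Day, weekday)), by = 'index') %>%
  # Day.x = df2 start day, Day.y = df1 recording day
  filter(Day.y >= Day.x)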
# index Day.x id1 id2 Day.y obs1 obs2 obs3
# <dbl> <int> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#1 12 2 1 1 3 1 11 12
#2 12 2 1 2 7 2 0 0
#3 123 5 1 2 6 3 0 3
#4 123 5 2 2 6 1 0 8
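The list-column route suggested above for the tidyverse can be sketched as follows. The idea is to return a list column from `summarise` and then `unnest` it, because its length differs from the number of rows. Each index is expanded into all weekday numbers from its start day through Sunday, and df1 is then joined on both `index` and `Day`. Treating Sunday (7) as the end of the window is an assumption carried over from the `>=` filter in the answers. df2's `day1` to `day7` counts are left out here because they are not used.

```r
library(dplyr)
library(tidyr)

weekday <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
             "Saturday", "Sunday")

# sample data from the question (df2's day1..day7 counts omitted; not used)
df1 <- data.frame(
  index = c(12, 12, 123, 123, 123, 123),
  id1   = c(1, 1, 1, 1, 2, 2),
  id2   = c(1, 2, 1, 2, 1, 2),
  Day   = c("Wednesday", "Sunday", "Tuesday", "Saturday", "Monday", "Saturday"),
  obs1  = c(1, 2, 1, 3, 2, 1),
  obs2  = c(11, 0, 0, 0, 2, 0),
  obs3  = c(12, 0, 1, 3, 4, 8)
)
df2 <- data.frame(index = c(12, 123), id1 = c(1, 1), Day = c("Tuesday", "Friday"))

res <- df2 %>%
  group_by(index) %>%
  # list column: every weekday number from the start day through Sunday (7)
  summarise(Day = list(seq(match(Day[1], weekday), 7L)), .groups = "drop") %>%
  # one row per (index, allowed weekday); the number of rows differs per index
  unnest(Day) %>%
  # keep df1 recordings whose weekday falls in that index's window
  inner_join(df1 %>% mutate(Day = match(Day, weekday)),
             by = c("index", "Day"))
res
```

Compared with the `Day.y >= Day.x` filter, this turns the condition into an equality join on an explicit set of allowed days. That set is easy to inspect, or to change if the window should wrap around the week.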
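library(data.table)
library(haven)
weekday <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
# if Day was imported with haven (labelled), turn it into plain weekday names
df1$Day <- as.character(as_factor(df1$Day))
df2$Day <- as.character(as_factor(df2$Day))
# numeric weekday: 1 = Monday ... 7 = Sunday
df1$Day <- match(df1$Day, weekday)
# long format, then one start day per index
dt2 <- melt(setDT(df2), measure.vars = patterns('^day\\d+$'))[seq_len(.N) >=
         match(Day, weekday)[1L]][, .(Day = match(Day, weekday)[1]), index]
# Day.x = df1 recording day, Day.y = df2 start day
merge(setDT(df1), dt2, by = 'index')[Day.x >= Day.y]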
# index id1 id2 Day.x obs1 obs2 obs3 Day.y
#1: 12 1 1 3 1 11 12 2
#2: 12 1 2 7 2 0 0 2
#3: 123 1 2 6 3 0 3 5
#4: 123 2 2 6 1 0 8 5
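df2 has a single row per index, and in the answers the long-format reshape only feeds a step that keeps the first `Day` of each index. The `day1` to `day7` counts therefore do not change which rows are returned. Below is a sketch that skips the reshape and keeps just the start day of each index. It assumes `Day` is plain character in both tables (not haven-labelled) and that the join is by `index` only, as in the answers.

```r
library(data.table)

weekday <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
             "Saturday", "Sunday")

# sample data from the question as data.tables (day1..day7 omitted; not used)
dt1 <- data.table(
  index = c(12, 12, 123, 123, 123, 123),
  id1   = c(1, 1, 1, 1, 2, 2),
  id2   = c(1, 2, 1, 2, 1, 2),
  Day   = c("Wednesday", "Sunday", "Tuesday", "Saturday", "Monday", "Saturday"),
  obs1  = c(1, 2, 1, 3, 2, 1),
  obs2  = c(11, 0, 0, 0, 2, 0),
  obs3  = c(12, 0, 1, 3, 4, 8)
)
dt2 <- data.table(index = c(12, 123), id1 = c(1, 1), Day = c("Tuesday", "Friday"))

# numeric weekday: 1 = Monday ... 7 = Sunday (replaces the character column)
dt1[, Day := match(Day, weekday)]
# one start day per index, taken from df2's Day column
starts <- dt2[, .(start_day = match(Day[1L], weekday)), by = index]

# keep df1 recordings made on or after the start day of the same index
res <- merge(dt1, starts, by = "index")[Day >= start_day]
res
```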