Subsetting data by timestamp or interval and id in R



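I asked a question about subsetting data based on the timestamps of a single file and got a great answer. Now I have imported 29 files into one data.frame (l2) and given them IDs from 1 to 29. I would like to subset the data in l2 according to the intervals in the data.frame m.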

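My problem is that I need to split l2 both by the intervals in m and by the column l2$id, because the experiments were not synchronized.
For example, for l2$id == 1, all values in l2$SkinTemp, l2$RespirationRate and l2$HeartRate need to be split by m$P1; for l2$id == 2, by m$P2; and so on.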

dput(head(l2))
structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "2", "20", "21", "22", "23", "24", "25", "26", "27", "3", "4", "5", "6", "7", "8", "9"), class = "factor"), Time = c(0, 0, 0, 0, 0, 0), SkinTemp = c(27.781, 27.78, 27.779, 27.779, 27.778, 27.777 ), HeartRate = c(70, 70, 70, 70, 70, 70), RespirationRate = c(10, 10, 10, 10, 10, 10)), .Names = c("id", "Time", "SkinTemp", "HeartRate", "RespirationRate"), row.names = c(NA, 6L), class = "data.frame")
I have a data.frame (TimeStamp) containing the time intervals in seconds:

dput(head(m))
structure(list(MARKER = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), P1 = c(18, 138, 438, 678, 798, 1278), P2 = c(1, 121, 421, 541, 661, 1141), P3 = c(2, 122, 422, 542, 662, 1142 ), P4 = c(70, 190, 490, 600, 730, 1170), P5 = c(76, 196, 496, 616, 752, 1232), P6 = c(33, 153, 453, 595, 715, 1195), P7 = c(20, 149, 449, 569, 777, 1257), P8 = c(100, 241, 541, 661, 819, 1319 ), P9 = c(25, 145, 445, 583, 763, 1246), P10 = c(18, 141, 441, 621, 801, 1281), P11 = c(70, 190, 490, 710, 830, 1310), P12 = c(35, 155, 455, 635, 755, 1235), P13 = c(35, 155, 455, 575, 695, 1175 ), P14 = c(37, 157, 517, 637, 774, 1254), P15 = c(18, 138, 378, 498, 678, 1158), P16 = c(49, 169, 469, 589, 769, 1266), P17 = c(75, 195, 520, 625, 805, 1295), P18 = c(20, 140, 440, 560, 740, 1227 ), P19 = c(8, 144, 444, 564, 780, 1260), P20 = c(25, 147, 447, 648, 768, 1248), P21 = c(47, 173, 467, 587, 707, 1187), P22 = c(28, 148, 448, 568, 688, 1168), P23 = c(22, 142, 442, 562, 682, 1172 ), P24 = c(52, 145, 452, 684, 804, 1284), P25 = c(11, 131, 431, 618, 738, 1218), P26 = c(19, 139, 439, 619, 762, 1250), P27 = c(41, 161, 465, 672, 792, 1272), P28 = c(63, 183, 487, 667, 787, 1267 ), P29 = c(71, 195, 495, 675, 795, 1275), P30 = c(135, 255, 555, 675, 795, 1275), P31 = c(561, 681, 981, 1101, 1303, 1701), P32 = c(15, 135, 435, 555, 675, 1155), P33 = c(31, 151, 451, 571, 691, 1171 ), P34 = c(10, 130, 430, 550, 670, 1150), P35 = c(35, 155, 455, 695, 815, 1295)), .Names = c("MARKER", "P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9", "P10", "P11", "P12", "P13", "P14", "P15", "P16", "P17", "P18", "P19", "P20", "P21", "P22", "P23", "P24", "P25", "P26", "P27", "P28", "P29", "P30", "P31", "P32", "P33", "P34", "P35"), row.names = c(NA, 6L), class = "data.frame")
If I do this manually for a single file, the following works:

P1$Segment <- cut(l2$Time, c(-Inf, m$P1))
split(l2, P1$Segment)
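The single-file recipe above can be applied to each id in turn. The following is a minimal base-R sketch, assuming (as the question describes) that the cut points for participant k sit in column `P<k>` of m. The toy data and the helper name `add_segment` are hypothetical.

```r
# Hypothetical toy data with the same shape as l2 and m
l2 <- data.frame(
  id        = rep(1:2, each = 4),
  Time      = rep(c(5, 20, 50, 150), times = 2),
  HeartRate = c(70, 71, 72, 73, 80, 81, 82, 83)
)
m <- data.frame(P1 = c(18, 44, 135), P2 = c(1, 66, 105))

# Assumption: the cut points for participant k are in column paste0("P", k)
add_segment <- function(x) {
  breaks <- c(-Inf, m[[paste0("P", x$id[1])]])
  x$Segment <- cut(x$Time, breaks, labels = FALSE)  # integer segment index
  x
}

# Apply the single-file recipe to each id separately, then recombine
l2_seg <- do.call(rbind, lapply(split(l2, l2$id), add_segment))

# Split on id and segment together; drop = TRUE discards empty combinations.
# Rows whose Time lies beyond the last cut point get Segment = NA and are left out.
pieces <- split(l2_seg, list(l2_seg$id, l2_seg$Segment), drop = TRUE)
sapply(pieces, nrow)
```

Because each id looks up its own column of m, unsynchronized experiments never share cut points.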

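Base solution

First: as a fully reproducible question, your data seems incomplete. For example, there is only one unique id, all values of m$MARKER are NA, and all but one of the variables in l2 are constant. I'll create a similarly structured dataset, and hopefully you can adapt the code to your own data.

set.seed(42)
n <- 10
l2 <- data.frame(
  id = rep(1:2, each = 5),
  Time = rep(c(11, 33, 55, 77, 99), times = 2),
  SkinTemp = runif(n, min = 27.7, max = 27.9),
  HeartRate = 60 + sample(30, size = n, replace = TRUE),
  RespirationRate = 5 + sample(10, size = n, replace = TRUE)
)
str(l2)
# 'data.frame': 10 obs. of 5 variables:
#  $ id             : int 1 1 1 1 1 2 2 2 2 2
#  $ Time           : num 11 33 55 77 99 11 33 55 77 99
#  $ SkinTemp       : num 27.9 27.9 27.8 27.9 27.8 ...
#  $ HeartRate      : num 74 82 89 68 74 89 90 64 75 77
#  $ RespirationRate: num 15 7 15 15 6 11 9 15 10 14
(m <- data.frame(
  MARKER = 1:3,
  P1 = c(18, 44, 135),
  P2 = c(1, 66, 105)
))
#   MARKER  P1  P2
# 1      1  18   1
# 2      2  44  66
# 3      3 135 105

(The do.call(rbind, ...) is there to make sure we end up with a single data.frame. It isn't strictly required, but it makes the next step easier to see.)

Now for the split. If you split on two or more variables, the second argument needs to be a list. Note the drop argument. If you don't set it (the default is FALSE), you get every Segment level for each id group, including combinations that never occur. Sometimes that's fine, sometimes not; here it produces several empty (0-row) data.frames in the list. (I trimmed the output for this page.)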
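To see what `drop` changes, here is a tiny sketch on a hypothetical three-row data frame in which id 2 never falls in segment "b":

```r
# Hypothetical mini data set: id 2 has no row in segment "b"
df <- data.frame(id = c(1, 1, 2), seg = factor(c("a", "b", "a")), v = 1:3)

all_combos <- split(df, list(df$id, df$seg))               # drop = FALSE (default)
non_empty  <- split(df, list(df$id, df$seg), drop = TRUE)  # empty groups removed

sapply(all_combos, nrow)  # includes a 0-row data.frame for "2.b"
sapply(non_empty, nrow)
```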
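Factors returned by cut: this is purely personal preference, but I prefer indices from cut over factors or characters. You can use cut(..., labels = FALSE) to get integers. Remember that values outside the cut range will be NA (this is not new).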
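This sketch shows the integer codes and the NA for an out-of-range value. It also shows one reason integers are handy: they can index the breaks vector directly to recover each row's lower and upper bound. `breaks` and `times` are hypothetical values.

```r
breaks <- c(-Inf, 18, 44, 135)  # cut points for one id
times  <- c(11, 33, 55, 200)    # 200 lies beyond the last cut point

cut(times, breaks)                         # factor with labels like "(18,44]"
seg <- cut(times, breaks, labels = FALSE)  # plain integers: 1 2 3 NA

# Integer codes index the breaks vector, giving each value's interval bounds
lower <- breaks[seg]      # -Inf 18 44 NA
upper <- breaks[seg + 1]  # 18 44 135 NA
```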
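"Long" vs "wide" markers

If your data.frame m is definitely fixed, you can get away with the wide layout. With more respondents, though, it will of course become very "wide". Many data wranglers prefer to work in "long" format. In this contrived example it doesn't buy you much. Once you formalize the data structures you work with (e.g., databases, variable-length groupings), you will likely benefit from the "long" format.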
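The answer reshapes m with tidyr::gather(). A base-R stand-in (a swapped-in technique, not the answer's code) is stack(). The sketch below assumes the participant columns are named P followed by digits. The names `m_long` and `cuts_for` are hypothetical.

```r
# Toy wide marker table (hypothetical values)
m <- data.frame(MARKER = 1:3, P1 = c(18, 44, 135), P2 = c(1, 66, 105))

# stack() the P columns into one column of values plus a factor of column names
p_cols <- grep("^P[0-9]+$", names(m), value = TRUE)
long <- stack(m[p_cols])  # columns: values, ind
m_long <- data.frame(
  MARKER  = rep(m$MARKER, times = length(p_cols)),
  id      = as.integer(sub("^P", "", as.character(long$ind))),
  TimeCut = long$values
)

# In long form, the cut points for one id are just a subset on the id column
cuts_for <- function(k) c(-Inf, m_long$TimeCut[m_long$id == k])
cuts_for(2)
```

With this layout, adding a participant adds rows, not columns, and each id can have a different number of markers.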
Converting m to long format with tidyr, then segmenting each id with its own cut points:

library(tidyr)
m2 <- gather(m, id, TimeCut, -MARKER)
m2$id <- gsub("^P", "", m2$id)
m2
#   MARKER id TimeCut
# 1      1  1      18
# 2      2  1      44
# 3      3  1     135
# 4      1  2       1
# 5      2  2      66
# 6      3  2     105

l2b <- do.call(rbind, by(l2, l2$id, function(x) {
  x$Segment <- cut(x$Time, c(-Inf, subset(m2, id == x$id[1])$TimeCut))
  x
}))

Splitting on both id and Segment (output trimmed):

str( split(l2b, list(l2b$id, l2b$Segment), drop = TRUE) )
# List of 5
#  $ 1.(-Inf,18]:'data.frame': 1 obs. of 6 variables:
#   ..$ id             : int 1
#   ..$ Time           : num 11
#   ..$ SkinTemp       : num 27.9
#   ..$ HeartRate      : num 74
#   ..$ RespirationRate: num 15
#   ..$ Segment        : Factor w/ 6 levels "(-Inf,18]","(18,44]",..: 1
#  $ 1.(18,44]  :'data.frame': 1 obs. of 6 variables:
#  $ 1.(44,135] :'data.frame': 3 obs. of 6 variables:
#  $ 2.(1,66]   :'data.frame': 3 obs. of 6 variables:
#  $ 2.(66,105] :'data.frame': 2 obs. of 6 variables:

The same with dplyr:

library(dplyr)
l2 %>%
  group_by(id) %>%
  # Note: subset(m2, id == id[1]) would compare m2$id with itself,
  # so this id's cut points are looked up with m2[["id"]] instead.
  mutate(
    Segment = cut(Time, c(-Inf, m2$TimeCut[m2[["id"]] == id[1]]))
  ) %>%
  group_by(id, Segment) %>%
  do({
    dat <- .
    # do something with dat
    dat
  })
# Source: local data frame [10 x 6]
# Groups: id, Segment [5]
#
#       id  Time SkinTemp HeartRate RespirationRate   Segment
#    <int> <dbl>    <dbl>     <dbl>           <dbl>    <fctr>
# 1      1    11 27.88296        74              15 (-Inf,18]
# 2      1    33 27.88742        82               7   (18,44]
# 3      1    55 27.75723        89              15  (44,135]
# 4      1    77 27.86609        68              15  (44,135]
# 5      1    99 27.82835        74               6  (44,135]
# 6      2    11 27.80382        89              11    (1,66]
# 7      2    33 27.84732        90               9    (1,66]
# 8      2    55 27.72693        64              15    (1,66]
# 9      2    77 27.83140        75              10  (66,105]
# 10     2    99 27.84101        77              14  (66,105]

The printed result doesn't look split, but note the "Groups: id, Segment [5]" line. It means most dplyr functions applied to this data will run once per group. If you replace "# do something with dat" with browser() and run it, you can work with one group at a time and see how the do() block behaves. Note that you must either return a data.frame (with do({...})) or assign the result to a variable (with do(newvar = {...})). Depending on what you're doing, the latter may call for unnest(). (BTW: dplyr also works with databases. If you've read any of Hadley's books, tutorials, vignettes, or other documentation, you've probably seen him recommend "long" over "wide", so this is almost "mandatory".)

Edit: a comment asked whether each row can be associated with its matching row of m (MARKER). Here is a modification of the dplyr solution that adds the integer row index plus the lower/upper bounds. The bounds are taken from the current id's own cut points:

l2 %>%
  group_by(id) %>%
  mutate(
    # integer segment index for this id's cut points
    Segment = cut(Time, c(-Inf, m2$TimeCut[m2[["id"]] == id[1]]), labels = FALSE),
    # index the same per-id cut vector to get each row's interval bounds
    TimeLower = c(-Inf, m2$TimeCut[m2[["id"]] == id[1]])[Segment],
    TimeUpper = c(-Inf, m2$TimeCut[m2[["id"]] == id[1]])[1 + Segment]
  ) %>%
  group_by(id, Segment) %>%
  do({
    dat <- .
    # do something with dat
    dat
  })
# Source: local data frame [10 x 8]
# Groups: id, Segment [5]
#
#       id  Time SkinTemp HeartRate RespirationRate Segment TimeLower TimeUpper
#    <int> <dbl>    <dbl>     <dbl>           <dbl>   <int>     <dbl>     <dbl>
# 1      1    11 27.88296        74              15       1      -Inf        18
# 2      1    33 27.88742        82               7       2        18        44
# 3      1    55 27.75723        89              15       3        44       135
# 4      1    77 27.86609        68              15       3        44       135
# 5      1    99 27.82835        74               6       3        44       135
# 6      2    11 27.80382        89              11       2         1        66
# 7      2    33 27.84732        90               9       2         1        66
# 8      2    55 27.72693        64              15       2         1        66
# 9      2    77 27.83140        75              10       3        66       105
# 10     2    99 27.84101        77              14       3        66       105

Comments

This throws an error on my data: Error: object 'First' not found

I put them in manually because they were NAs; I thought it would look better that way. I'll edit the question.

There are 35 "P" columns in m plus a "MARKER" column, which doesn't seem to match the number 29. It also doesn't make sense to split temperature, respiration rate and heart rate values together; they are measured on different numeric scales. I think you need to improve the description of the problem and show the desired result.

@42- Thank you very much for your attention. There should be 35 participants in the experiment, but some haven't been entered yet, so I'm looking at P1-P29. I want to split the data by the columns of m for each distinct id in l2. That is, if l2$id == 1 && l2$time … for the same id, then each value in P{i} corresponds to a different column in l2.

In this case l2 has 5 columns to split (not counting id), while the example m has 6 rows for each P{i}. How does that work?

Thanks a lot, this answer is amazing! I also really appreciate the detail. I noticed one subtle difference: eventually I want to plot the data by id and segment. For example, I want to plot all respiration rates within a certain interval, colored by id. Because the values in each row of m differ between ids, the intervals don't line up, so this won't work. Is it possible to label the segments 1, 2, 3, 4, etc. for each id instead of by their numeric values? If so, that would be amazing!

This is a great update, thanks! I've used dplyr a little and found it really versatile. What does %>% mean?

%>% is the "pipe operator", imported from the magrittr package. Think of it like the shell | pipe, where …
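The reply above compares %>% to the shell pipe. The same idea can be sketched with base R's native |> pipe, assuming R >= 4.1.0. For simple calls like these, magrittr's %>% behaves the same way: the left-hand side becomes the first argument of the next call. The vectors are hypothetical.

```r
# x |> f(y) is rewritten by the parser as f(x, y); requires R >= 4.1.0
breaks <- c(-Inf, 18, 44, 135)
times  <- c(11, 33, 55, 77, 99)

# Nested calls, read inside-out: count how many times fall in each segment
nested <- tabulate(cut(times, breaks, labels = FALSE))

# The same pipeline, read left to right
piped <- times |> cut(breaks, labels = FALSE) |> tabulate()

identical(nested, piped)
```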