R: Get the maximum value over a subset of rows in a column

Tags: r, dplyr, tidyr

For each patient (ID), my data (in long format, produced with tidyr::gather()) has different values at different times, as shown below:

ID    TIME    VALUE
1     10      14
1     20      34
1     30      35
2     10      15
2     20      14
2     30      11
3     10      16
3     20      64
3     30      55

I want to add a new column named MAX that contains, for each patient, the time at which the maximum value occurs, as shown below:

ID    TIME    VALUE    MAX
1     10      14       30
1     20      34       30
1     30      35       30
2     10      15       10
2     20      14       10
2     30      11       10
3     10      16       20
3     20      64       20
3     30      55       20

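I tried several approaches to get my table to look like this, without success; each attempt messed up my data. I also searched Stack Overflow, but found nothing that worked.

Here is one of the approaches I tried: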

data$MAX<- data %>% dplyr::group_by(data$ID) %>% filter(VALUE$ID == max(VALUE$ID))

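After running the Rmd script, I see this error message repeated in the output for different IDs.

Edit 2:

(For simplicity, I removed all other variables.)

Note that most VALUE rows are not 0; most of them are greater than 0.

We can use which.max to create a numeric index and, after grouping by 'ID', subset the corresponding 'TIME':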

library(dplyr)
data %>%
  group_by(ID) %>%
  mutate(MAX = TIME[which.max(VALUE)])
# If VALUE is not numeric, convert it to numeric
# (it may be better to do the conversion before this step):
# mutate(MAX = TIME[which.max(as.numeric(VALUE))])

Output:

# A tibble: 9 x 4
# Groups:   ID [3]
#     ID  TIME VALUE   MAX
#  <int> <int> <int> <int>
#1     1    10    14    30
#2     1    20    34    30
#3     1    30    35    30
#4     2    10    15    10
#5     2    20    14    10
#6     2    30    11    10
#7     3    10    16    20
#8     3    20    64    20
#9     3    30    55    20

If you want to do this with dplyr verbs (plus fill(), which comes from tidyr):

library(dplyr)
library(tidyr)  # fill() is a tidyr function

# Create reprex
df <- tibble::tribble(
  ~ID, ~TIME, ~VALUE,
  1, 10, 14,
  1, 20, 34,
  1, 30, 35,
  2, 10, 15,
  2, 20, 14,
  2, 30, 11,
  3, 10, 16,
  3, 20, 64,
  3, 30, 55,
  4, 30, NA,
  4, 20, NA,
  5, 10, 10,
  5, 20, 30
)

df_max <- df %>%
  group_by(ID) %>%
  mutate(rn = row_number(desc(VALUE)),         # rank rows within each ID; 1 marks the max VALUE
         MAX = case_when(rn == 1 ~ TIME)) %>%  # take TIME where rn == 1; without a TRUE ~ fallback, other rows get NA
  fill(MAX, .direction = "downup") %>%         # copy the value down and up over the NAs within each ID
  select(-rn) %>%                              # remove the helper rank column
  ungroup()

I added two extra rows for an ID that has only NA values, to show that fill() does not fill those rows from other IDs, because of the group_by() on ID.
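For comparison, the which.max approach from the earlier answer can also be made to return NA for an ID whose values are all NA. This is only a sketch under the assumption that an all-NA group should yield NA: which.max() returns a zero-length result for an all-NA vector, and appending `[1]` turns that empty result into a single NA. The shortened `df` below is a hypothetical subset of the reprex.

```r
library(dplyr)

# Hypothetical subset of the reprex: ID 4 has only NA values
df <- tibble::tribble(
  ~ID, ~TIME, ~VALUE,
  1, 10, 14,
  1, 20, 34,
  1, 30, 35,
  4, 30, NA,
  4, 20, NA
)

df_max <- df %>%
  group_by(ID) %>%
  # [1] converts the zero-length result for an all-NA group into NA
  mutate(MAX = TIME[which.max(VALUE)][1]) %>%
  ungroup()
```

Whether NA is the right result for such patients depends on the analysis; dropping those IDs beforehand is another option.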
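Thanks for the reply! I tried this and it seems to work. I will accept the answer once the cooldown timer expires. One more quick question, if I may: when running this on some values, I get "NAs introduced by coercion" in which.max(data$VALUE). I think this is because I have no values at some time points. How can I ignore these?

@Carlton Are there any NA elements in the 'TIME' or 'VALUE' columns? This can happen if, say, the 'TIME' for the maximum value is NA. Can you check colSums(is.na(data))?

The NAs are in VALUE, not in TIME. In the CSV file the data comes from, some VALUE rows contain . (just a dot), because there is no data for that time point. I guess these are the NA values? Sorry, I accidentally edited your post instead of mine. I have corrected it now.

@Carlton You need to assign the result, i.e. data <- data %>% group_by(ID) %>% mutate(MAX = TIME[which.max(as.numeric(VALUE))]), or you can use the %<>% operator from magrittr, i.e. data %<>% group_by(ID) %>% mutate(MAX = TIME[which.max(as.numeric(VALUE))])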
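The coercion warning discussed in these comments can also be avoided at import time. The sketch below assumes the CSV marks missing values with a single dot, as described above; the inline `csv_text` is made-up sample data, not the asker's file. Passing `na.strings` to read.csv() makes "." load as NA, so VALUE is numeric from the start and as.numeric() is not needed.

```r
library(dplyr)

# Made-up sample standing in for the CSV file, with "." marking missing values
csv_text <- "ID,TIME,VALUE
1,10,14
1,20,.
1,30,35
2,10,.
2,20,14
2,30,11"

# Treat both "NA" and "." as missing, so VALUE is parsed as a number
clean <- read.csv(text = csv_text, na.strings = c("NA", "."))

# which.max() discards NA values, so no extra handling is needed here
clean <- clean %>%
  group_by(ID) %>%
  mutate(MAX = TIME[which.max(VALUE)]) %>%
  ungroup()
```

Note the assignment back to `clean`, which matches the advice in the last comment: the pipeline returns a new data frame and does not modify the original in place.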