R: How to define a baseline using the largest negative test date


I have a dataset like the one below. The sample data can be created as follows:

ID <-c("1", "1", "1","1","2", "2")
Test_date <-c(-15, -8,7, 12,-3,2)
Test_Result<-c(100, 98, 78,99, 65,89)
Sample.data <- data.frame(ID, Test_date, Test_Result)

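I need to use the test result from the largest negative test date (the negative date closest to zero) as the baseline. Progress is then calculated by dividing each test result by the baseline test result. How can I do this?

The final result should look like this:

Many thanks.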

In my opinion, grouped operations like this are easiest with the dplyr or data.table packages:

ID <-c("1", "1", "1","2", "2")
Test_date <-c(-15, -8,7, -3,2)
Test_Result<-c(100, 98, 78,65,89)
Sample.data <- data.frame(ID, Test_date, Test_Result)

# Position of the largest negative value, i.e. the negative date closest to zero
big_neg <- function(x) which(x == max(x[x < 0]))

library(dplyr)
Sample.data %>% 
  group_by(ID) %>%
  mutate(Progress = Test_Result / Test_Result[big_neg(Test_date)])
#> # A tibble: 5 x 4
#> # Groups:   ID [2]
#>   ID    Test_date Test_Result Progress
#>   <chr>     <dbl>       <dbl>    <dbl>
#> 1 1           -15         100    1.02 
#> 2 1            -8          98    1    
#> 3 1             7          78    0.796
#> 4 2            -3          65    1    
#> 5 2             2          89    1.37


library(data.table)
dat <- data.table(Sample.data)
dat[, Progress := Test_Result / Test_Result[big_neg(Test_date)], by=ID][]
#>    ID Test_date Test_Result  Progress
#> 1:  1       -15         100 1.0204082
#> 2:  1        -8          98 1.0000000
#> 3:  1         7          78 0.7959184
#> 4:  2        -3          65 1.0000000
#> 5:  2         2          89 1.3692308
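
One caveat with big_neg(): for an ID that has no negative Test_date, max() over an empty vector returns -Inf with a warning and the subscript comes back empty. Below is a minimal sketch of a more defensive variant in base R. The helper name baseline_value, the data frame name demo, and the extra ID "3" row are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper (not from the answer above): returns the Test_Result
# at the largest negative Test_date, or NA if the group has no negative date.
baseline_value <- function(date, result) {
  neg <- which(date < 0)
  if (length(neg) == 0) return(NA_real_)
  # which.max() returns the first maximum, so tied dates yield one value
  result[neg[which.max(date[neg])]]
}

# The question's six rows plus an invented ID "3" that has no negative date
demo <- data.frame(
  ID          = c("1", "1", "1", "1", "2", "2", "3"),
  Test_date   = c(-15, -8, 7, 12, -3, 2, 5),
  Test_Result = c(100, 98, 78, 99, 65, 89, 70)
)

# One baseline per ID, returned as a vector named by ID
base_by_id <- sapply(split(demo, demo$ID),
                     function(d) baseline_value(d$Test_date, d$Test_Result))

# Look up each row's baseline by ID and divide
demo$Progress <- demo$Test_Result / unname(base_by_id[as.character(demo$ID)])
demo
```

With this variant the rows for ID "3" get NA instead of raising an error, which may or may not be the behaviour you want for such groups.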

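Try this:

library(tidyverse)
df <- Sample.data  # the six-row data frame from the question
df %>% 
  group_by(ID) %>% 
  # keep the positive dates plus the largest negative date (the baseline row)
  filter(Test_date > 0 | Test_date == max(Test_date[Test_date < 0])) %>% 
  # after filtering, the smallest remaining date in each ID is the baseline
  mutate(progress = ifelse(Test_date > 0,
                           Test_Result / Test_Result[which.min(Test_date)],
                           NA_real_)) %>% 
  # bring back the rows dropped by filter(); their progress stays NA
  right_join(df) %>% 
  arrange(ID, Test_date) %>% 
  ungroup()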

Joining, by = c("ID", "Test_date", "Test_Result")
# A tibble: 6 x 4
  ID    Test_date Test_Result progress
  <chr>     <dbl>       <dbl>    <dbl>
1 1           -15         100   NA    
2 1            -8          98   NA    
3 1             7          78    0.796
4 1            12          99    1.01 
5 2            -3          65   NA    
6 2             2          89    1.37
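
The filter() and right_join() round trip above can also be avoided. The following is only a sketch of that idea, assuming dplyr is installed, that df holds the six rows from the question, and that every ID has at least one negative Test_date. The column name baseline and the object name res are introduced here for illustration.

```r
library(dplyr)

# The six-row sample data from the question
df <- data.frame(
  ID          = c("1", "1", "1", "1", "2", "2"),
  Test_date   = c(-15, -8, 7, 12, -3, 2),
  Test_Result = c(100, 98, 78, 99, 65, 89)
)

res <- df %>%
  group_by(ID) %>%
  mutate(
    # result recorded at the largest negative date within this ID
    baseline = Test_Result[Test_date == max(Test_date[Test_date < 0])][1],
    # progress only for positive dates, NA for everything else
    progress = if_else(Test_date > 0, Test_Result / baseline, NA_real_)
  ) %>%
  ungroup()
res
```

Keeping baseline as its own column makes it easy to check which value each ID was divided by; drop it with select(-baseline) if it is not needed.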

Nope. For ID = 1 the baseline is 98, not 100; it is not the first test result when sorted by date. The tricky part is that we need the largest negative date as the baseline, not the earliest date.

Your original definition of "largest negative" was not very clear. I have just edited my answer to define a big_neg function.

Thanks for the update. Is there a way to keep the progress only for positive Test_date values?

Sure, just add one more command after the ones in my post. In data.table:
dat[Test_date < 0, Progress := NA]
or, appended to the dplyr pipeline:
%>% mutate(Progress = ifelse(Test_date < 0, NA, Progress))

To keep only the positive dates and the baseline row, in data.table:
dat[, .SD[Test_date > 0 | Test_date == max(Test_date[Test_date < 0])], by = ID]
or in dplyr:
Sample.data %>% group_by(ID) %>% filter(Test_date > 0 | Test_date == max(Test_date[Test_date < 0]))

What if we need to keep more of the positive test dates? I have just updated the code in the question. Thanks.

How is progress calculated in that case?

I have just updated the post. When Test_date > 0, progress is the test result divided by the baseline test result; for example, for ID = 1 at date 12, progress = 99/98.

A silly question: what does NA_real_ mean, and how does it work?

If Test_date < 0, no progress is calculated and the value is set to NA (not available). You can think of NA_real_ as the numeric version of NA.
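
To make the NA_real_ point concrete, here is a small base R sketch; the vectors x and out are made-up illustration values. A bare NA is a logical constant, while NA_real_ is the missing value of type double, so using it keeps both branches of a conditional numeric.

```r
# A bare NA is logical; NA_real_ is the missing value of type double
typeof(NA)        # "logical"
typeof(NA_real_)  # "double"

# Illustration values: one negative date and two positive ones
x <- c(-8, 7, 12)

# Compute something only for positive entries, missing otherwise
out <- ifelse(x > 0, x / 2, NA_real_)
typeof(out)       # "double"
is.na(out)        # TRUE FALSE FALSE
```

Base ifelse() would coerce a plain NA here anyway. Stricter functions such as dplyr::if_else() have, at least in some versions, rejected a plain NA next to a double branch, which is one reason NA_real_ shows up in tidyverse code.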