Dataframe: conditional mutate based on one column of repeated data, and converting from long to wide format with dplyr - Fatal编程技术网

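I'm trying to reorganize some data from long format to wide format.

There are many people (MRN), each of whom was sequenced a different number of times (seq_date), and I want to create a data frame that shows how Val changes over time.

The data frame I'm starting with looks like this: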

dat_data <- data.frame(
  MRN = c("012345", "012345", "012345", "012345", "012345", "012345"),
  seq_date = c("1-Aug-18", "27-Mar-19", "27-Mar-19", "27-Mar-19", "7-May-19", "7-May-19"),
  Gene = c("SRSF2", "TET2", "IDH1", "SRSF2", "IDH1", "SRSF2"),
  AA = c("p.A2B", "p.C2D", "p.E2F", "p.A2B", "p.E2F", "p.A2B"),
  Val = c("0.1", "0.2", "0.3", "0.4", "0.5", "0.6")
)

> dat_data
     MRN  seq_date  Gene    AA Val
1 012345  1-Aug-18 SRSF2 p.A2B 0.1
2 012345 27-Mar-19  TET2 p.C2D 0.2
3 012345 27-Mar-19  IDH1 p.E2F 0.3
4 012345 27-Mar-19 SRSF2 p.A2B 0.4
5 012345  7-May-19  IDH1 p.E2F 0.5
6 012345  7-May-19 SRSF2 p.A2B 0.6

I would then like to use gather/spread to create a wide-format data frame like this:

     MRN  Gene    AA  D1  D2  D3
1 012345 SRSF2 p.A2B 0.1 0.4 0.6
2 012345  IDH1 p.E2F   0 0.3 0.5
3 012345  TET2 p.C2D   0 0.2   0

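I'm most familiar with dplyr. I tried mutate with case_when for the first step and gather/spread for the second, but haven't had any success. Any help is much appreciated.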
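For comparison, the conditional-mutate-then-spread route described in the question can work. The sketch below is a minimal illustration, assuming the three sequencing dates are known in advance and hard-coded in `case_when()`, which does not scale to many patients with different dates. The `visit` column name is a hypothetical choice for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
dat_data <- data.frame(
  MRN = c("012345", "012345", "012345", "012345", "012345", "012345"),
  seq_date = c("1-Aug-18", "27-Mar-19", "27-Mar-19", "27-Mar-19", "7-May-19", "7-May-19"),
  Gene = c("SRSF2", "TET2", "IDH1", "SRSF2", "IDH1", "SRSF2"),
  AA = c("p.A2B", "p.C2D", "p.E2F", "p.A2B", "p.E2F", "p.A2B"),
  Val = c("0.1", "0.2", "0.3", "0.4", "0.5", "0.6")
)

wide <- dat_data %>%
  mutate(
    Val = as.numeric(as.character(Val)),  # works for factor or character Val
    # Conditional mutate: map each known date to a visit label
    # (assumption: dates are hard-coded, so this only fits this example)
    visit = case_when(
      seq_date == "1-Aug-18"  ~ "D1",
      seq_date == "27-Mar-19" ~ "D2",
      seq_date == "7-May-19"  ~ "D3"
    )
  ) %>%
  select(-seq_date) %>%                       # drop raw date so it is not an id column
  spread(key = visit, value = Val, fill = 0)  # one column per visit, missing cells become 0

wide
```

Because the labels are written by hand, any new date would fall through `case_when()` as `NA`; deriving the order from parsed dates, as in the answer, avoids that.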

I grouped by date to establish the order, then used pivot_wider from tidyr to do the spreading:

library(dplyr)
library(tidyr)

dat_data %>%
  mutate(
    sdate = lubridate::dmy(seq_date), # parse to Date so groups sort chronologically, in case dates aren't in order
    Val = as.numeric(as.character(Val)) # convert factor (or character) to numeric
  ) %>%
  group_by(sdate) %>%
  mutate(
    ord_date = paste0('D', cur_group_id()) # creates D1, D2, etc.; cur_group_id() replaces group_indices() here in dplyr >= 1.0.0
  ) %>%
  ungroup() %>%
  pivot_wider(
    id_cols = c(MRN, Gene, AA),
    names_from = ord_date,
    values_from = Val,
    values_fill = list(Val = 0) # fills missing combinations with 0 instead of NA
  )

# A tibble: 3 x 6
  MRN    Gene  AA       D1    D2    D3
  <fct>  <fct> <fct> <dbl> <dbl> <dbl>
1 012345 SRSF2 p.A2B   0.1   0.4   0.6
2 012345 TET2  p.C2D   0     0.2   0  
3 012345 IDH1  p.E2F   0     0.3   0.5
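The `group_by(sdate)` step numbers dates across the whole data set, so with more than one MRN a patient's first sample is not necessarily labelled D1. A minimal sketch of per-patient numbering, assuming each patient's visits should be counted from their own first date: group by MRN and rank the parsed dates with `dplyr::dense_rank()`. The second patient (`"067890"`) and all of its values are hypothetical, invented only to show the effect.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-patient data (MRN "067890" and its values are made up)
dat2 <- data.frame(
  MRN = c("012345", "012345", "012345", "067890", "067890"),
  seq_date = c("1-Aug-18", "27-Mar-19", "7-May-19", "3-Jan-19", "9-Sep-19"),
  Gene = c("SRSF2", "SRSF2", "SRSF2", "TET2", "TET2"),
  AA = c("p.A2B", "p.A2B", "p.A2B", "p.C2D", "p.C2D"),
  Val = c("0.1", "0.4", "0.6", "0.7", "0.8"),
  stringsAsFactors = FALSE
)

wide2 <- dat2 %>%
  mutate(
    sdate = lubridate::dmy(seq_date),  # assumes English month abbreviations
    Val = as.numeric(Val)
  ) %>%
  group_by(MRN) %>%                                     # number visits within each patient
  mutate(ord_date = paste0("D", dense_rank(sdate))) %>% # earliest date per patient -> D1
  ungroup() %>%
  pivot_wider(
    id_cols = c(MRN, Gene, AA),
    names_from = ord_date,
    values_from = Val,
    values_fill = list(Val = 0)  # patient with fewer visits gets 0 in later columns
  )

wide2
```

With the `group_by(sdate)` version, the same five distinct dates would instead produce columns D1 to D5, and patient `"012345"` would land in D1, D3 and D4.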