R: How to create a new column by pasting selected column names from a lookup table


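I am trying to generate a new column in a data frame (named "input"). For each row, the new column should hold the column names taken from another data frame (named "LookUp"), chosen according to the values in the relevant columns of that lookup table. Below is some fake data representing the two tables:

Create the fake lookup table
Any suggestions would be much appreciated.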

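First, since the data frames LookUp and input share the same drug values, and there appear to be no duplicates in LookUp$Drugs or input$Drug, the simplest approach is to merge them. Load the packages first: data.table and dplyr.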

install.packages(c("data.table", "dplyr"))
library(data.table)
library(dplyr)
Let's join the tables:

# Left join: keep every row of input, matching input$Drug against LookUp$Drugs
output <- merge(input, LookUp, by.x = "Drug", by.y = "Drugs", all.x = TRUE)

           Drug rowID CYP1A1 CYP1A2 CYP1B1 CYP2A6 CYP2A13 CYP2B6 CYP2C8 CYP2C9
1 amitriptyline     1   <NA>  S_Inh   <NA>   <NA>      NA      S  S_Inh      S
2     asenapine     2   <NA>   <NA>   <NA>   <NA>      NA   <NA>   <NA>   <NA>
3     bupropion     3   <NA>      S   <NA>      S      NA  S_Inh      S      S
4   desipramine     4   <NA>   <NA>   <NA>    Inh      NA    Ind   <NA>   <NA>
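
To get column names rather than cell values out of the merged table, one option is a row-wise pass in base R. The sketch below is an illustration only: the LookUp and input data frames are toy data reconstructed from the table printed above (a subset of its columns), not the asker's original data, and cyp_cols is a name introduced here.

```r
# Hypothetical toy data, reconstructed from the merged table printed above
LookUp <- data.frame(
  Drugs  = c("amitriptyline", "bupropion", "desipramine"),
  CYP1A2 = c("S_Inh", "S", NA),
  CYP2A6 = c(NA, "S", "Inh"),
  CYP2B6 = c("S", "S_Inh", "Ind"),
  CYP2C8 = c("S_Inh", "S", NA),
  CYP2C9 = c("S", "S", NA),
  stringsAsFactors = FALSE
)
input <- data.frame(
  rowID = 1:4,
  Drug  = c("amitriptyline", "asenapine", "bupropion", "desipramine"),
  stringsAsFactors = FALSE
)

output <- merge(input, LookUp, by.x = "Drug", by.y = "Drugs", all.x = TRUE)

# Every LookUp column except the key is treated as a CYP column
cyp_cols <- setdiff(names(LookUp), "Drugs")

# For each row, keep the names of the CYP columns whose value contains "S"
output$metabCYPs <- apply(output[cyp_cols], 1, function(row) {
  toString(cyp_cols[grepl("S", row)])
})

output[c("rowID", "Drug", "metabCYPs")]
```

With this approach, a drug that has no match in LookUp (asenapine in the toy data) ends up with an empty string rather than NA.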

Here is an idea using the dplyr and reshape2 packages:

#First you add stringsAsFactors = FALSE in your dataframes,

LookUp <- data.frame(Drugs, CYP1A1,CYP1A2, CYP1B1, CYP2A6,CYP2A13,CYP2B6,CYP2C8,CYP2C9, stringsAsFactors = FALSE)
input <- data.frame(rowID=c(1:4), Drug=Drugs[c(1,3,4,9)], stringsAsFactors = FALSE)

library(dplyr)
library(reshape2)

melt(LookUp, id.vars = 'Drugs', na.rm = TRUE) %>% 
  group_by(Drugs) %>% 
  summarise(metabCYPs = toString(variable[grepl('S', value)])) %>%   
  left_join(input, ., by = c('Drug' = 'Drugs'))

#  rowID          Drug                              metabCYPs
#1     1 amitriptyline         CYP1A2, CYP2B6, CYP2C8, CYP2C9
#2     2     asenapine                                   <NA>
#3     3     bupropion CYP1A2, CYP2A6, CYP2B6, CYP2C8, CYP2C9
#4     4   desipramine                                       

dplyr and reshape bug me... Here is another idea that uses an implicit loop over the Drugs variable:

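# Columns to search: everything in LookUp except the key column
cyp_cols <- setdiff(names(LookUp), "Drugs")
# For each drug, paste together the names of the columns whose value contains "S"
metabCYPs <- sapply(as.character(LookUp$Drugs), function(x) {
  vals <- sapply(LookUp[which(LookUp$Drugs == x), cyp_cols], as.character)
  paste0(cyp_cols[grepl("S", vals)], collapse = ", ")
})
# The key column in input is Drug, not Drugs
output <- data.frame(input, metabCYPs = metabCYPs[match(input$Drug, names(metabCYPs))])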

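Thanks, 卡克萨. This creates the output column from the values in the lookup table rather than from the column names. It also produces the same value for every row of the output table instead of the value for the corresponding drug. However, Sotos's solution seems to work.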
# Continuing from the merged table: attempt to collapse the CYP columns (3 to 10) into one column
output$metabCYPs <- output[,3:10] %>%
  apply(1, paste0) %>% 
  setdiff("NA") %>% 
  paste0(collapse = ", ")
output[,3:10] <- NA
# Variant of the melt/summarise pipeline that also lists the columns whose value contains 'Inh' or 'Ind'
melt(LookUp, id.vars = 'Drugs', na.rm = TRUE) %>% 
   group_by(Drugs) %>% 
   summarise(metabCYPs = toString(variable[grepl('S', value)]), 
             with_Inh = toString(variable[grepl('Inh', value)]), 
             with_Ind = toString(variable[grepl('Ind', value)])) %>% 
   left_join(input, ., by = c('Drug' = 'Drugs'))
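
The same three summary columns can also be built without reshape2 or dplyr. What follows is a minimal base-R sketch, not the answerer's method: the data frames are toy data reconstructed from the merged table shown earlier (a subset of its columns), and cyps_matching, cyp_cols, summary_tbl and result are hypothetical names introduced for illustration.

```r
# Hypothetical toy data, reconstructed from the merged table shown earlier
LookUp <- data.frame(
  Drugs  = c("amitriptyline", "bupropion", "desipramine"),
  CYP1A2 = c("S_Inh", "S", NA),
  CYP2A6 = c(NA, "S", "Inh"),
  CYP2B6 = c("S", "S_Inh", "Ind"),
  CYP2C8 = c("S_Inh", "S", NA),
  CYP2C9 = c("S", "S", NA),
  stringsAsFactors = FALSE
)
input <- data.frame(
  rowID = 1:4,
  Drug  = c("amitriptyline", "asenapine", "bupropion", "desipramine"),
  stringsAsFactors = FALSE
)

cyp_cols <- setdiff(names(LookUp), "Drugs")

# Hypothetical helper: for every row of LookUp, list the CYP columns
# whose value matches `pattern` (NA values never match)
cyps_matching <- function(pattern) {
  vapply(seq_len(nrow(LookUp)), function(i) {
    values <- unlist(LookUp[i, cyp_cols])
    toString(cyp_cols[grepl(pattern, values)])
  }, character(1))
}

summary_tbl <- data.frame(
  Drugs     = LookUp$Drugs,
  metabCYPs = cyps_matching("S"),
  with_Inh  = cyps_matching("Inh"),
  with_Ind  = cyps_matching("Ind"),
  stringsAsFactors = FALSE
)

# Left join onto input, then restore the original row order
result <- merge(input, summary_tbl, by.x = "Drug", by.y = "Drugs", all.x = TRUE)
result <- result[order(result$rowID), ]
result
```

Because the summaries are computed on LookUp before the join, a drug absent from LookUp (asenapine in the toy data) gets NA in all three columns, while a drug that is present but has no matching value gets an empty string.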