
R: How do I look up values from a table and insert the name of the lookup list?


I have a (sample) table like this:

df <- read.table(header = TRUE, 
                   stringsAsFactors = FALSE, 
                   text="Gene  SYMBOL  Values
                   TP53            2            3.55   
                   XBP1            5            4.06
                   TP27            1            2.53
                   REDD1           4            3.99
                   ERO1L           6            5.02
                   STK11           9            3.64
                   HIF2A           8            2.96")

I just wrote my own function, which replaces the values of a column using a lookup table:

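# NOTE: my_match() is not defined in the source; aliasing it to base::match()
# is an assumption made here so that the function runs.
my_match <- match

replace_by_lookuptable <- function(df, col, lookup) {
  assertthat::assert_that(all(col %in% names(df))) # all cols exist in df
  assertthat::assert_that(all(c("new", "old") %in% colnames(lookup))) # lookup has both columns

  # every value to be replaced must have an entry in lookup$old
  cond_na_exists <- is.na(unlist(lapply(df[, col], function(x) my_match(x, lookup$old))))
  assertthat::assert_that(!any(cond_na_exists))

  # swap each old value for its counterpart in lookup$new
  df[, col] <- unlist(lapply(df[, col], function(x) lookup$new[my_match(x, lookup$old)]))
  return(df)
}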
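For illustration, the core of this function is a match() against the lookup table. The sketch below uses base R only; the lookup table and its contents are made up for the example, and plain match() stands in for the my_match() helper used above.

```r
# Toy data: the first three rows of the example table
df <- data.frame(Gene = c("TP53", "XBP1", "TP27"),
                 SYMBOL = c(2L, 5L, 1L),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table (not from the original post):
# 'old' holds the current values, 'new' their replacements
lookup <- data.frame(old = c(1L, 2L, 5L),
                     new = c("one", "two", "five"),
                     stringsAsFactors = FALSE)

# Position of each SYMBOL in lookup$old; NA would mean "no entry"
idx <- match(df$SYMBOL, lookup$old)
stopifnot(!anyNA(idx))

# Replace the old values with their counterparts
df$SYMBOL <- lookup$new[idx]
print(df)
```

Checking for NA before replacing mirrors the assertion in the function: a value without a lookup entry stops the run instead of silently becoming NA.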

If you add a listid column to each gene list:

genelist1$listid = 1
genelist2$listid = 2
Then you can merge df with the gene lists:

merge(df,rbind(genelist1,genelist2),all.x=T, by = "SYMBOL")
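Note that ERO1L has SYMBOL 6 in df but SYMBOL 4 in genelist1, and that HIF2A and REDD1 are missing from genelist1, yet REDD1 has SYMBOL 4 in df (which is ERO1L in genelist1). So I am not sure what output you expect in this case.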
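To make the effect of the SYMBOL mismatch concrete, here is a small self-contained sketch of the merge on SYMBOL using the OP's data. Dropping the Gene column from the gene lists before merging is my own choice here, made so that the result has a single Gene column.

```r
df <- data.frame(
  Gene   = c("TP53", "XBP1", "TP27", "REDD1", "ERO1L", "STK11", "HIF2A"),
  SYMBOL = c(2L, 5L, 1L, 4L, 6L, 9L, 8L),
  Values = c(3.55, 4.06, 2.53, 3.99, 5.02, 3.64, 2.96),
  stringsAsFactors = FALSE
)
genelist1 <- data.frame(
  Gene   = c("P4H", "PLK", "TP27", "KTD", "ERO1L"),
  SYMBOL = c(10L, 7L, 1L, 11L, 4L),
  stringsAsFactors = FALSE
)
genelist2 <- data.frame(
  Gene   = c("TP53", "XBP1", "BHLHB", "STK11", "TP27", "UPK"),
  SYMBOL = c(2L, 5L, 12L, 9L, 1L, 18L),
  stringsAsFactors = FALSE
)

# Tag each gene list with its id
genelist1$listid <- 1
genelist2$listid <- 2

# Keep only the join key and the id, then left-join onto df
lists  <- rbind(genelist1, genelist2)[, c("SYMBOL", "listid")]
merged <- merge(df, lists, by = "SYMBOL", all.x = TRUE)
print(merged)
```

With this data, TP27 appears twice (once per list), REDD1 picks up listid 1 because its SYMBOL 4 belongs to ERO1L in genelist1, and ERO1L and HIF2A get NA.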

You can also merge on the gene name only:

merge(df,rbind(genelist1,genelist2),all.x=T, by.x = "Gene", by.y= "Gene")

You can put all the gene lists into a list:

gen_list <- list(genelist1 = genelist1,genelist2 = genelist2)
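The post does not show how the list is used afterwards. One possible continuation, sketched in base R under my own assumptions (the column name listname and the matching on SYMBOL are not from the original), is to tag every gene list with its name, stack them, and collect the matching names for each row of df:

```r
df <- data.frame(
  Gene   = c("TP53", "XBP1", "TP27", "REDD1", "ERO1L", "STK11", "HIF2A"),
  SYMBOL = c(2L, 5L, 1L, 4L, 6L, 9L, 8L),
  Values = c(3.55, 4.06, 2.53, 3.99, 5.02, 3.64, 2.96),
  stringsAsFactors = FALSE
)
genelist1 <- data.frame(
  Gene   = c("P4H", "PLK", "TP27", "KTD", "ERO1L"),
  SYMBOL = c(10L, 7L, 1L, 11L, 4L),
  stringsAsFactors = FALSE
)
genelist2 <- data.frame(
  Gene   = c("TP53", "XBP1", "BHLHB", "STK11", "TP27", "UPK"),
  SYMBOL = c(2L, 5L, 12L, 9L, 1L, 18L),
  stringsAsFactors = FALSE
)

gen_list <- list(genelist1 = genelist1, genelist2 = genelist2)

# Add the list name as a column to each element (hypothetical column 'listname')
tagged <- Map(function(gl, nm) {
  gl$listname <- nm
  gl
}, gen_list, names(gen_list))

# Stack all tagged gene lists into one data frame
all_lists <- do.call(rbind, tagged)

# For each SYMBOL in df, collect the names of the lists that contain it;
# a SYMBOL found in no list yields an empty string
df$listname <- vapply(
  df$SYMBOL,
  function(s) toString(all_lists$listname[all_lists$SYMBOL == s]),
  character(1)
)
print(df)
```

Because the names come from the list itself, adding a third gene list only requires adding it to gen_list.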

For the sake of completeness (and performance with large tables), here is a data.table approach:

library(data.table)
rbindlist(list(genelist1, genelist2), idcol = "glid")[, -"Gene"][
  setDT(df), on = "SYMBOL"][, .(glid =  toString(glid)), by = .(Gene, SYMBOL, Values)][]
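rbindlist() creates a single data.table from all gene lists and adds a column glid that identifies the origin of each row. The Gene column is dropped because the subsequent join is on SYMBOL only. Before joining, df is coerced to class data.table using setDT(). The joined result is then aggregated by SYMBOL (together with Gene and Values, which come from df) to show the cases where a symbol appears in both gene lists, i.e., SYMBOL == 1.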
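The chained call can be easier to follow when split into named steps. The sketch below does the same thing using the OP's data; the intermediate names combined, joined and result are my own, and as.data.table() is used instead of setDT() so that df itself is left unchanged.

```r
library(data.table)

df <- data.frame(
  Gene   = c("TP53", "XBP1", "TP27", "REDD1", "ERO1L", "STK11", "HIF2A"),
  SYMBOL = c(2L, 5L, 1L, 4L, 6L, 9L, 8L),
  Values = c(3.55, 4.06, 2.53, 3.99, 5.02, 3.64, 2.96),
  stringsAsFactors = FALSE
)
genelist1 <- data.frame(
  Gene   = c("P4H", "PLK", "TP27", "KTD", "ERO1L"),
  SYMBOL = c(10L, 7L, 1L, 11L, 4L),
  stringsAsFactors = FALSE
)
genelist2 <- data.frame(
  Gene   = c("TP53", "XBP1", "BHLHB", "STK11", "TP27", "UPK"),
  SYMBOL = c(2L, 5L, 12L, 9L, 1L, 18L),
  stringsAsFactors = FALSE
)

# Step 1: stack the gene lists; glid records which list a row came from (1 or 2)
combined <- rbindlist(list(genelist1, genelist2), idcol = "glid")

# Step 2: drop Gene, since the join below uses SYMBOL only
combined[, Gene := NULL]

# Step 3: join; every row of df is kept, unmatched rows get glid = NA
joined <- combined[as.data.table(df), on = "SYMBOL"]

# Step 4: collapse multiple matches per df row into one comma-separated string
result <- joined[, .(glid = toString(glid)), by = .(Gene, SYMBOL, Values)]
print(result)
```

Note that toString() turns a missing glid into the string "NA", so unmatched rows are not true NA values after the aggregation.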


Edit: If there are many gene lists, or if the full name of each gene list is required rather than just a number, we can try the following:

rbindlist(mget(ls(pattern = "^genelist")), idcol = "glid")[, -"Gene"][
  setDT(df), on = "SYMBOL"][, .(glid =  toString(glid)), by = .(Gene, SYMBOL, Values)][]
ls() looks in the environment for objects whose names start with genelist. mget() returns a named list of these objects, which is then passed to rbindlist().
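A quick way to see what ls() and mget() hand over is to inspect the intermediate list. This is a minimal base R sketch with two tiny stand-in gene lists; passing the environment explicitly is my addition, so that the snippet behaves the same whether it is run at top level or inside a function.

```r
# Two tiny stand-in gene lists (shortened versions of the OP's data)
genelist1 <- data.frame(Gene = c("TP27", "ERO1L"), SYMBOL = c(1L, 4L),
                        stringsAsFactors = FALSE)
genelist2 <- data.frame(Gene = c("TP53", "TP27"), SYMBOL = c(2L, 1L),
                        stringsAsFactors = FALSE)

env <- environment()

# Names of all objects in 'env' that start with "genelist"
found_names <- ls(envir = env, pattern = "^genelist")

# Fetch those objects as a named list
found <- mget(found_names, envir = env)

print(names(found))
str(found, max.level = 1)
```

Any other object whose name starts with genelist would be picked up as well, so this pattern is best used in a session where that prefix is reserved for the gene lists.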

Data, as provided by the OP:

df <- structure(list(Gene = c("TP53", "XBP1", "TP27", "REDD1", "ERO1L", 
"STK11", "HIF2A"), SYMBOL = c(2L, 5L, 1L, 4L, 6L, 9L, 8L), Values = c(3.55, 
4.06, 2.53, 3.99, 5.02, 3.64, 2.96)), .Names = c("Gene", "SYMBOL", 
"Values"), class = "data.frame", row.names = c(NA, -7L))
genelist1 <- structure(list(Gene = c("P4H", "PLK", "TP27", "KTD", "ERO1L"), 
    SYMBOL = c(10L, 7L, 1L, 11L, 4L)), .Names = c("Gene", "SYMBOL"
), class = "data.frame", row.names = c(NA, -5L))
genelist2 <- structure(list(Gene = c("TP53", "XBP1", "BHLHB", "STK11", "TP27", 
"UPK"), SYMBOL = c(2L, 5L, 12L, 9L, 1L, 18L)), .Names = c("Gene", 
"SYMBOL"), class = "data.frame", row.names = c(NA, -6L))
Output of the first data.table call:
library(data.table)
rbindlist(list(genelist1, genelist2), idcol = "glid")[, -"Gene"][
  setDT(df), on = "SYMBOL"][, .(glid =  toString(glid)), by = .(Gene, SYMBOL, Values)][]
    Gene SYMBOL Values glid
1:  TP53      2   3.55    2
2:  XBP1      5   4.06    2
3:  TP27      1   2.53 1, 2
4: REDD1      4   3.99    1
5: ERO1L      6   5.02   NA
6: STK11      9   3.64    2
7: HIF2A      8   2.96   NA
Output of the variant using mget():
rbindlist(mget(ls(pattern = "^genelist")), idcol = "glid")[, -"Gene"][
  setDT(df), on = "SYMBOL"][, .(glid =  toString(glid)), by = .(Gene, SYMBOL, Values)][]
    Gene SYMBOL Values                 glid
1:  TP53      2   3.55            genelist2
2:  XBP1      5   4.06            genelist2
3:  TP27      1   2.53 genelist1, genelist2
4: REDD1      4   3.99            genelist1
5: ERO1L      6   5.02                   NA
6: STK11      9   3.64            genelist2
7: HIF2A      8   2.96                   NA