R: replace column values by looking them up in a dictionary


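In this question I need to look up values for dataframe columns based not just on one attribute, but on several attributes and on ranges, compared against a dictionary. (Yes, this is actually the continuation of a story.)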

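For people who know R this should be an easy question, because I am providing a working solution for basic indexing that only needs an upgrade, possibly an easy one. But it is hard for me, because I am still learning R.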

Starting point, df1:

"rngvalue","80","116"
36,NA,NA
600000,NA,NA
367,5,NA
90,NA,6

To be transformed to:

"rngvalue","80","116"
36,0.03,0.135                   #col80 is always replaced by 0.03
600000,0.03,0.105               #col116 needs to be decided on range, this value is bigger than everything in dictionary so take the last one
367,5,0.11                      #5 not replaced, but second column nicely looks up to 0.11
90,0.03,6                       #6 not replaced

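When I only wanted to replace the missing values in the columns testcolnames of the (big) table df1 with the column default of the (small) dictionary testdefs (selecting the row by making testdefs$LABMET_ID equal to the column name from testcolnames), I used this code: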

testcolnames=c("80","116") #...result of regexp on colnames(df1), originally much longer
df1[,testcolnames] <- lapply(testcolnames, function(x) {
  tmpcol <- df1[,x]
  tmpcol[is.na(tmpcol)] <- testdefs$default[match(x, testdefs$LABMET_ID)]
  tmpcol
})
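To make the behaviour of that snippet concrete, here is a minimal runnable sketch. The contents of testdefs are an assumption made up for illustration (one row per column, with invented defaults); only df1 follows the sample shown in the question.

```r
# Sample data from the question; check.names = FALSE keeps the numeric column names.
df1 <- data.frame(rngvalue = c(36, 600000, 367, 90),
                  `80`  = c(NA, NA, 5, NA),
                  `116` = c(NA, NA, NA, 6),
                  check.names = FALSE)

# ASSUMPTION: a one-row-per-column dictionary with invented default values.
testdefs <- data.frame(LABMET_ID = c("80", "116"),
                       default   = c(0.03, 0.135))

testcolnames <- c("80", "116")
df1[, testcolnames] <- lapply(testcolnames, function(x) {
  tmpcol <- df1[, x]
  # match() gives the position of the first dictionary row whose LABMET_ID equals x
  tmpcol[is.na(tmpcol)] <- testdefs$default[match(x, testdefs$LABMET_ID)]
  tmpcol
})
df1
```

Because match() only ever returns the first matching row, this approach cannot choose between several ranges for the same column, which is the limitation the question is about.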

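Since the intervals have no gaps, you can use findInterval. I would use dlply from plyr to change the lookup table into a list containing the breaks and the defaults for each value.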

## Transform lookup table to a list with breaks for intervals
library(plyr)
lookup <- dlply(testdefs, .(LABMET_ID), function(x)
  list(breaks=c(rbind(x$lower, x$upper), x$upper[length(x$upper)])[c(TRUE,FALSE)],
       default=x$default))
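As a small illustration of what the breaks are used for, the sketch below shows how findInterval maps values to interval indices and how the indices can be clamped so that out-of-range values fall into the first or last interval. The break and default values are assumptions, chosen to agree with the intervals mentioned in the comments (31-365, 366-5475, 5476-54750).

```r
# ASSUMED breaks for column "116": the lower bounds plus the last upper bound.
breaks   <- c(31, 366, 5476, 54750)
defaults <- c(0.135, 0.11, 0.105)   # one assumed default per interval

x   <- c(36, 600000, 367, 90, 5)
raw <- findInterval(x, breaks)      # 0 below the first break, length(breaks) above the last
raw
# -> 1 4 2 1 0

# Clamp to the valid index range 1 .. length(breaks) - 1
idx <- pmax(pmin(raw, length(breaks) - 1), 1)
defaults[idx]
# -> 0.135 0.105 0.110 0.135 0.135
```

The clamping step is what makes a value larger than every interval (600000 here) pick up the last default.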
You can then perform the lookup with:

testcolnames=c("80","116")
df1[,testcolnames] <- lapply(testcolnames, function(x) {
    tmpcol <- df1[,x]
    defaults <- with(lookup[[x]], {
        default[pmax(pmin(length(breaks)-1, findInterval(df1$rngvalue, breaks)), 1)]
    })
    tmpcol[is.na(tmpcol)] <- defaults[is.na(tmpcol)]
    tmpcol
})
#   rngvalue   80   116
# 1       36 0.03 0.135
# 2   600000 0.03 0.105
# 3      367 5.00 0.110
# 4       90 0.03 6.000
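For readers who want to run the whole thing without plyr, here is a self-contained sketch of the same idea using base R's split() and lapply() in place of dlply. This is a swapped-in technique, not the code from the answer, and the testdefs rows are hypothetical values chosen to reproduce the output above; the rows for each LABMET_ID are assumed to be sorted by lower and to be contiguous.

```r
df1 <- data.frame(rngvalue = c(36, 600000, 367, 90),
                  `80`  = c(NA, NA, 5, NA),
                  `116` = c(NA, NA, NA, 6),
                  check.names = FALSE)

# HYPOTHETICAL dictionary: contiguous intervals per LABMET_ID, sorted by lower.
testdefs <- data.frame(LABMET_ID = c("80", "116", "116", "116"),
                       lower     = c(0, 31, 366, 5476),
                       upper     = c(54750, 365, 5475, 54750),
                       default   = c(0.03, 0.135, 0.11, 0.105))

# Base-R stand-in for dlply(): one list element per LABMET_ID,
# with breaks = all lower bounds plus the last upper bound.
lookup <- lapply(split(testdefs, testdefs$LABMET_ID), function(d) {
  list(breaks = c(d$lower, d$upper[nrow(d)]), default = d$default)
})

testcolnames <- c("80", "116")
df1[, testcolnames] <- lapply(testcolnames, function(x) {
  tmpcol <- df1[, x]
  lk  <- lookup[[x]]
  idx <- findInterval(df1$rngvalue, lk$breaks)
  idx <- pmax(pmin(idx, length(lk$breaks) - 1), 1)  # clamp out-of-range values
  tmpcol[is.na(tmpcol)] <- lk$default[idx][is.na(tmpcol)]
  tmpcol
})
df1
```

Referring to lk$breaks and lk$default directly avoids with(), at the cost of slightly more typing.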

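Are the intervals always contiguous, as for "116", i.e. (31-365, 366-5475, 5476-54750, etc.), with no gaps?

Yes! I am sorry I forgot to mention it :)

Thank you! Works like a charm! I only had to write lookup[[x]] explicitly there (not sure why "with" did not work), and I needed to add an ifelse for the case where a column to be replaced is not in the dictionary.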
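One possible way to handle the two points from the last comment (accessing lookup[[x]] explicitly and skipping columns that have no dictionary entry) is sketched below. The helper name fill_na_from_lookup and the lookup contents are hypothetical, and an early return on a NULL entry is used here instead of ifelse.

```r
# HYPOTHETICAL lookup list; note that there is no entry for column "999".
lookup <- list(`116` = list(breaks  = c(31, 366, 5476, 54750),
                            default = c(0.135, 0.11, 0.105)))

# Illustrative helper, not part of the original answer.
fill_na_from_lookup <- function(col, rng, lk) {
  if (is.null(lk)) return(col)          # column not in the dictionary: leave untouched
  idx <- findInterval(rng, lk$breaks)
  idx <- pmax(pmin(idx, length(lk$breaks) - 1), 1)
  col[is.na(col)] <- lk$default[idx][is.na(col)]
  col
}

rng <- c(36, 600000, 367, 90)
filled    <- fill_na_from_lookup(c(NA, NA, NA, 6), rng, lookup[["116"]])
untouched <- fill_na_from_lookup(c(NA, 1, NA, 2), rng, lookup[["999"]])
```

Indexing a list with [[ and a name it does not contain returns NULL, so the is.null() check is enough to detect a column that is missing from the dictionary.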