Replace every column of a dataframe with matches from a lookup table

I have a data.frame called table_1 with the following structure:

      p_id   rd1  rd2    rd3
     <fctr><fctr><fctr><fctr>
   1    1     5     4     6
   2    2     3     1     1
   3    3     6     6     5
   4    4     1     5     2
   5    5     4     1     4
My goal: for each column in table_1, I want to replace all entries in rd1, rd2, and rd3 with the lookup value from the table p_scr:

      p_id   rd1  rd2    rd3
     <fctr><fctr><fctr><fctr>
   1    1     55    44    66
   2    2     33    11    11
   3    3     66    66    55
   4    4     11    55    22
   5    5     44    11    44

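I suspect this will use mapply, lapply, or match, but I have not found a good example of this. I am also familiar with mutate, which I suspect could be used here too. Any suggestions are welcome. Note: this is a simplified version of my actual data.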
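For reference, the match/lapply idea the question guesses at might look roughly like the sketch below. The question does not show the lookup table, so the data frame lookup and its column names p_id and p_scr are assumptions made for illustration; the values are chosen to reproduce the desired output above.

```r
# Data as printed in the question; all columns are factors.
table_1 <- data.frame(
  p_id = factor(1:5),
  rd1  = factor(c(5, 3, 6, 1, 4)),
  rd2  = factor(c(4, 1, 6, 5, 1)),
  rd3  = factor(c(6, 1, 5, 2, 4))
)

# ASSUMPTION: one row per id plus a score column. The names lookup, p_id
# and p_scr are hypothetical; the question does not show this table.
lookup <- data.frame(p_id = 1:6, p_scr = c(11, 22, 33, 44, 55, 66))

rd_cols <- c("rd1", "rd2", "rd3")
result <- table_1
result[rd_cols] <- lapply(table_1[rd_cols], function(col) {
  # Compare by label (as.character), not by the factor's internal integer codes.
  lookup$p_scr[match(as.character(col), as.character(lookup$p_id))]
})
result
```

Entries with no match in lookup$p_id come back as NA, because match returns NA for them.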

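Note: I have corrected this code to match your data structure, in which every column is a factor. After setting the rownames of the ref table to p_id, the ref lookup table is indexed with the rd values from t.

I use different values for p_id to highlight that the indexing is done by the p_id row names rather than by position.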

# t is your df; ref is your lookup table
t <- data.frame(p_id=factor(c(10,20,30,40,50)),
            rd1=factor(c(5,3,6,1,4)*10),
            rd2=factor(c(4,1,6,5,1)*10),
            rd3=factor(c(6,1,5,2,4)*10))
ref <- data.frame(p_id=factor(c(10,20,30,40,50,60)), 
              p_scr=factor(c(11,22,33,44,55,66)))

t
#   p_id rd1 rd2 rd3
# 1   10  50  40  60
# 2   20  30  10  10
# 3   30  60  60  50
# 4   40  10  50  20
# 5   50  40  10  40

ref
#   p_id p_scr
# 1   10    11
# 2   20    22
# 3   30    33
# 4   40    44
# 5   50    55
# 6   60    66

# assuming p_id is unique, set rownames of ref lookup table to p_id to allow for indexing by p_id
rownames(ref) <- ref$p_id
rownames(ref) # character values, not numeric
# [1] "10" "20" "30" "40" "50" "60"

# ref lookup table now looks like this
ref
#    p_id p_scr
# 10   10    11
# 20   20    22
# 30   30    33
# 40   40    44
# 50   50    55
# 60   60    66

# single case, ref rownames are character vectors, we want to index with corresponding character vector from t
as.character(t$rd1)
# [1] "50" "30" "60" "10" "40"
ref[as.character(t$rd1),]$p_scr # use character values of rd1 to index, matching the character values of rownames
# [1] 55 33 66 11 44
# Levels: 11 22 33 44 55 66

# apply to each rd column, returns the character values of p_scr factor
apply(t[,2:ncol(t)], 2, function(x) ref[as.character(x),]$p_scr)
# converts to numeric the character values of p_scr factor
apply(t[,2:ncol(t)], 2, function(x) as.numeric(as.character(ref[as.character(x),]$p_scr)))


# NOTE: the previous answer I gave does not work, why?
ref[t$rd1,]$p_scr # gives incorrect order
# [1] 44 22 55 11 33
# Levels: 11 22 33 44 55 66
# NOTE structure of t
str(t)
# 'data.frame': 5 obs. of  4 variables:
# $ p_id: Factor w/ 5 levels "10","20","30",..: 1 2 3 4 5
# $ rd1 : Factor w/ 5 levels "10","30","40",..: 4 2 5 1 3
# $ rd2 : Factor w/ 4 levels "10","40","50",..: 2 1 4 3 1
# $ rd3 : Factor w/ 5 levels "10","20","40",..: 5 1 4 2 3

# Do you see the character vs integer values of the factor t$rd1
t$rd1
# [1] 50 30 60 10 40
# Levels: 10 30 40 50 60
# The levels of t$rd1 are "10", "30", "40", "50", "60", so the values 50 30 60 10 40 are stored as the integer codes 4 2 5 1 3
# With ref[t$rd1,] you are using the integer codes of t$rd1 and indexing ref by position: ref[c(4,2,5,1,3),] so your output is c(44, 22, 55, 11, 33)
# With ref[as.character(t$rd1),] you are using the character values of t$rd1 and indexing ref by rownames: ref[c("50", "30", "60", "10", "40"),] so your output is c(55, 33, 66, 11, 44)
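
One way to see why the row-name indexing works: a character subscript is matched against names, never against positions. The same lookup can therefore be written with a named vector. This is only a sketch of that equivalent formulation; the names t_df, scores, rd_cols and out are introduced here for illustration and are not part of the answer.

```r
# Data with the same shape as the answer's example; every column is a factor.
# t_df is a name chosen here so that base::t() is not masked.
t_df <- data.frame(p_id = factor(c(10, 20, 30, 40, 50)),
                   rd1  = factor(c(5, 3, 6, 1, 4) * 10),
                   rd2  = factor(c(4, 1, 6, 5, 1) * 10),
                   rd3  = factor(c(6, 1, 5, 2, 4) * 10))
ref <- data.frame(p_id  = factor(c(10, 20, 30, 40, 50, 60)),
                  p_scr = factor(c(11, 22, 33, 44, 55, 66)))

# Named numeric vector: the names are the ids, the values are the scores.
scores <- setNames(as.numeric(as.character(ref$p_scr)), as.character(ref$p_id))
scores[c("50", "30")]   # selected by name, not by position

# Look up every rd column by the character labels of its factor.
rd_cols <- setdiff(names(t_df), "p_id")
out <- t_df
out[rd_cols] <- lapply(t_df[rd_cols], function(x) unname(scores[as.character(x)]))
out
```

Because lapply is used on the data frame's columns instead of apply, the result stays a data frame and the looked-up columns are numeric.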
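The question isn't entirely clear, but if you want a primer on the apply family of functions, see the documentation of the R base package for detailed usage and examples of each function.

Amazing, awesome. This works surprisingly well. I guess I still don't understand why rownames(ref) and the indexing it allows work here. I'll have to keep studying the single use case (thanks for adding it). Thanks, everyone!

@jmb277 I just edited to add a warning: if your data are factors, pay attention to the factor levels.

I did some spot checks and it seems to work. I think the data are still wrong. I guess I'm not clear whether this will become a problem. The result of apply is either a number or an NA, which is fine by me.

Hmm, yes, I think I just ran into this problem. When I try to convert the looked-up data to numeric, it is changed to numbers I don't recognize. Can the numeric conversion be done in this part? apply(t[,2:ncol(t)], 2, function(x) ref[x,]$p_scr)

I'm not clear how to apply this. Actually, I just figured it out.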
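
The "numbers I don't recognize" mentioned in the comments are most likely the factor's internal level codes. A small sketch of the difference, using a stand-alone factor with the same values as the looked-up rd1 column (the variable names here are illustrative only):

```r
p_scr <- factor(c(55, 33, 66, 11, 44))
levels(p_scr)                                 # sorted labels: "11" "33" "44" "55" "66"

codes  <- as.numeric(p_scr)                   # internal level codes, not the scores
values <- as.numeric(as.character(p_scr))     # the labels parsed as numbers
same   <- as.numeric(levels(p_scr))[p_scr]    # same result, converts each level once

codes
values
```

Converting through as.character first, as the answer does, returns the printed values instead of the level positions.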
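# minimal illustration: a factor used as an index is treated as its integer codes
n <- 1:5 # numeric
n
f <- factor(n, levels=5:1) # factor with the levels in reverse order
f
levels(f)

# consequence when used to index
letters[n]
# [1] "a" "b" "c" "d" "e"
letters[f]
# [1] "e" "d" "c" "b" "a"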