R: Rename factor levels based on matching values within a subset of a data frame


I am trying to assign names to the blank levels of the factor lepsp, conditional on matching values within a subset. A sample of the data:

df<- 
  plantfam        lepfam         lepsp              lepcn
  Asteraceae      Geometridae    Eois sp            green/spikes
  Asteraceae      Erebidae       Anoba sp           green/nospikes                    
  Asteraceae      Erebidae                          green/nospikes            
  Melastomaceae   Noctuidae      Balsinae sp             
  Poaceae         Erebidae       Deinopa sp         black/orangespots
  Poaceae         Erebidae                          black/orangespots
  Poaceae         Erebidae       Cocytia sp         black/yellowspots
  Poaceae                                           black/yellowspots
I have tried various approaches, without success:

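Plain base R, with a step for inspecting the combinations that will be renamed. Essentially, you build a unique list of the plantfam/lepfam/lepcn combinations and merge it back into the original dataset:

Read in the data and make sure the format is as expected: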

df<- read.csv(text = 
'plantfam,lepfam,lepsp,lepcn
Asteraceae,Geometridae,Eois sp,green/spikes
Asteraceae,Erebidae,Anoba sp,green/nospikes
Asteraceae,Erebidae,NA,green/nospikes
Melastomaceae,Noctuidae,Balsinae sp,NA
Poaceae,Erebidae,Deinopa sp,black/orangespots
Poaceae,Erebidae,NA,black/orangespots
Poaceae,Erebidae,NA,black/yellowspots')

# assumes blanks are NA
# if blanks are actually empty strings "" then turn those into NA's

# make sure everything is a character, not a factor
df[] <- lapply(df, as.character)

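df

Could you provide a sample of your dataset so that we can produce a reproducible solution?
My impression was that the above is a sample of the dataset. What further help can I provide?
Thank you for your time. I have added code for the sample data frame, which may be what you were asking for. Thanks again for your help.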
output<-
  plantfam        lepfam         lepsp              lepcn
  Asteraceae      Geometridae    Eois sp            green/spikes
  Asteraceae      Erebidae       Anoba sp           green/nospikes
  Asteraceae      Erebidae       Anoba sp           green/nospikes
  Melastomaceae   Noctuidae      Balsinae sp
  Poaceae         Erebidae       Deinopa sp         black/orangespots
  Poaceae         Erebidae       Deinopa sp         black/orangespots
  Poaceae         Erebidae       Cocytia sp         black/yellowspots
  Poaceae                        Cocytia sp         black/yellowspots
# get a unique list of all combinations that don't have missing data
dflookup <- unique(na.omit(df))

# inspect combinations to be renamed, there should be no duplicate plantfam/lepfam/lepcn combinations
dflookup

# use the lookup to merge in all known names
newdf <- merge(df, dflookup, by = c('plantfam','lepfam','lepcn'), all.x = TRUE, suffixes = c('old','new'))

# use original lepsp when new lepsp is NA
newdf$lepsp <- ifelse(is.na(newdf$lepspnew),newdf$lepspold,newdf$lepspnew)

# remove unneeded columns
newdf$lepspold <- newdf$lepspnew <- NULL

# turn back into factors if desired
# (apply() would return a character matrix, so convert column by column instead)
newdf[] <- lapply(newdf, factor)
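
One thing to watch: the merge above keys on plantfam, lepfam and lepcn together, so the last row of the question's sample (blank lepfam) would not pick up a name. The desired output suggests that row should still become Cocytia sp, which would mean matching on plantfam and lepcn only. That is an inference from the sample, not something the question states. The sketch below is a minimal variant under that assumption; it also assumes blanks arrive as empty strings, and the names key_cols, known and make_key are made up for illustration.

```r
# Question's sample data; blanks are assumed to be empty strings,
# which na.strings turns into NA while reading.
df <- read.csv(text =
'plantfam,lepfam,lepsp,lepcn
Asteraceae,Geometridae,Eois sp,green/spikes
Asteraceae,Erebidae,Anoba sp,green/nospikes
Asteraceae,Erebidae,,green/nospikes
Melastomaceae,Noctuidae,Balsinae sp,
Poaceae,Erebidae,Deinopa sp,black/orangespots
Poaceae,Erebidae,,black/orangespots
Poaceae,Erebidae,Cocytia sp,black/yellowspots
Poaceae,,,black/yellowspots',
stringsAsFactors = FALSE, na.strings = c("", "NA"))

# Assumed key: plant family plus caterpillar description (lepfam left out)
key_cols <- c("plantfam", "lepcn")

# Lookup of every key that already has a species name
known <- unique(df[!is.na(df$lepsp) & !is.na(df$lepcn), c(key_cols, "lepsp")])

# Each key should map to exactly one species; stop if any key is ambiguous
ambiguous <- known[duplicated(known[key_cols]), , drop = FALSE]
stopifnot(nrow(ambiguous) == 0)

# Build a single string key per row and look it up
make_key <- function(d) paste(d$plantfam, d$lepcn, sep = "|")
idx <- match(make_key(df), make_key(known))

# Fill only rows whose species is missing and whose description is present
fill <- is.na(df$lepsp) & !is.na(df$lepcn)
df$lepsp[fill] <- known$lepsp[idx[fill]]

# Turn the filled column back into a factor
df$lepsp <- factor(df$lepsp)
df
```

On the sample data this fills the three blank lepsp entries with Anoba sp, Deinopa sp and Cocytia sp, matching the lepsp column of the desired output. If lepfam should take part in the match after all, adding it to key_cols and to make_key restores the stricter behaviour of the merge.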