R: rename factor levels based on matching values within a subset of a data frame
I am trying to assign names to the blank levels of the factor lepsp, on the condition that the values match within a subset. Example data:
df <-
plantfam       lepfam       lepsp        lepcn
Asteraceae     Geometridae  Eois sp      green/spikes
Asteraceae     Erebidae     Anoba sp     green/nospikes
Asteraceae     Erebidae                  green/nospikes
Melastomaceae  Noctuidae    Balsinae sp
Poaceae        Erebidae     Deinopa sp   black/orangespots
Poaceae        Erebidae                  black/orangespots
Poaceae        Erebidae     Cocytia sp   black/yellowspots
Poaceae                                  black/yellowspots
I have tried various approaches, without success.
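A simple base R approach that lets you inspect the combinations to be renamed. Essentially, you get a unique list of the plantfam/lepfam/lepcn combinations and merge it back into the original dataset.
Read in the data and make sure it is formatted as expected: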
df<- read.csv(text =
'plantfam,lepfam,lepsp,lepcn
Asteraceae,Geometridae,Eois sp,green/spikes
Asteraceae,Erebidae,Anoba sp,green/nospikes
Asteraceae,Erebidae,NA,green/nospikes
Melastomaceae,Noctuidae,Balsinae sp,NA
Poaceae,Erebidae,Deinopa sp,black/orangespots
Poaceae,Erebidae,NA,black/orangespots
Poaceae,Erebidae,NA,black/yellowspots')
# assumes blanks are NA
# if blanks are actually empty strings "" then turn those into NAs
# make sure everything is a character, not a factor
df[] <- lapply(df, as.character)
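df
Could you provide a sample of your dataset so that we can produce a reproducible solution?
I was under the impression that the above is a sample of the dataset. What further help can I provide?
Thank you for your time. I have added code for the example data frame, which may be what you were asking for. Thanks again for your help.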
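The code comments above assume that blanks are already NA and only mention converting empty strings in passing. A minimal sketch of two ways to do that conversion follows; the three-row sample and the names raw, df1 and df2 are made up for illustration and are not part of the original post.

```r
# Hypothetical sample in which missing values are empty strings, not "NA"
raw <- 'plantfam,lepfam,lepsp,lepcn
Asteraceae,Erebidae,Anoba sp,green/nospikes
Asteraceae,Erebidae,,green/nospikes
Melastomaceae,Noctuidae,Balsinae sp,'

# Option 1: treat "" as missing while reading the file
df1 <- read.csv(text = raw, na.strings = c("NA", ""), stringsAsFactors = FALSE)

# Option 2: read as usual, then convert "" to NA column by column
df2 <- read.csv(text = raw, stringsAsFactors = FALSE)
df2[] <- lapply(df2, function(x) {
  x <- as.character(x)
  x[!is.na(x) & x == ""] <- NA
  x
})
```

Either way, na.omit() and is.na() then see the blank species names and colour notes as missing.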
output <-
plantfam       lepfam       lepsp        lepcn
Asteraceae     Geometridae  Eois sp      green/spikes
Asteraceae     Erebidae     Anoba sp     green/nospikes
Asteraceae     Erebidae     Anoba sp     green/nospikes
Melastomaceae  Noctuidae    Balsinae sp
Poaceae        Erebidae     Deinopa sp   black/orangespots
Poaceae        Erebidae     Deinopa sp   black/orangespots
Poaceae        Erebidae     Cocytia sp   black/yellowspots
Poaceae                     Cocytia sp   black/yellowspots
df<- read.csv(text =
'plantfam,lepfam,lepsp,lepcn
Asteraceae,Geometridae,Eois sp,green/spikes
Asteraceae,Erebidae,Anoba sp,green/nospikes
Asteraceae,Erebidae,NA,green/nospikes
Melastomaceae,Noctuidae,Balsinae sp,NA
Poaceae,Erebidae,Deinopa sp,black/orangespots
Poaceae,Erebidae,NA,black/orangespots
Poaceae,Erebidae,NA,black/yellowspots')
# assumes blanks are NA
# if blanks are actually empty strings "" then turn those into NAs
# make sure everything is a character, not a factor
df[] <- lapply(df, as.character)
# get a unique list of all combinations that don't have missing data
dflookup <- unique(na.omit(df))
# inspect combinations to be renamed, there should be no duplicate plantfam/lepfam/lepcn combinations
dflookup
# use the lookup to merge in all known names
newdf <- merge(df, dflookup, by = c('plantfam', 'lepfam', 'lepcn'), all.x = TRUE, suffixes = c('old', 'new'))
# use original lepsp when new lepsp is NA
newdf$lepsp <- ifelse(is.na(newdf$lepspnew), newdf$lepspold, newdf$lepspnew)
# remove unneeded columns
newdf$lepspold <- newdf$lepspnew <- NULL
# turn back into factors if desired
# (apply() returns a character matrix, so convert column by column instead)
newdf[] <- lapply(newdf, as.factor)
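The code above asks you to check by eye that the lookup has no duplicate plantfam/lepfam/lepcn combinations. As a rough sketch, that check could be automated with duplicated(), and once the keys are known to be unique, match() can fill the gaps without the row reordering that merge() introduces. The four-row sample and the names key, dup, idx and filled are assumptions for illustration only.

```r
# Hypothetical four-row sample in the same layout as the data above
df <- read.csv(text =
'plantfam,lepfam,lepsp,lepcn
Asteraceae,Erebidae,Anoba sp,green/nospikes
Asteraceae,Erebidae,NA,green/nospikes
Poaceae,Erebidae,Deinopa sp,black/orangespots
Poaceae,Erebidae,NA,black/orangespots', stringsAsFactors = FALSE)

key <- c("plantfam", "lepfam", "lepcn")
dflookup <- unique(na.omit(df))

# TRUE for every lookup row whose key combination occurs more than once
dup <- duplicated(dflookup[key]) | duplicated(dflookup[key], fromLast = TRUE)
if (any(dup)) {
  stop("ambiguous plantfam/lepfam/lepcn combinations in the lookup")
}

# Build one string per row from the key columns and match against the lookup
idx <- match(do.call(paste, c(df[key], sep = "\r")),
             do.call(paste, c(dflookup[key], sep = "\r")))

# Keep the original name where present, otherwise take the looked-up one
filled <- ifelse(is.na(df$lepsp), dflookup$lepsp[idx], df$lepsp)
```

Stopping on duplicates is a deliberate choice here: if one combination mapped to two species names, merge() would silently duplicate rows and match() would silently pick the first name.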