How do I conditionally replace specific strings in R?
I have data in which I collapsed repeated gene results so that each gene sits on a single row. This left some rows filled with commas, and I am trying to replace the rows that contain only commas with NA. However, I also have rows that contain commas together with qualitative data, and I want to keep those. For example:
Gene Condition
Gene1 Name=Asymmetrical dimethylarginine level, Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker
Gene2 Name=blood pressure, Name=diabetes
Gene3 Name=heart disease, , , , ,
Gene4 , , , , , , , , ,
Gene5 NA
Gene6 , , ,
Expected output:
Gene Condition
Gene1 Name=Asymmetrical dimethylarginine level, Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker
Gene2 Name=blood pressure, Name=diabetes
Gene3 Name=heart disease, , , , ,
Gene4 NA
Gene5 NA
Gene6 NA
#ideally I would get rid of Gene3's extra commas but this is not necessary
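I tried to write a statement along the lines of "if the row in the Condition column contains only commas, replace it with NA", using something like data$condition[if("," & ![a-Z]|[a-Z]|[=])].

One option is
grepl(pattern = "^[ ,]+$", x = DF$Condition)
This returns TRUE when a row contains only spaces and commas.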
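To see what the pattern matches, here is a small sketch on a made-up vector (the values in x are illustrative and not the asker's real data). The character class [ ,] accepts both spaces and commas, and grepl returns FALSE for NA, so existing missing values are left alone.

```r
# Illustrative input: x is a hypothetical vector, not the question's data
x <- c("Name=heart disease, , ,", ", , , ,", ",,,", NA, "Name=diabetes")

# TRUE only where the whole string consists of commas and/or spaces
only_commas <- grepl("^[ ,]+$", x)
print(only_commas)

# Replace those entries with NA, leaving everything else as it was
x[only_commas] <- NA
print(x)
```

Anchoring the pattern with ^ and $ is what keeps rows such as "Name=heart disease, , ," from matching, since they contain characters other than commas and spaces.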
DF <-
structure(list(Gene = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5",
"Gene6"), Condition= c("Name=Asymmetrical dimethylarginine leve,l Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker",
"Name=blood pressure, Name=diabetes", "Name=heart disease, , , , ,",
", , , , , , , , ,", NA, "Name=kidney disease, , ,")),
row.names = c(NA, -6L), class = "data.frame")
# Set Condition to NA wherever it contains only spaces and commas
DF$Condition[grepl("^[ ,]+$", DF$Condition)] <- NA
DF

If I understand correctly, you can try replacing a cell with NA when it contains only commas:
DF$Condition <- gsub('^(,\\s*)+$', NA, DF$Condition)
Output of a second variant, which also strips trailing commas before converting empty strings to NA:
> gsub('^$', NA, gsub('(,\\s*)+$','', DF$Condition))
[1] "Name=Asymmetrical dimethylarginine leve,l Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker"
[2] "Name=blood pressure, Name=diabetes"
[3] "Name=heart disease"
[4] NA
[5] NA
[6] "Name=kidney disease"
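The two-step idea above (strip trailing commas, then turn what is left empty into NA) could also be wrapped in a small helper. This is only a sketch: clean_condition is a hypothetical name, and the sample vector cond is made up for illustration.

```r
# clean_condition is a hypothetical helper name, not part of any package
clean_condition <- function(x) {
  # Drop any run of commas and whitespace at the end of the string
  x <- sub("[,[:space:]]+$", "", x)
  # Strings that are now empty held nothing but commas: mark them as missing
  x[!is.na(x) & !nzchar(x)] <- NA
  x
}

# Illustrative input, loosely modelled on the rows in the question
cond <- c("Name=blood pressure, Name=diabetes",
          "Name=heart disease, , , , ,",
          ", , , ,",
          NA)
cleaned <- clean_condition(cond)
print(cleaned)
```

Commas between real entries are kept, because the pattern is anchored to the end of the string and only removes the trailing run.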
You can try grepl as shown below, where rows that contain only commas (that is, no alphanumeric characters) are set to NA:
DF <- within(DF, Condition <- replace(Condition, !grepl("[[:alnum:]]", Condition), NA))
DF
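The alphanumeric test is slightly broader than the comma-only pattern, which may or may not be what you want. The sketch below compares the two on a made-up vector (x and its values are assumptions for illustration): a cell such as "-, -" has no letters or digits, so the [[:alnum:]] check treats it as empty, while the ^[ ,]+$ pattern does not.

```r
# Illustrative input: x is hypothetical, chosen to show where the tests differ
x <- c("Name=diabetes", ", , ,", "-, -", NA)

# Approach 1: anything without a letter or digit becomes NA
has_alnum <- grepl("[[:alnum:]]", x)
by_alnum <- replace(x, !has_alnum, NA)

# Approach 2: only strings made up entirely of commas and spaces become NA
by_commas <- replace(x, grepl("^[ ,]+$", x), NA)

print(by_alnum)
print(by_commas)
```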
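For comparison, the first gsub call, which only replaces cells made up entirely of commas, returns:
> gsub('^(,\\s*)+$',NA, DF$Condition)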
[1] "Name=Asymmetrical dimethylarginine leve,l Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker"
[2] "Name=blood pressure, Name=diabetes"
[3] "Name=heart disease, , , , ,"
[4] NA
[5] NA
[6] "Name=kidney disease, , ,"