How do I conditionally replace specific strings in R? (R, Conditional Statements, Bioinformatics)

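I have data in which I collapsed duplicate gene results so that each gene occupies a single row. This left some rows filled with commas, and I am trying to replace the rows that contain only commas with NA. However, some rows contain both commas and qualitative data, and I want to keep those. For example: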

Gene     Condition
Gene1    Name=Asymmetrical dimethylarginine level, Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker
Gene2    Name=blood pressure, Name=diabetes
Gene3    Name=heart disease, , , , , 
Gene4    , , , , , , , , ,
Gene5    NA
Gene6    , , ,

Expected output:

Gene     Condition
Gene1    Name=Asymmetrical dimethylarginine level, Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker
Gene2    Name=blood pressure, Name=diabetes
Gene3    Name=heart disease, , , , , 
Gene4    NA
Gene5    NA
Gene6    NA
#ideally I would get rid of Gene3's extra commas but this is not necessary

I am trying to write a statement along the lines of "if a row has only commas in the Condition column, replace it with NA", and tried something like:

data$condition[if("," & ![a-Z]|[a-Z]|[=])]

One option is grepl with pattern = "^[ ,]+$", which returns TRUE when a row contains only spaces and commas:
DF <-
structure(list(Gene = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5", 
"Gene6"), Condition= c("Name=Asymmetrical dimethylarginine leve,l Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker", 
"Name=blood pressure, Name=diabetes", "Name=heart disease, , , , ,", 
", , , , , , , , ,", NA, "Name=kidney disease, , ,")), 
row.names = c(NA, -6L), class = "data.frame")

# Set Condition to NA wherever it holds nothing but commas and spaces
DF$Condition[grepl("^[ ,]+$", DF$Condition)] <- NA
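
To see why this pattern is safe for rows that mix commas with real data, here is a minimal sketch on a toy vector. The vector `x` and its values are made up for illustration and are not the original data. It relies on grepl returning FALSE for NA input, so existing NA entries are simply left alone.

```r
# Toy vector (hypothetical values, not the original data)
x <- c("Name=a, Name=b", "Name=c, , ,", ", , , ,", NA, ",")

# TRUE only where the whole string consists of commas and spaces;
# grepl returns FALSE for NA input
only_commas <- grepl("^[ ,]+$", x)
print(only_commas)

# Logical indexing: overwrite just those entries with NA
x[only_commas] <- NA_character_
print(x)
```

Because the pattern is anchored with ^ and $, an entry such as "Name=c, , ," does not match and is kept unchanged.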


DF

If I understand correctly, you can try replacing cells that contain only commas with NA:
DF$Condition <- gsub('^(,\\s*)+$', NA, DF$Condition)

Second output, which also strips the trailing commas:
> gsub('^$', NA, gsub('(,\\s*)+$','', DF$Condition))
[1] "Name=Asymmetrical dimethylarginine leve,l Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker"
[2] "Name=blood pressure, Name=diabetes"                                                                                                      
[3] "Name=heart disease"                                                                                                                      
[4] NA                                                                                                                                        
[5] NA                                                                                                                                        
[6] "Name=kidney disease" 
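
The two-step idea (strip trailing commas, then turn empty strings into NA) could also be wrapped in a small helper. This is only a sketch: `clean_condition` is a hypothetical name, the input values are made up, and unlike the nested gsub call it also strips trailing whitespace that is not preceded by a comma.

```r
# Hypothetical helper: strip trailing comma/space runs, then turn
# strings that end up empty into NA.
clean_condition <- function(x) {
  # Remove any run of commas and whitespace at the end of the string
  out <- sub("[,[:space:]]+$", "", x)
  # Strings that were only commas/spaces are now "", so mark them missing
  out[!is.na(out) & out == ""] <- NA_character_
  out
}

# Toy input (made-up values for illustration)
x <- c("Name=heart disease, , , , ,", ", , , ,", NA, "Name=a, Name=b")
res <- clean_condition(x)
print(res)
```

Assigning NA explicitly keeps the intent visible and avoids relying on how gsub treats an NA replacement.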

You can try grepl as shown below; rows that contain only commas (more precisely, no alphanumeric characters) are set to NA:

DF <- within(DF, Condition <- replace(Condition, !grepl("[[:alnum:]]", Condition), NA))

DF
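
The [[:alnum:]] test is broader than a comma-only pattern: it flags any cell without a letter or digit, including empty strings and cells holding other punctuation. The sketch below compares the two tests on a made-up data frame `toy` (hypothetical values, not the original data).

```r
# Made-up data frame for illustration (not the original data)
toy <- data.frame(
  Gene = c("G1", "G2", "G3", "G4"),
  Condition = c("Name=diabetes, ,", ", , ,", "", "-"),
  stringsAsFactors = FALSE
)

# Cells with no letter or digit at all
no_alnum <- !grepl("[[:alnum:]]", toy$Condition)
# Cells made up only of commas and spaces
only_commas <- grepl("^[ ,]+$", toy$Condition)

# Side-by-side view of which cells each test would flag
print(data.frame(toy, no_alnum, only_commas))

# Apply the broader test, as in the answer above
toy$Condition <- replace(toy$Condition, no_alnum, NA)
print(toy)
```

Which test is preferable depends on whether cells such as "" or "-" should count as missing in your data.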
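
For comparison, the single gsub call shown earlier keeps the trailing commas on rows that contain data:
> gsub('^(,\\s*)+$',NA, DF$Condition)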
[1] "Name=Asymmetrical dimethylarginine leve,l Name=Bipolar disorder and schizophrenia, Name=3-hydroxypropylmercapturic acid levels in smoker"
[2] "Name=blood pressure, Name=diabetes"                                                                                                      
[3] "Name=heart disease, , , , ,"                                                                                                             
[4] NA                                                                                                                                        
[5] NA                                                                                                                                        
[6] "Name=kidney disease, , ," 
