R: Why is my output all NA when I remove specific rows?


I have a dataset, which I uploaded here.

I load it into R with the following command:

df <- read.delim("path to the data", header=TRUE, sep="\t", fill=TRUE, row.names=1, stringsAsFactors=FALSE, na.strings='')

In this case 9 is shown. I then tried to remove every row that has a + in that column with the following command:

Newdf <- df[df$Potential.contaminant != "+", ]

None of the commands above solved the problem. My guess was that the cause is that Potential.contaminant contains NA, so I replaced all NA values with zero:

df[c("Potential.contaminant")][is.na(df[c("Potential.contaminant")])] <- 0

Copy and paste your gist into the file c:/input.txt, then use your code:

df <- read.delim("c:/input.txt", header=TRUE, sep="\t", fill=TRUE, row.names=1, stringsAsFactors=FALSE, na.strings='') 
If I try to subset:

> df2 <- df[is.na(df$Potential.contaminant),]
> str(df2)
'data.frame':   12 obs. of  11 variables:
 $ Intensityhenya         : int  NA NA NA NA NA NA NA NA NA NA ...
 $ Only.identified.by.site: chr  NA NA NA NA ...
 $ Reverse                : logi  NA NA NA NA NA NA ...
 $ Potential.contaminant  : chr  NA NA NA NA ...
 $ id                     : int  NA NA NA NA NA NA NA NA NA NA ...
 $ IDs.1                  : chr  NA NA NA NA ...
 $ razor                  : chr  NA NA NA NA ...
 $ Mod.IDs                : chr  NA NA NA NA ...
 $ Evidence.IDs           : chr  NA NA NA NA ...
 $ GHSIDs                 : chr  NA NA NA NA ...
 $ BestGSFD               : chr  NA NA NA NA ...
Your header is hard to make sense of, so let's look at it:

IDs Intensityhenya  Only identified by site Reverse Potential contaminant   id  IDs razor   Mod.IDs Evidence IDs    GHSIDs  BestGSFD
And one row of data, with the long values cut short to give an overview:

CON__A2A4G1 0   +       +   0   16182;[...];4592    True;[..];False 16828;[...];57149   694702;[...];2208697;       
208698;[...];2441826                                            
3;2433194;[...];4682766                                     
Where possible, I removed the surplus digits and kept the tabs and line breaks.

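I hope you can see how and why this cannot lead to a proper analysis of the data. Run some checks on your input data and clean it up before trying to load it into R.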
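One possible pre-load check is sketched below. It assumes that a broken record shows up as a physical line in which every field after the first is empty, which is what the continuation lines in the gist look like. The three-column miniature file, the `P1`/`P2` identifiers and the variable names are made up for illustration.

```r
# Hypothetical miniature of the problem: a tab-separated file in which one
# record is broken across two physical lines.
lines <- c(
  "IDs\tPotential contaminant\tEvidence IDs",
  "P1\t+\t100;200;",   # record continues on the next line
  "300;400\t\t",       # continuation line: reads like a new, empty record
  "P2\t\t500;600"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Read the file as plain text, before any parsing into columns.
raw <- readLines(path)

# Flag lines that consist of a first field followed only by tabs,
# i.e. every field after the first one is empty.
suspicious <- which(grepl("^[^\t]*\t*$", raw))
print(suspicious)
print(raw[suspicious])
```

Lines flagged this way are candidates for being glued back onto the preceding record before calling read.delim.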

For illustration, here is your gist, with ellipses for the cut values and %T% in place of the tabs:

IDs%T%Intensityhenya%T%Only identified by site%T%Reverse%T%Potential contaminant%T%id%T%IDs%T%razor%T%Mod.IDs%T%Evidence IDs%T%GHSIDs%T%BestGSFD
CON__A2A4G1%T%0%T%+%T%%T%+%T%0%T%1618[...]4592%T%Tru[...]alse%T%1682[...]7149%T%69470[...]208697;%T%%T%
20869[...]441826%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
[...]20%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
00[...]%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
1271[...]682766%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
CON__A2A5Y0%T%0%T%%T%%T%+%T%1%T%443[...]5777%T%Fals[...]rue%T%464[...]8377%T%21071[...]489947%T%40503[...]780178%T%40505[...]780175
CON__A2AB72%T%0%T%%T%%T%+%T%2%T%443[...]0447%T%Tru[...]alse%T%464[...]2842%T%21070[...]232341%T%40502[...]250729%T%40502[...]250728
CON__ENSEMBL:ENSBTAP00000014147%T%0%T%%T%%T%+%T%3%T%53270%T%TRUE%T%55779%T%238286[...]382871%T%457377[...]573778%T%4573776
CON__ENSEMBL:ENSBTAP00000024146%T%0%T%%T%%T%+%T%4%T%186[...]5835%T%Tru[...]rue%T%194[...]8438%T%8382[...]492132%T%15455[...]783465%T%15455[...]783465
CON__ENSEMBL:ENSBTAP00000024466;CON__ENSEMBL:ENSBTAP00000024462%T%0%T%%T%%T%+%T%5%T%939[...]5179%T%Tru[...]rue%T%978[...]7757%T%41149[...]468480%T%78212[...]739209%T%78217[...]739209
CON__ENSEMBL:ENSBTAP00000025008%T%0%T%+%T%%T%+%T%6%T%1564[...]8580%T%Fals[...]alse%T%1627[...]9651%T%66672[...]269215%T%125151[...]439696%T%125151[...]439691
CON__ENSEMBL:ENSBTAP00000038253%T%0%T%%T%%T%+%T%7%T%120[...]5703%T%Fals[...]alse%T%125[...]8300%T%5326[...]25602%T%%T%
;125602[...]178%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
1[...]483384%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
22838[...]23247%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
;123247[...]411%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
4[...]7%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
603[...]790126;%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
79012[...]13848%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
;413848[...]765024%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%%T%
sp|O43790|KRT86_HUMAN;CON__O43790%T%0%T%%T%%T%+%T%8%T%121[...]5716%T%Tru[...]rue%T%126[...]8315%T%5455[...]484318%T%10404[...]426334%T%

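It appears that the data rows not flagged as contaminants have no value in that column. The NA values arise because na.strings='' was used in the read.delim call. For example, if you do this instead: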

df <- read.delim("https://gist.githubusercontent.com/anonymous/0bc36ec5f46757de7c2c/raw/517ef70ab6a68e600f57308e045c2b4669a7abfc/example.txt", header=TRUE, row.names=1, sep="\t")
df<-df[df$Potential.contaminant!='+',] 
summary(df)
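To make the effect of na.strings concrete, here is a small sketch on inline data rather than on the gist. The four rows and the names `txt`, `with_na`, `without_na`, `bad` and `kept` are invented for illustration.

```r
# Hypothetical tab-separated data: P2 and P4 have an empty contaminant field.
txt <- "IDs\tIntensity\tPotential.contaminant\nP1\t10\t+\nP2\t20\t\nP3\t30\t+\nP4\t40\t\n"

# With na.strings = "" the empty fields are read as NA ...
with_na <- read.delim(text = txt, row.names = 1,
                      stringsAsFactors = FALSE, na.strings = "")
# ... without it they stay as empty strings in a character column.
without_na <- read.delim(text = txt, row.names = 1, stringsAsFactors = FALSE)

# Comparing NA with "+" gives NA, not TRUE or FALSE.
print(with_na$Potential.contaminant != "+")
print(without_na$Potential.contaminant != "+")

# An NA in a logical row index produces a row made up entirely of NA.
bad <- with_na[with_na$Potential.contaminant != "+", ]
print(bad)

# With empty strings the same filter keeps the unflagged rows intact.
kept <- without_na[without_na$Potential.contaminant != "+", ]
print(kept)
```

The filter itself is the same in both cases. Only the way the empty fields were read differs.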

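Try using grepl: df[!grepl("[+]", df$Potential.contaminant), ]

@akrun It is the same with grep. Please check for yourself. When I use grep, only the first column is OK; the rest are NA again.

It's because all the other elements are NA in 'Potential.contaminant'.

@akrun I see. How can I solve this so that the NA values in Potential.contaminant are not taken into account? I only want to remove the rows that have + in the column named "Potential.contaminant".

I checked the data, and it seems there are two kinds of separators, tabs and semicolons. There is also some text at the start of the data that does not look like column names. Some parts contain numbers separated by semicolons, with an end-of-line in the middle of a number.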
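On the question raised in the comments, how to drop the + rows without the NA values getting in the way, one possible approach is to build the row index so that it never contains NA. The sketch below uses a made-up four-row data frame; `keep`, `clean` and `clean2` are illustrative names.

```r
# Hypothetical data frame with NA in the contaminant column.
df <- data.frame(
  Intensity = c(10L, 20L, 30L, 40L),
  Potential.contaminant = c("+", NA, "+", NA),
  row.names = c("P1", "P2", "P3", "P4"),
  stringsAsFactors = FALSE
)

# %in% returns FALSE for NA instead of NA, so the index is always TRUE/FALSE.
keep <- !(df$Potential.contaminant %in% "+")
clean <- df[keep, ]
print(clean)

# Equivalent: treat NA explicitly as "not a contaminant".
clean2 <- df[is.na(df$Potential.contaminant) | df$Potential.contaminant != "+", ]
print(clean2)
```

Note that this keeps the NA rows as they are. If those rows come from records broken across several lines, as in the gist, they will still be mostly empty until the input file itself is repaired.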