Python: unable to drop NaN rows from a DataFrame

python pandas

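I am trying to clean up a DataFrame that contains some NaN values.

I have tried all the suggested methods, but I cannot seem to get rid of the NaNs.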

df = pd.read_csv('filename.tsv', delimiter='\t')
df = df[pd.notnull(df)]
df = df.dropna()

df[pd.isnull(df)]
# still shows records containing NaN (a lot of them)

I don't know what I am missing.

Edit: the rows that show NaN have NaN in every column.

Another edit: when I try to look at the types

heads =  df[df.isnull()].head()
for idx, row in heads.iterrows():
    print idx, type(row.listener_id)

this returns

0 <type 'float'>
1 <type 'float'>
2 <type 'float'>
3 <type 'float'>
4 <type 'float'>

I think you need boolean indexing:

df = df[~df.isnull().any(axis=1)]

But the better way is simply to use:

df = df.dropna()

Sample:

df = pd.DataFrame({'A':[np.nan,5,4,5,5,np.nan],
                   'B':[7,8,9,4,2,np.nan],
                   'C':[1,3,5,7,1,np.nan],
                   'D':[5,3,6,9,2,np.nan]})

print (df)
     A    B    C    D
0  NaN  7.0  1.0  5.0
1  5.0  8.0  3.0  3.0
2  4.0  9.0  5.0  6.0
3  5.0  4.0  7.0  9.0
4  5.0  2.0  1.0  2.0
5  NaN  NaN  NaN  NaN

Maybe NaN is a string in your data, in which case you first need:

df = df.replace('NaN', np.nan)
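To illustrate the string case, here is a minimal sketch. The small frame, the `?` placeholder, and the `plays` column are made up for the example and are not from the asker's file. It shows that `dropna()` ignores the text 'NaN' until it is converted to a real missing value, and that a custom placeholder can instead be declared while reading through the `na_values` parameter of `read_csv`.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical frame where missing values were stored as the text 'NaN'
df = pd.DataFrame({'A': ['NaN', '5', '4'],
                   'B': ['7', '8', 'NaN']})

# The strings are not real missing values, so nothing is dropped here
untouched = df.dropna()

# Convert the text to real NaN first, then drop the affected rows
cleaned = df.replace('NaN', np.nan).dropna()
print(cleaned)

# Alternative: declare the placeholder at read time (assumed placeholder '?')
raw = "listener_id\tplays\n1\t10\n?\t?\n3\t7\n"
df2 = pd.read_csv(io.StringIO(raw), sep='\t', na_values=['?'])
print(df2.dropna())
```

Note that `read_csv` already treats a literal `NaN` in the file as missing by default, so `na_values` is mostly useful for other placeholders.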

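Can you add a data sample, 3 or 4 rows? Or you may need to define custom NA values in read_csv.

df[pd.isnull(df)] is not the right way to check (and df[pd.notnull(df)] is not the right way to remove them either; df.dropna should work). Please try df.isnull().any().any()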
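The following sketch may explain why the check in the question kept showing NaN. Indexing a DataFrame with a boolean DataFrame does not filter rows; it keeps the shape and blanks out every cell where the mask is False. The toy frame is an assumption made for the example.

```python
import numpy as np
import pandas as pd

# Toy frame, invented for illustration
df = pd.DataFrame({'A': [np.nan, 5, 4],
                   'B': [7, 8, np.nan]})

clean = df.dropna()  # only row 1 survives

# Masking with a boolean DataFrame: same shape, non-matching cells become NaN
masked = clean[clean.isnull()]
print(masked)  # all cells are NaN although `clean` contains none

# Checks that actually answer "is any NaN left?"
has_nan = clean.isnull().any().any()
n_bad_rows = clean.isnull().any(axis=1).sum()
print(has_nan, n_bad_rows)
```

So a frame full of NaN coming out of `df[pd.isnull(df)]` says nothing about whether `dropna()` worked; the scalar check or the row count is the more reliable test.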

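@jezrael: I am adding a data snapshot that contains the NaNs and creating a data sample, give me some time. The two statements below remove the NaNs:

df = df[~df.isnull().any(axis=1)]
df = df.dropna()

After that, df[df.isnull()].head() returns an empty DataFrame, so the NaN values are gone.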

#get True for NaN
print (df.isnull())
       A      B      C      D
0   True  False  False  False
1  False  False  False  False
2  False  False  False  False
3  False  False  False  False
4  False  False  False  False
5   True   True   True   True

#check at least one True per row
print (df.isnull().any(axis=1))
0     True
1    False
2    False
3    False
4    False
5     True
dtype: bool

#boolean indexing with the inverted mask `~` (select rows with no NaN)
print (df[~df.isnull().any(axis=1)])
     A    B    C    D
1  5.0  8.0  3.0  3.0
2  4.0  9.0  5.0  6.0
3  5.0  4.0  7.0  9.0
4  5.0  2.0  1.0  2.0

#get True for not NaN
print (df.notnull())
       A      B      C      D
0  False   True   True   True
1   True   True   True   True
2   True   True   True   True
3   True   True   True   True
4   True   True   True   True
5  False  False  False  False

#get True if all values in a row are not NaN
print (df.notnull().all(axis=1))
0    False
1     True
2     True
3     True
4     True
5    False
dtype: bool

#boolean indexing
print (df[df.notnull().all(axis=1)])
     A    B    C    D
1  5.0  8.0  3.0  3.0
2  4.0  9.0  5.0  6.0
3  5.0  4.0  7.0  9.0
4  5.0  2.0  1.0  2.0

#simplest solution
print (df.dropna())
     A    B    C    D
1  5.0  8.0  3.0  3.0
2  4.0  9.0  5.0  6.0
3  5.0  4.0  7.0  9.0
4  5.0  2.0  1.0  2.0
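
Since the question mentions rows where every column is NaN, a narrower variant may also be useful. This is a sketch using the same sample frame; `how` and `subset` are standard `dropna` parameters, but which one fits depends on the real data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 5, 4, 5, 5, np.nan],
                   'B': [7, 8, 9, 4, 2, np.nan],
                   'C': [1, 3, 5, 7, 1, np.nan],
                   'D': [5, 3, 6, 9, 2, np.nan]})

# Drop only rows where every column is NaN (row 5 here)
all_nan_dropped = df.dropna(how='all')

# Drop rows that have NaN in selected columns only (column 'B' here)
subset_dropped = df.dropna(subset=['B'])

# dropna returns a new frame; the original is unchanged unless reassigned
print(all_nan_dropped)
print(subset_dropped)
print(len(df))
```

In both calls row 0 is kept, because its NaN is only in column A.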