Python: return rows with multiple 'NA' values

My code:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
column_names = ["age","workclass","fnlwgt","education","education-num","marital-status","occupation","relationship","race","sex","capital-gain","capital-loss","hrs-per-week","native-country","income"]

# A regex separator needs a raw string and the Python parsing engine
adult_train = pd.read_csv("adult.data",header=None,sep=r',\s',na_values=["?"],engine='python')
adult_train.columns=column_names
adult_train.fillna('NA',inplace=True)
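I want the index of the rows that have the value 'NA' in more than one column. Is there a built-in method, or do I have to iterate row by row and check the value in each column? Here is a snapshot of the data:

I want the index of rows such as 398 and 409 (values missing in columns B and G), not of rows such as 394 (value missing only in column N).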

Use isnull().any(axis=1) or isnull().sum(axis=1) to get a boolean mask, then select the rows with it to get the index, i.e.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[1,2,3,4,5],
               'B' :[np.nan,4,5,np.nan,8],
               'C' :[2,4,np.nan,3,5],
               'D' :[np.nan,np.nan,np.nan,np.nan,5]})

   A    B    C    D
0  1  NaN  2.0  NaN
1  2  4.0  4.0  NaN
2  3  5.0  NaN  NaN
3  4  NaN  3.0  NaN
4  5  8.0  5.0  5.0

# If you want to select rows with a NaN in column B or column C
df.loc[df[['B','C']].isnull().any(axis=1)].index
Int64Index([0, 2, 3], dtype='int64')

# If you want rows with more than one NaN
df.loc[df.isnull().sum(axis=1)>1].index
Int64Index([0, 2, 3], dtype='int64')
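
To tie the answer back to the question, here is a minimal sketch of the same masks on a frame shaped like the adult data. The frame adult_small and its values are made up for illustration; only the column names come from the question. The last step assumes the gaps were already replaced by the string 'NA', as in the question's code.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the adult data (values invented for illustration)
adult_small = pd.DataFrame({
    "age": [39, 50, 38, 53],
    "workclass": ["Private", np.nan, "Private", np.nan],
    "occupation": ["Sales", np.nan, "Tech-support", "Sales"],
    "native-country": ["United-States", "Cuba", np.nan, "United-States"],
})

# Count missing cells per row and keep the rows with more than one
missing_per_row = adult_small.isnull().sum(axis=1)
multi_missing_index = adult_small.index[missing_per_row > 1]

# Rows where two specific columns are missing at the same time
both_missing_index = adult_small.index[
    adult_small[["workclass", "occupation"]].isnull().all(axis=1)
]

# If the gaps were already filled with the string 'NA', compare against it instead
filled = adult_small.fillna("NA")
multi_na_index = filled.index[(filled == "NA").sum(axis=1) > 1]

print(list(multi_missing_index), list(both_missing_index), list(multi_na_index))
```

Keeping real NaN values and using isnull() is usually the simpler route; the string comparison is only needed if fillna('NA') has already been applied.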
We need the data and the expected output.

adult_train.loc[adult_train.isnull().sum(axis=1)>1].index
This might help. Also remove adult_train.fillna('NA',inplace=True); it is inefficient.

I am using adult_train.fillna('NA',inplace=True) so that I can call adult_train['column name'].value_counts() to get the count of bad values in that column.

To get the count of missing values in that column, just use value_counts(dropna=False). If you replace them with a string, you lose most of the functionality pandas provides for missing values. In your case, adult_train.loc[adult_train.isnull().sum(axis=1)>1].index is all you need. Doesn't that work?
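
The value_counts(dropna=False) suggestion from the comments can be sketched as follows. The Series workclass and its values are invented for illustration; the point is that missing values get counted without first being turned into the string 'NA'.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing entries (values invented for illustration)
workclass = pd.Series(
    ["Private", np.nan, "Private", np.nan, "State-gov"], name="workclass"
)

# Default behaviour drops NaN from the tally
counts_without_nan = workclass.value_counts()

# dropna=False adds a NaN entry, so no fillna('NA') is needed
counts_with_nan = workclass.value_counts(dropna=False)

# The missing count on its own
n_missing = int(workclass.isnull().sum())

print(counts_with_nan)
print(n_missing)
```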