Python: skip a row if more than two fields are empty


python pandas dataframe


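First, I want to skip a row of data if more than 2 of its columns are empty. After this step, the rows with more than 2 missing values will have been filtered out.

Then, since some rows will still have 1 or 2 empty columns, I will fill each empty column with the mean value of that row.

I can run the second step with the code below, but I am not sure how to filter out the rows with more than 2 missing values.

I have tried using dropna, but it deleted all the columns of the table.

My code: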

import numpy as np
import pandas as pd

import matplotlib 
import matplotlib.pyplot as pp

%matplotlib inline

# high-technology exports as a percentage of manufactured exports
hightech_export = pd.read_csv('hightech_export_1.csv') 

# skip the row of data if more than 2 of its columns are empty
hightech_export.dropna(axis=1, how='any', thresh=2, subset=None, inplace=False)

# Fill in data with mean value. 
m = hightech_export.mean(axis=1)
for i, col in enumerate(hightech_export):
    hightech_export.iloc[:, i] = hightech_export.iloc[:, i].fillna(m)

My dataset:

Country Name 2001 2002 2003 2004
Philippines 71
Malta 62 58 60 58
Singapore 60 56
Malaysia 58 57 55
Ireland 47 41 34
Georgia 38 41 24 38
Costa Rica

Try this

hightech_export.dropna(thresh=2, inplace=True)

instead of the line

hightech_export.dropna(axis=1, how='any', thresh=2, subset=None, inplace=False)
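
For reference, here is a minimal sketch of how thresh behaves when dropna works on rows. It counts the non-missing values a row must have, so "at most 2 missing" becomes "at least (number of columns - 2) present". The data below is a hypothetical reconstruction of the posted table: the posted layout lost its alignment, so which year each gap falls in is an assumption, and max_missing is a name introduced only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the posted data; where each gap falls is assumed.
df = pd.DataFrame({
    "Country Name": ["Philippines", "Malta", "Singapore", "Malaysia",
                     "Ireland", "Georgia", "Costa Rica"],
    "2001": [71, 62, 60, 58, 47, 38, np.nan],
    "2002": [np.nan, 58, np.nan, 57, 41, 41, np.nan],
    "2003": [np.nan, 60, np.nan, np.nan, 34, 24, np.nan],
    "2004": [np.nan, 58, 56, 55, np.nan, 38, np.nan],
})

# thresh is the minimum number of NON-missing values a row needs to be kept,
# so "at most 2 missing" means "at least ncols - 2 present".
max_missing = 2
kept = df.dropna(axis=0, thresh=len(df.columns) - max_missing)

print(kept["Country Name"].tolist())
```

Note that axis=1 in the original line makes dropna work on columns rather than rows, and that the result has to be assigned back (or inplace=True used) for the change to stick. The count above includes the Country Name column; if only the year columns should count, the subset parameter of dropna can restrict the check to them.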

OK, try this

import pandas as pd
import numpy as np

# np.nan is used because the np.NaN alias was removed in NumPy 2.0
data1={'Name':['Tom',np.nan,'Mary','Jane'],'Age':[20,np.nan,40,30],'Pay':[np.nan,np.nan,20,25]}
data2={'Name':['Tom','Bob','Mary'],'Age':[40,30,20]}

df1=pd.DataFrame.from_records(data1)

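Check the df

df1

The record at index 1 has 3 missing values

Replace the missing values so that they become None

df1 = df1.replace({np.nan: None})

Now write a function to count the missing values in each row, and build a list from the counts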

def count_na(lst):
    # pd.isna catches both None and NaN; "not n" would also count 0 and '' as missing
    missing = [n for n in lst if pd.isna(n)]
    return len(missing)

missing_data=[]
for index,n in df1.iterrows():
    missing_data.append(count_na(list(n)))

Use this list as a new column in the dataframe

df1['missing']=missing_data

df1 should look like this

    Age  Name   Pay  missing
0    20   Tom  None        1
1  None  None  None        3
2    40  Mary    20        0
3    30  Jane    25        0

So the filtering becomes easy

# Now only take records with at most 2 missing values (the question drops rows with more than 2)
df1[df1.missing<=2]
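
The per-row count can also be computed without an explicit loop or a helper column. The sketch below is an alternative, not the answer's own method: it relies on DataFrame.isna and a row-wise sum, and it reuses the same toy data under the name df1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Tom", np.nan, "Mary", "Jane"],
    "Age": [20, np.nan, 40, 30],
    "Pay": [np.nan, np.nan, 20, 25],
})

# isna() marks each missing cell as True; summing across axis=1 counts them per row.
missing = df1.isna().sum(axis=1)
print(missing.tolist())

# Keep rows with at most 2 missing values, without adding a helper column.
filtered = df1[missing <= 2]
print(filtered.index.tolist())
```

Because isna works directly on NaN, the replacement of NaN by None is not needed for this variant.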

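You can do the first task with the .isnull() method, as follows:
Replace this
hightech_export.dropna(axis=1, how='any', thresh=2, subset=None, inplace=False)

with
hightech_export = hightech_export.loc[hightech_export.isnull().sum(axis=1)

A simple approach is to compare, row by row, the count of non-null values in the dataframe with the number of columns. You can then replace the NaN values with the dataframe's column means.
The code could be: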
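# Keep rows with at most 2 missing values, then fill the remaining NaN
# with the column means (numeric_only skips the country name column)
result = df.loc[df.count(axis=1) >= (len(df.columns) - 2)].fillna(
             df.mean(numeric_only=True))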

On your sample data, it gives the expected result:
  Country Name  2001   2002       2003  2004
1        Malta  62.0  58.00  60.000000  58.0
2    Singapore  60.0  49.25  39.333333  56.0
3     Malaysia  58.0  57.00  39.333333  55.0
4      Ireland  47.0  41.00  34.000000  34.0
5      Georgia  38.0  41.00  24.000000  38.0
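
The result above fills each gap with the mean of its column, whereas the question asks for the mean of the row. The sketch below shows one way to do the row-wise variant. It is written under assumptions: the data is a hypothetical reconstruction of the posted table (the position of each gap is guessed), year_cols is a name introduced for this example, and the "<= 2" comparison is only one plausible reading of the isnull-based filter, which is cut off in the source.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the posted data; where each gap falls is assumed.
df = pd.DataFrame({
    "Country Name": ["Philippines", "Malta", "Singapore", "Malaysia",
                     "Ireland", "Georgia", "Costa Rica"],
    "2001": [71, 62, 60, 58, 47, 38, np.nan],
    "2002": [np.nan, 58, np.nan, 57, 41, 41, np.nan],
    "2003": [np.nan, 60, np.nan, np.nan, 34, 24, np.nan],
    "2004": [np.nan, 58, 56, 55, np.nan, 38, np.nan],
})
year_cols = ["2001", "2002", "2003", "2004"]

# Assumed form of the filter: keep rows with at most 2 missing cells.
kept = df.loc[df.isnull().sum(axis=1) <= 2].copy()

# Fill each remaining gap with the mean of the values present in the same row.
# Series.mean skips NaN by default, so only the observed years contribute.
kept[year_cols] = kept[year_cols].apply(
    lambda row: row.fillna(row.mean()), axis=1)

print(kept)
```

Restricting the fill to year_cols keeps the text column out of the mean, which also avoids the type errors that recent pandas versions raise when averaging mixed columns.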

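You should clarify your question, but the solution is an IF statement.
I have just edited my question. Is it clear now? Is an IF statement really necessary?
I tried df1.dropna(thresh=2), but it did not work.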