Python: avoid having to drop null values to run a function


python pandas


I have the following outlier-detection function:

import numpy as np
import pandas as pd

id = np.linspace(1,200,200)

days = [350.0, 641.0, 389.0, 130.0, 344.0, 92.0, 392.0, 51.0, 28.0, 358.0, 
        309.0, 64.0, 380.0, 491.0, 332.0, 410.0, 66.0, 435.0, 156.0, 294.0, 
        75.0, 284.0, 105.0, 34.0, 50.0, 155.0, 427.0, 327.0, 116.0, 97.0, 
        274.0, 315.0, 99.0, 70.0, 62.0, 241.0, 397.0, 50.0, 41.0, 231.0, 
        238.0, 216.0, 105.0, 36.0, 192.0, 38.0, 122.0, 37.0, 236.0, 175.0, 
        138.0, 146.0, 125.0, 144.0, 166.0, 19.0, 155.0, 130.0, 54.0, 120.0, 
        65.0, 95.0, 158.0, 92.0, 65.0, 52.0, 91.0, 67.0, 38.0, 72.0, 36.0, 
        14.0, 74.0, 155.0, 503.0, 110.0, 338.0, 444.0, 408.0, 107.0, 214.0, 
        291.0, 91.0, 277.0, 96.0, 325.0, 154.0, 314.0, 377.0, 147.0, 48.0, 
        224.0, 75.0, 268.0, 135.0, 177.0, 133.0, 306.0, 187.0, 145.0, 353.0, 
        148.0, 182.0, 95.0, 82.0, None, 143.0, 79.0, 168.0, 141.0, 224.0, 82.0,
        202.0, 107.0, 169.0, 153.0, 156.0, 79.0, 49.0, 126.0, 44.0, 67.0, 64.0, 
        102.0, 74.0, 56.0, 102.0, 285.0, 386.0, 176.0, 106.0, 6.0, 322.0, 72.0, 
        192.0, 429.0, 101.0, 159.0, 168.0, 319.0, 178.0, 323.0, 295.0, 151.0, 
        286.0, 93.0, 336.0, 252.0, 111.0, 49.0, 113.0, 214.0, 230.0, 77.0,
        192.0, 219.0, 166.0, 72.0, 143.0, 166.0, 140.0, 191.0, 113.0, 83.0, 
        41.0, 28.0, 84.0, 78.0, 28.0, 202.0, 223.0, 188.0, 238.0, 212.0, 133.0, 77.0,
        235.0, 212.0, 243.0, 176.0, 167.0, 69.0, 108.0, 11.0, 35.0, 63.0, 38.0, 445.0,
        111.0, 135.0, 143.0, 70.0, 143.0, 77.0, 22.0, 222.0, 444.0, 321.0, 1.0, 234.0]

df = pd.DataFrame(
    {'ids': id,
     'days': days
    })

def get_bounds(df, serie): 
    quartile_1, quartile_3 = np.percentile(df[serie], [25, 75]) 
    iqr = quartile_3 - quartile_1 
    lower_bound = quartile_1 - (iqr * 1.5) 
    upper_bound = quartile_3 + (iqr * 1.5) 
    return lower_bound, upper_bound 

lower_bound, upper_bound = get_bounds(df, 'days')  # <- the problem line
print(upper_bound)
df = df.loc[df['days'] < upper_bound].sort_values('days') #remove outliers
print(df)

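However, with the None value in days this fails. If I change the line to:

lower_bound, upper_bound = get_bounds(df.dropna(subset=['days']), 'days')

then it runs fine.

However, some functions that use df need the null values, and I had to drop them just so the outlier bounds would be computed correctly. Could you help me change the function so that it doesn't force me to drop the null values?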
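A minimal sketch of why the original call misbehaves, using a five-value stand-in for days and assuming a reasonably recent NumPy/pandas: np.percentile does not skip NaN (pandas stores the None as NaN), so both quartiles come out as NaN, and any comparison against a NaN bound is False. In recent versions this typically shows up as NaN bounds and an empty result rather than an exception.

```python
import numpy as np
import pandas as pd

# Small stand-in for the 'days' column; pandas turns the None into NaN.
s = pd.Series([350.0, 641.0, None, 130.0, 344.0])

# np.percentile propagates the NaN into both quartiles instead of skipping it.
q1, q3 = np.percentile(s, [25, 75])
print(q1, q3)  # both are NaN

# Every comparison against a NaN bound is False, so a filter keeps no rows.
print((s < q3).any())
```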
Use numpy.nanpercentile. It ignores NaN values when computing the percentiles, so the line in your custom function becomes:
quartile_1, quartile_3 = np.nanpercentile(df[serie], [25, 75])
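Here is a sketch of the full function with that one-line change, run on a small illustrative sample (not the question's data). The optional last step, which keeps NaN rows while filtering, is an assumption about what you need: NaN < upper_bound is False, so the question's filter would otherwise drop those rows; kept is just an illustrative name.

```python
import numpy as np
import pandas as pd

def get_bounds(df, serie):
    """IQR outlier bounds for column `serie`, ignoring NaN values."""
    # nanpercentile skips NaN entries instead of propagating them.
    quartile_1, quartile_3 = np.nanpercentile(df[serie], [25, 75])
    iqr = quartile_3 - quartile_1
    lower_bound = quartile_1 - (iqr * 1.5)
    upper_bound = quartile_3 + (iqr * 1.5)
    return lower_bound, upper_bound

# Small illustrative sample with one missing value and one outlier (500).
df = pd.DataFrame({'ids': [1, 2, 3, 4, 5, 6],
                   'days': [10.0, 20.0, None, 30.0, 40.0, 500.0]})

lower_bound, upper_bound = get_bounds(df, 'days')  # no dropna() needed
print(lower_bound, upper_bound)  # -10.0 70.0

# Optional (assumption): keep NaN rows when removing outliers.
kept = df.loc[(df['days'] < upper_bound) | df['days'].isna()]
print(kept)
```

df itself still holds all six rows, including the NaN one.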

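pandas Series (and DataFrames) have their own equivalent of numpy.percentile, the quantile method, which handles NaN values gracefully. Use it instead (note that it takes fractions such as 0.25 rather than percentages such as 25):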
quartile_1, quartile_3 = df[serie].quantile([0.25, 0.75])
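A sketch of get_bounds rewritten around Series.quantile, again on a small illustrative sample. quantile with a list returns a Series indexed by the requested quantiles, and unpacking it yields the two values in order.

```python
import pandas as pd

def get_bounds(df, serie):
    """IQR outlier bounds using pandas' NaN-aware Series.quantile."""
    # quantile() excludes NaN values; unpacking the returned Series
    # gives the 25th and 75th percentile values in that order.
    quartile_1, quartile_3 = df[serie].quantile([0.25, 0.75])
    iqr = quartile_3 - quartile_1
    return quartile_1 - 1.5 * iqr, quartile_3 + 1.5 * iqr

# Illustrative sample: one NaN, one obvious outlier.
df = pd.DataFrame({'days': [10.0, 20.0, None, 30.0, 40.0, 500.0]})
lower_bound, upper_bound = get_bounds(df, 'days')
print(lower_bound, upper_bound)
```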

Straight from the docs:
Return values at the given quantile over requested axis, a la numpy.percentile.
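To check the "a la numpy.percentile" claim, the sketch below compares the two NaN-aware approaches on a few values taken from the question plus a NaN. It assumes the defaults of both functions, which use linear interpolation, so they should agree.

```python
import numpy as np
import pandas as pd

s = pd.Series([350.0, 641.0, None, 130.0, 344.0, 92.0, 392.0])

# numpy takes percentages (25, 75); pandas takes fractions (0.25, 0.75).
# Both skip the NaN and, by default, interpolate linearly.
via_numpy = np.nanpercentile(s, [25, 75])
via_pandas = s.quantile([0.25, 0.75]).to_numpy()
print(via_numpy, via_pandas)
```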