Pandas: How to drop leading missing values for all columns in a pandas dataframe?


With a pandas dataframe of the following form:

     A     B     C
ID                
1   10   NaN   NaN
2   20   NaN   NaN
3   28  10.0   NaN
4   32  18.0  10.0
5   34  22.0  16.0
6   34  24.0  20.0
7   34  26.0  21.0
8   34  26.0  22.0
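How can I drop the varying number of leading missing values in each column? Initially, I would like to forward fill the last value of each "new" column, so that I end up with this: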

    A     B     C
0  10  10.0  10.0
1  20  18.0  16.0
2  28  22.0  20.0
3  32  24.0  21.0
4  34  26.0  22.0
5  34  26.0  22.0
6  34  26.0  22.0
7  34  26.0  22.0
But I guess it would be just as natural to end up with NaN in the remaining rows instead:

    A     B     C
0  10  10.0  10.0
1  20  18.0  16.0
2  28  22.0  20.0
3  32  24.0  21.0
4  34  26.0  22.0
5  34  26.0   NaN
6  34   NaN   NaN
7  34   NaN   NaN
Here's a visual representation of the issue:

[Images: "Before" and "After"]

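I've come up with a cumbersome approach using a for loop, where I remove the leading NaNs with df.dropna(), count the number of values I've removed (N), append the last available value N times, and build a new dataframe column by column. But this turns out to be quite slow for larger dataframes. I feel like this should already be built-in functionality of the omnipotent pandas library, but I haven't found anything so far. Does anyone have a suggestion for a less cumbersome way of doing this?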

Complete code with a sample dataset:

import pandas as pd
import numpy as np

# sample dataframe
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8],
                   'A': [10, 20, 28, 32, 34, 34, 34, 34],
                   'B': [np.nan, np.nan, 10, 18, 22, 24, 26, 26],
                   'C': [np.nan, np.nan, np.nan, 10, 16, 20, 21, 22]})
df = df.set_index('ID')

# container for dataframe
# to be built using a for loop
df_new = pd.DataFrame()

for col in df.columns:
    # drop missing values column by column and renumber the rows from 0
    ser = df[col]
    original_length = len(ser)
    ser_new = ser.dropna().reset_index(drop=True)

    # if leading values are removed for N rows,
    # append last value N times for the last rows
    N = original_length - len(ser_new)
    if N > 0:
        ser_append = [ser.iloc[-1]] * N
        # ser_append = [np.nan] * N
        # Series.append was removed in pandas 2.0, so use pd.concat instead
        ser_new = pd.concat([ser_new, pd.Series(ser_append)], ignore_index=True)
    df_new[col] = ser_new

df_new

We can make use of shift and move each series up by its number of missing values:

d = df.isna().sum(axis=0).to_dict() # calculate the number of missing rows per column 

for k,v in d.items():
    df[k] = df[k].shift(-v).ffill()
--


Here is a pure pandas solution. Use apply to shift the values of each column up by its number of leading NaNs, then use ffill:

df.apply(lambda x: x.shift(-x.isna().sum())).ffill()


     A     B     C
1   10  10.0  10.0
2   20  18.0  16.0
3   28  22.0  20.0
4   32  24.0  21.0
5   34  26.0  22.0
6   34  26.0  22.0
7   34  26.0  22.0
8   34  26.0  22.0
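
For the second output mentioned in the question (NaN in the remaining rows rather than a forward fill), a minimal sketch would be to use the same shift and simply leave out ffill. The variable name shifted and the reset_index call are my own additions, used only to get the 0-based index shown in the question's tables:

```python
import numpy as np
import pandas as pd

# sample dataframe from the question
df = pd.DataFrame({'ID': [1, 2, 3, 4, 5, 6, 7, 8],
                   'A': [10, 20, 28, 32, 34, 34, 34, 34],
                   'B': [np.nan, np.nan, 10, 18, 22, 24, 26, 26],
                   'C': [np.nan, np.nan, np.nan, 10, 16, 20, 21, 22]})
df = df.set_index('ID')

# shift each column up by its number of missing values;
# without ffill, the rows vacated at the bottom stay NaN
shifted = df.apply(lambda x: x.shift(-x.isna().sum()))

# optional (my assumption about the desired layout):
# replace the ID index with a 0-based one
shifted = shifted.reset_index(drop=True)

print(shifted)
```

Like the answers above, this assumes that all missing values in a column sit at the top.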

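Are your nulls always at the top? What happens if there is a null at the bottom of a series?

@datanoveler In my real data, yes, they are always at the top. At least by the time the data reaches the processing step described here. I handle all other missing values with a forward fill.

Got it, see the solution. There may be a way to do this without a loop, but I think some form of apply is always needed when doing column-wise operations.

Simply brilliant! In fact, this is probably about as minimal as it could get. Thanks!

I didn't know this was possible in one line... pandas is quite amazing :)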
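
On the point raised in the comments about nulls that are not at the top: counting with isna().sum() shifts by the total number of NaNs, so a NaN further down would push valid values off the top. A hedged sketch of one way around this is to count only the leading NaNs. The helper names leading_nans and drop_leading_nans and the small df2 example are hypothetical and not part of the answers above:

```python
import numpy as np
import pandas as pd

def leading_nans(s):
    # hypothetical helper: notna().cumsum() stays 0 until the first
    # valid value, so counting the zeros gives the number of leading NaNs
    return int(s.notna().cumsum().eq(0).sum())

def drop_leading_nans(frame):
    # hypothetical helper: shift each column up by its leading NaNs only;
    # NaNs in the middle or at the bottom are left where they are
    return frame.apply(lambda x: x.shift(-leading_nans(x)))

# made-up example with NaNs in the middle and at the bottom
df2 = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0],
                    'B': [np.nan, 5.0, np.nan, 7.0],
                    'C': [np.nan, np.nan, 8.0, np.nan]})

out = drop_leading_nans(df2)
print(out)
```

Here column B is shifted by one row and column C by two, so the first valid value of every column ends up in the first row. Whether to follow this with ffill depends on how the remaining NaNs should be treated.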