Python: df.fillna() does not replace NaN values

python pandas

I have a dataframe that looks like this (for clarity: this represents a df with 5 rows and 8 columns):

            BTC-USD_close  BTC-USD_volume  LTC-USD_close  LTC-USD_volume  \
time                                                                       
1528968660    6489.549805        0.587100      96.580002        9.647200   
1528968720    6487.379883        7.706374      96.660004      314.387024   
1528968780    6479.410156        3.088252      96.570000       77.129799   
1528968840    6479.410156        1.404100      96.500000        7.216067   
1528968900    6479.979980        0.753000      96.389999      524.539978  

            BCH-USD_close  BCH-USD_volume  ETH-USD_close  ETH-USD_volume  
time                                                                      
1528968660     871.719971        5.675361            NaN             NaN  
1528968720     870.859985       26.856577      486.01001       26.019083  
1528968780     870.099976        1.124300      486.00000        8.449400  
1528968840     870.789978        1.749862      485.75000       26.994646  
1528968900     870.000000        1.680500      486.00000       77.355759

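I want to replace the NaN values in the ETH-USD_close and ETH-USD_volume columns. However, when I call df.fillna(method='ffill', inplace=True), nothing seems to happen: when I step through the program with a debugger, the missing values are still there and nothing in the columns has changed.

When I use df.isna() to check whether pandas interprets my NaN values correctly, that does seem to be the case; I inspected the first few rows of the output with print(df.isna()).

A call like df.dropna(inplace=True) deletes the entire row, which is not what I want. Any suggestions?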

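Edit: if anyone wants to reproduce the problem, you can download the data from here, unzip it, and run the following code in the same directory: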

import pandas as pd

#Initialize empty df
main_df = pd.DataFrame()

ratios = ["BTC-USD", "LTC-USD", "BCH-USD", "ETH-USD"]
for ratio in ratios:
    #SET CORRECT PATH HERE
    dataset = f'crypto_data/{ratio}.csv'
    #Use f-strings so we know which close/volume is which
    df_ratio = pd.read_csv(dataset, names=['time', 'low', 'high', 'open', f"{ratio}_close", f"{ratio}_volume"])
    #Set time as index so we can join them on this shared time
    df_ratio.set_index("time", inplace=True)

    #ignore the other columns besides price and volume
    df_ratio = df_ratio[[f"{ratio}_close", f"{ratio}_volume"]]

    if main_df.empty:
        main_df = df_ratio
    else:
        main_df = main_df.join(df_ratio)


main_df.fillna(method='ffill', inplace=True) #THIS DOESN'T SEEM TO WORK

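You cannot ffill a NaN value if it is the first value of the series: it has no previous value to carry forward.

Using .ffill().bfill() works around this, but may create incorrect data.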
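As an illustration of the point above, here is a minimal sketch on made-up data. The frame below is hypothetical: it mimics the ETH-USD_close column from the question and adds one extra gap in the middle for contrast. It uses the DataFrame.ffill() and DataFrame.bfill() methods rather than fillna(method=...).

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the ETH-USD_close column: the first row is missing,
# and one extra gap is added in the middle for comparison.
df = pd.DataFrame(
    {"ETH-USD_close": [np.nan, 486.01, 486.00, np.nan, 486.00]},
    index=pd.Index(
        [1528968660, 1528968720, 1528968780, 1528968840, 1528968900], name="time"
    ),
)

forward = df.ffill()       # fills the middle gap, but not the leading NaN
both = df.ffill().bfill()  # the leading NaN now takes the next valid value

print(forward["ETH-USD_close"].tolist())
print(both["ETH-USD_close"].tolist())
```

The leading NaN survives the forward fill because there is nothing before it to propagate; only the backward fill reaches it, and it does so with a value from a later timestamp.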

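Could you share some information about the dataframe, such as df.describe() and df.info()? Or even some dummy data? Also, please don't post screenshots like this; they are not useful. Instead, copy and paste the output of print(df) or an excerpt of the csv file.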
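One possible way to produce that kind of text-only information is sketched below. The dummy frame is hypothetical: it reuses a few values from the table in the question and is not the poster's actual data.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical dummy data modelled on the first rows shown in the question.
dummy = pd.DataFrame(
    {
        "BTC-USD_close": [6489.549805, 6487.379883, 6479.410156],
        "ETH-USD_close": [np.nan, 486.01001, 486.00000],
        "ETH-USD_volume": [np.nan, 26.019083, 8.449400],
    },
    index=pd.Index([1528968660, 1528968720, 1528968780], name="time"),
)

# Text summaries that can be pasted into a post instead of a screenshot.
buffer = io.StringIO()
dummy.info(buf=buffer)           # dtypes and non-null counts per column
info_text = buffer.getvalue()
nan_counts = dummy.isna().sum()  # number of missing values per column

print(dummy)
print(info_text)
print(dummy.describe())
print(nan_counts)
```

The per-column NaN counts are often the quickest check: comparing them before and after a fill shows which columns still have gaps.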

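df.fillna(method='ffill', inplace=True) works for me. This is strange.

I can edit the question to add steps that reproduce the problem.

Thanks for the help =) Much appreciated. I'll just use bfill. So that I learn something from this: a pandas dataframe consists of series (columns), and operations such as fillna() are applied column by column? And ffill propagates the last valid observation forward to the next one (which is why my code didn't work, because there was no "last valid" observation), while bfill uses the next valid observation to fill the gap? What puzzles me is what to do if both the first and the last observation of a series are missing, lol.

Yes indeed, by default a dataframe applies its functions column-wise. Note that .bfill() means backfill: it works like a forward fill, but fills from the other direction. In a time series dataset like yours, this may create incorrect data, because you take data from the "future" and use it to fill an earlier time unit. What is your strategy for replacing missing values in a time series dataset? I would use only ffill, and then dropna if necessary.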
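The "ffill, then dropna if necessary" strategy from the last comment could look roughly like the sketch below. The data is hypothetical and has gaps at the start, in the middle, and at the end of one column, to cover the case where both the first and the last observation are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical series with gaps at the start, in the middle, and at the end.
df = pd.DataFrame(
    {
        "ETH-USD_close": [np.nan, 486.01, np.nan, 485.75, np.nan],
        "BTC-USD_close": [6489.55, 6487.38, 6479.41, 6479.41, 6479.98],
    },
    index=pd.Index(
        [1528968660, 1528968720, 1528968780, 1528968840, 1528968900], name="time"
    ),
)

filled = df.ffill()        # middle and trailing gaps take the last known value
cleaned = filled.dropna()  # drops only the leading row, which has no past value

print(cleaned)
```

With this approach a missing last observation is covered by the forward fill, and only rows before the first valid observation are lost, so no value is ever taken from a later timestamp.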
