Python pandas: "Must pass DataFrame with boolean values only" when using asfreq

python pandas dataframe

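I have the code below, which gives me a very strange error. My goal is to fill in the missing values in my data and tag them with different labels. If I change the line to

df_filled = df.asfreq(freq='D').fillna(method='bfill', limit=1).dropna(how='all').drop_duplicates(keep='last')

everything works, but with freq='2D' the mask in df_filled[is_filled] is not boolean.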

    from datetime import datetime, timedelta

    import numpy as np
    import pandas as pd

    ## Generate the data
    np.random.seed(11)
    date_today = datetime.now()
    ndays = 15
    df = pd.DataFrame({'date': [date_today + timedelta(days=(abs(np.random.randn(1)) * 2)[0] * x) for x in range(ndays)],
                       'test': pd.Series(np.random.randn(ndays)),
                       'test2': pd.Series(np.random.randn(ndays))})
    df1 = pd.DataFrame({'date': [date_today + timedelta(hours=x) for x in range(ndays)],
                        'test': pd.Series(np.random.randn(ndays)),
                        'test2': pd.Series(np.random.randn(ndays))})
    df2 = pd.DataFrame({'date': [date_today + timedelta(days=x) - timedelta(seconds=100 * x) for x in range(ndays)],
                        'test': pd.Series(np.random.randn(ndays)),
                        'test2': pd.Series(np.random.randn(ndays))})
    # pd.concat stands in for DataFrame.append, which was removed in pandas 2.0
    df = pd.concat([df, df1, df2])
    df = df.set_index('date').sort_index()
    # Randomly blank out about 70% of the cells
    df = df.mask(np.random.random(df.shape) < .7)
    df = df.reset_index()
    df['test'] = df['test'].astype(str)
    df['test2'] = df['test2'].astype(str)
    df.replace('nan', np.nan, inplace=True)
    ##

    df.set_index(df['date'].dt.date, inplace=True)

    df = df[~df.index.duplicated(keep='first')]
    df_filled = df.asfreq(freq='2D').fillna(method='bfill', limit=2).dropna(how='all').drop_duplicates(keep='last')
    df_filled.set_index(df_filled['date'], inplace=True)
    df_filled = df_filled.drop(columns='date')
    df.set_index(df['date'], inplace=True)
    df = df.drop(columns='date')
    is_filled = (df.isnull() & df_filled.notnull()) | df.notnull()
    df_filled[is_filled]  ## error happens here
    df_filled[is_filled] = df_filled[is_filled].applymap(lambda x: '_2D' if pd.notnull(x) else np.nan)

Output:

ValueError: Must pass DataFrame with boolean values only

Thanks in advance for your help.

If you print the mask:

print((df.isnull() & df_filled.notnull()) | df.notnull())

you will see a mix of True and NaN. The fix is therefore to replace the NaN values with False, as in this snippet:

    df = df.drop(columns='date')
    is_filled = (df.isnull() & df_filled.notnull()) | df.notnull()
    is_filled = is_filled.fillna(False)  # Fix here
    df_filled[is_filled] = df_filled[is_filled].applymap(lambda x: '_2D' if pd.notnull(x) else np.nan)
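
To see why the fillna(False) step matters, here is a minimal sketch with a hand-built mask. The frames `values` and `mask` are made up for illustration and are not the question's data. The extra astype(bool) is an added precaution rather than part of the answer above, so that the mask ends up with a real bool dtype whatever the pandas version.

```python
import numpy as np
import pandas as pd

# Toy data standing in for df_filled (hypothetical values)
values = pd.DataFrame({"test": ["a", "b", "c"], "test2": ["d", "e", "f"]})

# A mask containing NaN, built by hand so the example does not depend on
# how a particular pandas version aligns two frames
mask = pd.DataFrame({"test": [True, np.nan, True],
                     "test2": [True, True, np.nan]})
print(mask.dtypes)  # the columns hold True/NaN objects, not booleans

# Replace the missing entries with False and force a boolean dtype
clean_mask = mask.fillna(False).astype(bool)

# Boolean-frame indexing keeps cells where the mask is True, NaN elsewhere
selected = values[clean_mask]
print(selected)
```

The exact exception raised by an unclean mask differs between pandas versions, so the sketch only exercises the cleaned path.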

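You can fix this by adding

is_filled = is_filled.fillna(False)

after

is_filled = (df.isnull() & df_filled.notnull()) | df.notnull()

but I really need to think about how to edit the question accordingly, or you should narrow it down to that specific part; your code is reproducible, but given its complexity I'm not sure how you haven't debugged this yourself. Does that give you the expected output?

@roganjosh Thanks, your solution fixed the problem. The top part of the code is only there to generate the dataset. I'm still fairly new to Python, and in some cases I still find it hard to trace an error message back to its root cause.

You're welcome. FWIW, I think you may have a convoluted way of getting the output you want, but I don't have time to look into it right now. One thing is for sure, though: there is no penalty for using whitespace, so I suggest you put spaces around operators and after commas. It really helps readability. Don't squash your code together.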
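
As one possible simplification along the lines hinted at in the last comment, the sketch below aligns the original frame to the resampled index with reindex before building the mask, so both operands share the same labels and the mask stays boolean. It assumes the goal is to label only the cells that back-filling populated, which may not match the asker's intended output; the toy data and the names `original`, `filled`, `aligned`, `was_filled`, and `labeled` are hypothetical.

```python
import numpy as np
import pandas as pd

# Six daily rows with gaps (made-up data, not the question's dataset)
idx = pd.date_range("2021-01-01", periods=6, freq="D")
original = pd.DataFrame({"test": ["a", np.nan, np.nan, "d", "e", np.nan]},
                        index=idx)

# Resample to every second day, then back-fill gaps from later rows
filled = original.asfreq("2D").bfill(limit=2)

# Align the original to the resampled index so both frames share labels
aligned = original.reindex(filled.index)

# True only where the value was missing before and present after filling
was_filled = aligned.isnull() & filled.notnull()

# Replace the back-filled cells with a label, keep everything else
labeled = filled.mask(was_filled, "_2D")
print(labeled)
```

Because `aligned` and `filled` have identical indexes and columns, the `&` produces a plain boolean frame and no fillna step is needed.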