Python: Remove incomplete seasons from a MultiIndex DataFrame (pandas)

python pandas

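I am trying to apply a method from a linked post to a MultiIndex DataFrame, but it does not seem to work.

Take this DataFrame as an example: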

import numpy as np
import pandas as pd

dates = pd.date_range('20070101', periods=3200)
# Store the values as floats so the column can hold NaN
df = pd.DataFrame(data=np.random.randint(0, 100, (3200, 1)).astype(float), columns=list('A'))
df.loc[5:13, 'A'] = np.nan  # add missing data points (rows 5 to 13, inclusive)
df['date'] = dates
df = df[['date', 'A']]

Apply a season function to the dates:

def get_season(row):
    # '1' = Dec-Feb, '2' = Mar-May, '3' = Jun-Aug, '4' = Sep-Nov
    month = row['date'].month
    if 3 <= month <= 5:
        return '2'
    elif 6 <= month <= 8:
        return '3'
    elif 9 <= month <= 11:
        return '4'
    else:
        return '1'

# Create the 'Season' column that is used in the index below
df['Season'] = df.apply(get_season, axis=1)
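
As an aside, the same month-to-season mapping can be written without a row-wise function. The following is a minimal sketch, assuming the same season codes as `get_season`; the toy frame `demo` is hypothetical and exists only for illustration.

```python
import pandas as pd

# Hypothetical toy frame; only the 'date' column matters here.
demo = pd.DataFrame({
    'date': pd.to_datetime(['2007-01-15', '2007-04-01', '2007-07-04',
                            '2007-10-31', '2007-12-25'])
})

# month % 12 maps December to 0, so Dec/Jan/Feb fall in the first bucket:
# Dec-Feb -> '1', Mar-May -> '2', Jun-Aug -> '3', Sep-Nov -> '4'
demo['Season'] = ((demo['date'].dt.month % 12) // 3 + 1).astype(str)

print(demo['Season'].tolist())  # ['1', '2', '3', '4', '1']
```

This avoids calling a Python function once per row, which tends to matter on larger frames.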

Create a 'Year' column for indexing:

df['Year'] = df['date'].dt.year

Build a MultiIndex from year and season:

df = df.set_index(['Year', 'Season'], inplace=False)

Count the data points in each season:

count = df.groupby(level=[0, 1]).count()

Drop the seasons with fewer than 75 days:

count = count.drop(count[count.A < 75].index)
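
Before dropping anything, it can help to look at which seasons actually fall below the threshold. The sketch below rebuilds the example data end to end so that it runs on its own; it computes `Season` with a compact month formula that is assumed to be equivalent to `get_season`, and the name `short` is introduced only for this illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20070101', periods=3200)
df = pd.DataFrame({'date': dates,
                   'A': np.random.randint(0, 100, 3200).astype(float)})
df.loc[5:13, 'A'] = np.nan  # same missing block as in the example

df['Year'] = df['date'].dt.year
# Assumed stand-in for get_season: Dec-Feb -> '1', ..., Sep-Nov -> '4'
df['Season'] = ((df['date'].dt.month % 12) // 3 + 1).astype(str)
df = df.set_index(['Year', 'Season'])

# count() ignores NaN, so this is the number of valid days per season
count = df.groupby(level=[0, 1]).count()

# Seasons that would be dropped by the 75-day rule
short = count[count['A'] < 75]
print(short)
```

With this date range, which ends on 2015-10-05 and uses calendar years, the short seasons are 2015 season '1' (59 days, January and February only) and 2015 season '4' (35 days).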

Using isin returns False for everything, whereas I want it to select every season that has more than 75 days of valid data in 'A':

df = df.isin(complete)
df

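Every value is False, and I don't understand why.

I hope this is concise enough. I need this to work on a MultiIndex built from seasons, which is why I included that part.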
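
For context on the all-False result: `DataFrame.isin` tests each cell value against the supplied values and does not look at the index, so tuples such as `(2007, '1')` never match the numbers stored in 'A'. The sketch below illustrates the difference on a small frame; `toy`, `keep`, and `mask` are hypothetical names used only here.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(2007, '1'), (2007, '2'), (2008, '1')], names=['Year', 'Season'])
toy = pd.DataFrame({'A': [10.0, 20.0, 30.0]}, index=idx)

# Index entries we would like to keep
keep = pd.MultiIndex.from_tuples(
    [(2007, '1'), (2008, '1')], names=['Year', 'Season'])

# Compares the cell values (10.0, 20.0, 30.0) with the tuples in `keep`
cell_match = toy.isin(keep)
print(cell_match)

# Compares the index labels with `keep`, which is what the filter needs
mask = toy.index.isin(keep)
print(toy[mask])
```

The first call yields a frame of False values, while the second keeps the rows for (2007, '1') and (2008, '1').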

Edit

Another approach, based on reindexing by MultiIndex, does not work either:

df3 = df.reset_index().groupby('Year').apply(lambda x: x.set_index('Season').reindex(count, method='pad'))

Edit 2

I also tried this:

seasons = count[count['A'] >= 75].index

df = df[df['A'].isin(seasons)]

Again, the output is empty.

I think "rain" should be "A"? I guess you want to do something like this:

count = df[df.A > 75].groupby(level=[0, 1]).count()

This counts the days whose value is above 75. After that, I suspect you want to use merge or join rather than isin.

@John: Yes, it should be "A", sorry about that.

@John: Sorry, I don't think I was clear enough. I don't want to count the days with a value greater than 75. I want to count the days in each season and keep the season if it has more than 75 days. If it has fewer than 75 days, I want to drop it.

I think you can use:

complete = count[count['A'] >= 75].index

idx = df.index.isin(complete)
print(idx)
[ True  True  True ..., False False False]

print(df[idx])

                 date     A
Year Season                 
2007 1      2007-01-01  24.0
     1      2007-01-02  92.0
     1      2007-01-03  54.0
     1      2007-01-04  91.0
     1      2007-01-05  91.0
     1      2007-01-06   NaN
     1      2007-01-07   NaN
     1      2007-01-08   NaN
     1      2007-01-09   NaN
     1      2007-01-10   NaN
     1      2007-01-11   NaN
     1      2007-01-12   NaN
     1      2007-01-13   NaN
     1      2007-01-14   NaN
     1      2007-01-15  18.0
     1      2007-01-16  82.0
     1      2007-01-17  55.0
     1      2007-01-18  64.0
     1      2007-01-19  89.0
     1      2007-01-20  37.0
     1      2007-01-21  45.0
     1      2007-01-22   4.0
     1      2007-01-23  34.0
     1      2007-01-24  35.0
     1      2007-01-25  90.0
     1      2007-01-26  17.0
     1      2007-01-27  29.0
     1      2007-01-28  58.0
     1      2007-01-29   7.0
     1      2007-01-30  57.0
...                ...   ...
2015 3      2015-08-02  42.0
     3      2015-08-03   0.0
     3      2015-08-04  31.0
     3      2015-08-05  39.0
     3      2015-08-06  25.0
     3      2015-08-07   1.0
     3      2015-08-08   7.0
     3      2015-08-09  97.0
     3      2015-08-10  38.0
     3      2015-08-11  59.0
     3      2015-08-12  28.0
     3      2015-08-13  84.0
     3      2015-08-14  43.0
     3      2015-08-15  63.0
     3      2015-08-16  68.0
     3      2015-08-17   0.0
     3      2015-08-18  19.0
     3      2015-08-19  61.0
     3      2015-08-20  11.0
     3      2015-08-21  84.0
     3      2015-08-22  75.0
     3      2015-08-23  37.0
     3      2015-08-24  40.0
     3      2015-08-25  66.0
     3      2015-08-26  50.0
     3      2015-08-27  74.0
     3      2015-08-28  37.0
     3      2015-08-29  19.0
     3      2015-08-30  25.0
     3      2015-08-31  15.0

[3106 rows x 2 columns]

Thanks, this marks the right rows with True and False (the first season is False because it has many missing values). But when I apply it with

print(idx[df])

it just returns the whole DataFrame and does not drop the seasons with fewer than 75 days.

Hmm, if I run

print(df)

it has 3200 rows, and if I run

print(df[idx])

it has 3106 rows, so I think 94 rows were removed. But I don't know how to check it better. What do you think?

Sorry, I had not restarted the kernel and one of the variables was stale. This does work, thanks!
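
One way to double-check the 3106-row result, and an alternative to building a separate `count` frame, is `groupby(...).transform('count')`, which broadcasts each season's valid-day count back onto every row. This is a different technique from the `index.isin` answer above, shown here as a minimal sketch; it rebuilds the example data so that it runs on its own, assumes a compact month formula in place of `get_season`, and the names `valid_days` and `filtered` are introduced only for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20070101', periods=3200)
df = pd.DataFrame({'date': dates,
                   'A': np.random.randint(0, 100, 3200).astype(float)})
df.loc[5:13, 'A'] = np.nan  # same missing block as in the example

df['Year'] = df['date'].dt.year
# Assumed stand-in for get_season: Dec-Feb -> '1', ..., Sep-Nov -> '4'
df['Season'] = ((df['date'].dt.month % 12) // 3 + 1).astype(str)
df = df.set_index(['Year', 'Season'])

# For every row, the number of non-NaN 'A' values in its (Year, Season) group
valid_days = df.groupby(level=['Year', 'Season'])['A'].transform('count')

# Keep rows whose season has at least 75 valid days; to_numpy() sidesteps
# index alignment on the duplicated MultiIndex labels
filtered = df[(valid_days >= 75).to_numpy()]

print(len(df), len(filtered))  # 3200 3106
```

The 94 removed rows are the two truncated 2015 seasons (59 days in season '1' and 35 days in season '4'), which agrees with the row counts reported in the comments.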