Python: grouping ranges of rows where a column value is non-zero

python pandas

I have the following DataFrame:

            count
2015-09-28      2
2015-09-29      2
2015-09-30      0
2015-10-01      2
2015-10-02      3
2015-10-05      2
2015-10-06      1
2015-10-07      0
2015-10-08      1

I want to group the date ranges that are separated by rows where count = 0. I would like something like this:

  first      last       totalcount
1 2015-09-28 2015-09-29 4
2 2015-10-01 2015-10-06 8
3 2015-10-08 2015-10-08 1

Use cumsum on the rows where count is 0 to associate each row with a group number:

In [134]: df['groupno'] = (df['count'] == 0).cumsum()

In [135]: df
Out[135]: 
            count  groupno
2015-09-28      2        0
2015-09-29      2        0
2015-09-30      0        1
2015-10-01      2        1
2015-10-02      3        1
2015-10-05      2        1
2015-10-06      1        1
2015-10-07      0        2
2015-10-08      1        2

Then you can use groupby/agg to obtain the desired result:

import pandas as pd
df = pd.DataFrame({'count': [2, 2, 0, 2, 3, 2, 1, 0, 1]},
                  index=[u'2015-09-28', u'2015-09-29', u'2015-09-30', u'2015-10-01',
                         u'2015-10-02', u'2015-10-05', u'2015-10-06', u'2015-10-07',
                         u'2015-10-08'])


mask = (df['count'] == 0)
df['groupno'] = mask.cumsum()
# Remove the rows where the count is 0
df = df.loc[~mask]
# Make the index a column so we can use 'index':['first', 'last'] to find the
# first and last date in each group.
df = df.reset_index()
result = df.groupby('groupno').agg({'index':['first', 'last'], 'count':'sum'})
result.columns = result.columns.droplevel(0)
result = result.rename(columns={'sum':'totalcount'})

which yields (the column order may differ depending on your pandas and Python versions):

         totalcount       first        last
groupno                                    
0                 4  2015-09-28  2015-09-29
1                 8  2015-10-01  2015-10-06
2                 1  2015-10-08  2015-10-08
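
The result above is numbered from 0 and leaves the dates as strings, while the question asks for rows numbered 1, 2, 3 with first, last and totalcount columns. The following is a minimal sketch of the same cumsum/groupby idea that produces that layout. It assumes a pandas version with named aggregation (0.25 or later); the helper name summarize_nonzero_runs and the intermediate column name "date" are hypothetical choices for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {"count": [2, 2, 0, 2, 3, 2, 1, 0, 1]},
    index=pd.to_datetime([
        "2015-09-28", "2015-09-29", "2015-09-30", "2015-10-01", "2015-10-02",
        "2015-10-05", "2015-10-06", "2015-10-07", "2015-10-08",
    ]),
)


def summarize_nonzero_runs(frame, col="count"):
    """Summarize each run of non-zero rows (hypothetical helper)."""
    mask = frame[col] == 0
    # Every zero row bumps the running total, so all rows between two
    # zeros end up sharing the same group number.
    groupno = mask.cumsum()
    # Drop the zero rows and expose the index as a regular column
    # ("date" is an assumed name) so it can be aggregated.
    nonzero = frame.loc[~mask].rename_axis("date").reset_index()
    result = (
        nonzero.groupby(groupno[~mask].to_numpy())
        .agg(first=("date", "first"), last=("date", "last"), totalcount=(col, "sum"))
        .reset_index(drop=True)
    )
    # Renumber from 1 so that consecutive zeros leave no gaps in the labels.
    result.index = result.index + 1
    return result


result = summarize_nonzero_runs(df)
print(result)
```

Because the zero rows are filtered out before grouping, a run of several consecutive zeros simply produces group numbers with no rows, and the final renumbering hides those gaps.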
