Python: counting the number of times a column value changes

python pandas dataframe

I have a DataFrame like the following:

import pandas as pd

df = pd.DataFrame({'date': {379724: '2017-01-31',
379725: '2017-01-31',
414510: '2017-02-14',
414509: '2017-02-28',
414511: '2017-02-28',
507215: '2017-04-27',
507213: '2017-04-27',
507214: '2017-04-27',
507235: '2017-04-27',
562139: '2017-04-27',
672967: '2017-07-27',
672968: '2017-07-27',
672969: '2017-07-27',
910729: '2017-12-07',
990263: '2018-01-30',
990265: '2018-01-30',
990264: '2018-01-30',
121543: '2018-06-26',
255129: '2018-09-20'},
'id': {379724: '110000078451',
379725: '110000078451',
414510: '110000078451',
414509: '110000078451',
414511: '110000078451',
507215: '110000078451',
507213: '110000078451',
507214: '110000078451',
507235: '110000078451',
562139: '110000078451',
672967: '110000078451',
672968: '110000078451',
672969: '110000078451',
910729: '110000078451',
990263: '110000078451',
990265: '110000078451',
990264: '110000078451',
121543: '110000078451',
255129: '110000078451'},
'limit': {379724: 0,
379725: 1,
414510: 1,
414509: 0,
414511: 0,
507215: 0,
507213: 0,
507214: 1,
507235: 0,
562139: 0,
672967: 0,
672968: 0,
672969: 0,
910729: 0,
990263: 0,
990265: 0,
990264: 0,
121543: 0,
255129: 0}})

I need to count how many times the value in 'limit' changes within each 'id' group.

The code I came up with is:

count01 = (df.groupby('id')['limit'].rolling(2, min_periods=1)
           .apply(lambda x: ((x[0] != x[-1]) & (x[0] == 1)), raw=True)
           .groupby('id').sum().astype(int).reset_index(name='count01'))
count10 = (df.groupby('id')['limit'].rolling(2, min_periods=1)
           .apply(lambda x: ((x[0] != x[-1]) & (x[0] == 0)), raw=True)
           .groupby('id').sum().astype(int).reset_index(name='count10'))
count_total = count01.merge(count10, on='id')

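Sometimes it gives the correct result and sometimes it does not. I suspect that the first value in each group may be set to NaN inside apply and that this affects the result, but I am not sure.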
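One way to check that suspicion is to look at what the function given to apply actually receives. The following is a minimal sketch on a made-up series (the values are illustrative, not taken from the real data) that only records the length of each window. With min_periods=1 the first window is expected to be shorter rather than padded with NaN, in which case x[0] and x[-1] refer to the same element there.

```python
import pandas as pd

# Hypothetical toy series, used only to inspect the rolling windows.
s = pd.Series([0, 1, 1, 0])

# raw=True passes each window to the function as a NumPy array;
# here the function is len, so the result is the size of each window.
window_lengths = (
    s.rolling(2, min_periods=1)
     .apply(len, raw=True)
     .astype(int)
     .tolist()
)
print(window_lengths)
```

If the first length printed is 1, the first row of a group cannot register as a change, because it is compared with itself.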

The result should be:

id | count01 | count10
-------------------------------
110000078451| 2       | 2

Thanks

Edit: I updated my example so that it matches the real data more closely.

In count01, change:

(x[0] == 1)) --> (x[0] == 0))

In count10, change:

(x[0] == 0)) --> (x[0] == 1))
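Applied to the code from the question, the two swaps might look like the sketch below. The helper name count_changes and the single-id sample are assumptions made for illustration; the result is wrapped in float() because rolling apply expects a numeric return value, and groupby(level='id') is used to group on the index level produced by the grouped rolling.

```python
import pandas as pd

# Hypothetical single-id sample; 'limit' follows the sequence 0 1 1 0 0 0 1 0.
df = pd.DataFrame({'id': ['a'] * 8, 'limit': [0, 1, 1, 0, 0, 0, 1, 0]})


def count_changes(frame, before, name):
    """Count rows whose previous 'limit' was `before` and whose value differs."""
    return (
        frame.groupby('id')['limit'].rolling(2, min_periods=1)
        # x[0] is the previous row and x[-1] the current row of each window.
        .apply(lambda x: float((x[0] != x[-1]) and (x[0] == before)), raw=True)
        .groupby(level='id').sum().astype(int).reset_index(name=name)
    )


count01 = count_changes(df, 0, 'count01')  # previous value 0, so 0 -> 1
count10 = count_changes(df, 1, 'count10')  # previous value 1, so 1 -> 0
count_total = count01.merge(count10, on='id')
print(count_total)
```

This only addresses the swapped labels; it assumes the rows of each id are already in the intended order.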

This should work:

import pandas as pd


def limit_change_counter(limits, _from, _to):
    tmp = list(limits)
    counter = 0
    for idx, limit in enumerate(tmp):
        if idx > 0:
            if tmp[idx - 1] == _from and limit == _to:
                counter += 1
    return counter


df = pd.DataFrame.from_dict({'date': {379724: '2017-01-31',
                                      379725: '2017-01-31',
                                      414510: '2017-02-14',
                                      414509: '2017-02-28',
                                      414511: '2017-02-28',
                                      507215: '2017-04-27',
                                      507213: '2017-04-27',
                                      507214: '2017-04-27',
                                      507235: '2017-04-27',
                                      562139: '2017-04-27',
                                      672967: '2017-07-27',
                                      672968: '2017-07-27',
                                      672969: '2017-07-27',
                                      910729: '2017-12-07',
                                      990263: '2018-01-30',
                                      990265: '2018-01-30',
                                      990264: '2018-01-30',
                                      121543: '2018-06-26',
                                      255129: '2018-09-20'},
                             'id': {379724: '110000078451',
                                    379725: '110000078451',
                                    414510: '110000078451',
                                    414509: '110000078451',
                                    414511: '110000078451',
                                    507215: '110000078451',
                                    507213: '110000078451',
                                    507214: '110000078451',
                                    507235: '110000078451',
                                    562139: '110000078451',
                                    672967: '110000078451',
                                    672968: '110000078451',
                                    672969: '110000078451',
                                    910729: '110000078451',
                                    990263: '110000078451',
                                    990265: '110000078451',
                                    990264: '110000078451',
                                    121543: '110000078451',
                                    255129: '110000078451'},
                             'limit': {379724: 0,
                                       379725: 1,
                                       414510: 1,
                                       414509: 0,
                                       414511: 0,
                                       507215: 0,
                                       507213: 0,
                                       507214: 1,
                                       507235: 0,
                                       562139: 0,
                                       672967: 0,
                                       672968: 0,
                                       672969: 0,
                                       910729: 0,
                                       990263: 0,
                                       990265: 0,
                                       990264: 0,
                                       121543: 0,
                                       255129: 0}})

# Stable sort, so that rows sharing a date keep their original relative order
df.sort_values(by='date', kind='mergesort', inplace=True)
print(df)

df['limit_changes_0_to_1'] = df.groupby(['id'])['limit'].transform(limit_change_counter, 0, 1)
df['limit_changes_1_to_0'] = df.groupby(['id'])['limit'].transform(limit_change_counter, 1, 0)
df.drop_duplicates(subset="id", keep="first", inplace=True)

print(df)
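The counting logic can also be checked on a bare list, independently of pandas. The sketch below is an equivalent formulation that pairs each value with its successor using zip; the function name count_transitions is an assumption, and the sequence is the one quoted in the comments (0 1 1 0 0 0 1 0).

```python
def count_transitions(values, before, after):
    """Count adjacent pairs (before, after) in a sequence."""
    values = list(values)
    # zip(values, values[1:]) yields (previous, current) for each neighbouring pair.
    return sum(
        1
        for prev, cur in zip(values, values[1:])
        if prev == before and cur == after
    )


seq = [0, 1, 1, 0, 0, 0, 1, 0]
print(count_transitions(seq, 0, 1), count_transitions(seq, 1, 0))
```

A function like this could be passed to transform or agg in the same way as limit_change_counter, since it takes the group values followed by the two states.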

You can first create a column that holds the transitions occurring within the same id, and then count those transitions with pivot_table:

df2 = df.shift()
df2['limit'] = df2['limit'].bfill().astype(int)  # force limit to type int in shifted df
df.loc[(df.id==df2.id)&(df.limit!=df2.limit),'transition'] = \
                                   df2.limit.astype(str)+df.limit.astype(str)

resul = df.pivot_table(index='id', columns='transition', aggfunc='count',values='date', fill_value=0)

This gives:

transition  01  10
id                
111          2   1
22           0   1

You can improve the presentation:

resul = resul.rename(columns=lambda x: 'count'+x).rename_axis('', axis=1).reset_index()

To finally get:

    id  count01  count10
0  111        2        1
1   22        0        1
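A variant of the same idea, sketched here as an assumption rather than as part of the answer, is to shift within each id using groupby, so that the first row of an id has no previous value and is never compared with the last row of another id. The two-id sample below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample: id 'a' ends with 0 and id 'b' starts with 1,
# so a shift that ignores ids would see a spurious 0 -> 1 change.
df = pd.DataFrame({
    'id': ['a'] * 8 + ['b'] * 3,
    'limit': [0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0],
})

# Previous 'limit' within the same id; NaN on the first row of each id.
prev = df.groupby('id')['limit'].shift()

flags = pd.DataFrame({
    'id': df['id'],
    'count01': ((prev == 0) & (df['limit'] == 1)).astype(int),
    'count10': ((prev == 1) & (df['limit'] == 0)).astype(int),
})
result = flags.groupby('id')[['count01', 'count10']].sum().reset_index()
print(result)
```

Comparisons against NaN evaluate to False, so no back-filling or integer cast of the shifted column is needed in this form.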

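The goal is to count the limit changes for each unique limit value (0 and 1) within each id group. So a 0->1 change and a 1->0 change must be counted separately.

@gribna Got it. The code has been changed.

I copied your code, but it always gives me zeros in the result.

@gribna Let me double-check. I will copy the code from Stack Overflow and let you know what the output is.

@gribna Here is the Python code, and it produced this output. It seems to work.

@gribna I forgot the first line when copying from my tests... Edited.

I tried it on real data, but it gives wrong results. For example, for this sequence (0 1 1 0 0 0 1 0) for one id, it gives count01=1 and count10=1.

@gribna: I cannot reproduce the problem. I get 2 for both. Can you share some data that shows the error?

I updated the example with real data. When I run the code, it gives me id count0.01 count1.00 110000078451 33

@gribna: I could reproduce and fix it. I had simply forgotten to make sure that limit is an int in the shifted DataFrame. Edited.

With the change you suggested, it counts the 1-to-0 and 0-to-1 transitions separately. But there is still some other bug in the original code, because the result is not always correct.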