
Python: counting the number of times a column value changes


I have a DataFrame that looks like this:

from pandas import DataFrame

df=DataFrame({'date':{379724:'2017-01-31',
379725: '2017-01-31',
414510: '2017-02-14',
414509: '2017-02-28',
414511: '2017-02-28',
507215: '2017-04-27',
507213: '2017-04-27',
507214: '2017-04-27',
507235: '2017-04-27',
562139: '2017-04-27',
672967: '2017-07-27',
672968: '2017-07-27',
672969: '2017-07-27',
910729: '2017-12-07',
990263: '2018-01-30',
990265: '2018-01-30',
990264: '2018-01-30',
121543: '2018-06-26',
255129: '2018-09-20'},
'id':{379724:'110000078451',
379725: '110000078451',
414510: '110000078451',
414509: '110000078451',
414511: '110000078451',
507215: '110000078451',
507213: '110000078451',
507214: '110000078451',
507235: '110000078451',
562139: '110000078451',
672967: '110000078451',
672968: '110000078451',
672969: '110000078451',
910729: '110000078451',
990263: '110000078451',
990265: '110000078451',
990264: '110000078451',
121543: '110000078451',
255129: '110000078451'},
'limit':{379724:0,
379725: 1,
414510: 1,
414509: 0,
414511: 0,
507215: 0,
507213: 0,
507214: 1,
507235: 0,
562139: 0,
672967: 0,
672968: 0,
672969: 0,
910729: 0,
990263: 0,
990265: 0,
990264: 0,
121543: 0,
255129: 0})
I need to count how many times the value in 'limit' changes within each 'id' group.

The code I came up with is:

count01 = (df.groupby('id')['limit'].rolling(2, min_periods=1)
           .apply(lambda x: ((x[0] != x[-1]) & (x[0] == 1)), raw=True)
           .groupby('id').sum().astype(int).reset_index(name='count01'))
count10 = (df.groupby('id')['limit'].rolling(2, min_periods=1)
           .apply(lambda x: ((x[0] != x[-1]) & (x[0] == 0)), raw=True)
           .groupby('id').sum().astype(int).reset_index(name='count10'))
count_total = count01.merge(count10, on='id')
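Sometimes it gives the correct result and sometimes it does not. I suspect that the first value passed to apply in each group may be assigned NaN and that this affects the result, but that may not be the cause.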

The result should be:

id | count01 | count10
-------------------------------
110000078451| 2       | 2
Thanks.

Edit: I updated my example so that it is closer to the real data.

In count01, change:

(x[0] == 1)) --> (x[0] == 0))
In count10, change:

(x[0] == 0)) --> (x[0] == 1))

This should work:

import pandas as pd


def limit_change_counter(limits, _from, _to):
    tmp = list(limits)
    counter = 0
    for idx, limit in enumerate(tmp):
        if idx > 0:
            if tmp[idx - 1] == _from and limit == _to:
                counter += 1
    return counter


df = pd.DataFrame.from_dict({'date': {379724: '2017-01-31',
                                      379725: '2017-01-31',
                                      414510: '2017-02-14',
                                      414509: '2017-02-28',
                                      414511: '2017-02-28',
                                      507215: '2017-04-27',
                                      507213: '2017-04-27',
                                      507214: '2017-04-27',
                                      507235: '2017-04-27',
                                      562139: '2017-04-27',
                                      672967: '2017-07-27',
                                      672968: '2017-07-27',
                                      672969: '2017-07-27',
                                      910729: '2017-12-07',
                                      990263: '2018-01-30',
                                      990265: '2018-01-30',
                                      990264: '2018-01-30',
                                      121543: '2018-06-26',
                                      255129: '2018-09-20'},
                             'id': {379724: '110000078451',
                                    379725: '110000078451',
                                    414510: '110000078451',
                                    414509: '110000078451',
                                    414511: '110000078451',
                                    507215: '110000078451',
                                    507213: '110000078451',
                                    507214: '110000078451',
                                    507235: '110000078451',
                                    562139: '110000078451',
                                    672967: '110000078451',
                                    672968: '110000078451',
                                    672969: '110000078451',
                                    910729: '110000078451',
                                    990263: '110000078451',
                                    990265: '110000078451',
                                    990264: '110000078451',
                                    121543: '110000078451',
                                    255129: '110000078451'},
                             'limit': {379724: 0,
                                       379725: 1,
                                       414510: 1,
                                       414509: 0,
                                       414511: 0,
                                       507215: 0,
                                       507213: 0,
                                       507214: 1,
                                       507235: 0,
                                       562139: 0,
                                       672967: 0,
                                       672968: 0,
                                       672969: 0,
                                       910729: 0,
                                       990263: 0,
                                       990265: 0,
                                       990264: 0,
                                       121543: 0,
                                       255129: 0}})

df.sort_values(by='date', inplace=True)
print(df)

df['limit_changes_0_to_1'] = df.groupby(['id'])['limit'].transform(limit_change_counter, 0, 1)
df['limit_changes_1_to_0'] = df.groupby(['id'])['limit'].transform(limit_change_counter, 1, 0)
df.drop_duplicates(subset="id", keep="first", inplace=True)

print(df)
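
As a quick sanity check of the pairwise logic, independent of pandas, the following minimal sketch applies an equivalent counter to the sequence mentioned in the comments (0 1 1 0 0 0 1 0). The `zip`-based body is a compact rewrite for illustration, not the answerer's exact code, and the variable `seq` is a name introduced here.

```python
def limit_change_counter(limits, _from, _to):
    # Count adjacent pairs (previous, current) equal to (_from, _to).
    tmp = list(limits)
    return sum(1 for prev, cur in zip(tmp, tmp[1:])
               if prev == _from and cur == _to)


seq = [0, 1, 1, 0, 0, 0, 1, 0]
print(limit_change_counter(seq, 0, 1))  # 2 transitions 0 -> 1
print(limit_change_counter(seq, 1, 0))  # 2 transitions 1 -> 0
```

Both directions give 2 for this sequence, which matches what the comments report for a correct count.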

You can first create a column that marks the transitions occurring within the same id, and then count those transitions with pivot_table:

df2 = df.shift()
df2['limit'] = df2['limit'].bfill().astype(int)  # force limit to type int in shifted df
df.loc[(df.id==df2.id)&(df.limit!=df2.limit),'transition'] = \
                                   df2.limit.astype(str)+df.limit.astype(str)

resul = df.pivot_table(index='id', columns='transition', aggfunc='count',values='date', fill_value=0)
This gives:

transition  01  10
id                
111          2   1
22           0   1
You can improve the presentation:

resul = resul.rename(columns=lambda x: 'count'+x).rename_axis('', axis=1).reset_index()
to finally get:

    id  count01  count10
0  111        2        1
1   22        0        1
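
As an alternative that none of the answers in this thread use, the same counts can be sketched with a grouped `diff()`. This assumes `limit` holds only the integers 0 and 1 and that the rows are already in the intended order. The sample frame and the names `step` and `counts` are made up for illustration.

```python
import pandas as pd

# Small illustrative frame (hypothetical data, not the asker's real data).
df = pd.DataFrame({
    "id": ["a"] * 8 + ["b"] * 3,
    "limit": [0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0],
})

# Difference between each row and the previous row within the same id.
# The first row of every group has no predecessor, so its diff is NaN
# and it matches neither comparison below.
step = df.groupby("id")["limit"].diff()

counts = (
    pd.DataFrame({
        "id": df["id"],
        "count01": step.eq(1),   # +1 means a 0 -> 1 change
        "count10": step.eq(-1),  # -1 means a 1 -> 0 change
    })
    .groupby("id", as_index=False)
    .sum()
)
print(counts)
```

Because the difference is taken per group, the last row of one id is never compared with the first row of the next, so no extra `df.id == df2.id` check is needed.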

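Comments:

The goal is to count the limit changes for each unique limit value (0 and 1) within each id group. So a 0->1 change and a 1->0 change must be counted separately.

@gribna Got it. The code has been changed.

I copied your code, but it always gives me zeros in the result.

@gribna Let me double-check. I will copy the code from Stack Overflow and let you know what the output is.

@gribna This is the Python code, and it produced this output. It seems to work.

@gribna I forgot the first line when copying from my test... edited.

I tried it on real data, but it gives wrong results. For example, for this sequence (0 1 1 0 0 0 1 0) for one id, it gives count01=1 and count10=1.

@gribna: I cannot reproduce the problem. I get 2 for both. Can you share some data that shows the error?

I updated the example with real data. When I run the code it gives me id count0.01 count1.00 110000078451 33

@gribna: I could reproduce and fix it. I had simply forgotten to make sure that limit is an int in the shifted DataFrame. Edited.

With the change you suggested, it counts the transitions from 1 to 0 and from 0 to 1 separately. But there is still some other error in the original code, because the result is not always correct.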