Python: counting the number of times a column value changes
I have a DataFrame that looks like this:
import pandas as pd
df = pd.DataFrame({'date': {379724: '2017-01-31',
379725: '2017-01-31',
414510: '2017-02-14',
414509: '2017-02-28',
414511: '2017-02-28',
507215: '2017-04-27',
507213: '2017-04-27',
507214: '2017-04-27',
507235: '2017-04-27',
562139: '2017-04-27',
672967: '2017-07-27',
672968: '2017-07-27',
672969: '2017-07-27',
910729: '2017-12-07',
990263: '2018-01-30',
990265: '2018-01-30',
990264: '2018-01-30',
121543: '2018-06-26',
255129: '2018-09-20'},
'id':{379724:'110000078451',
379725: '110000078451',
414510: '110000078451',
414509: '110000078451',
414511: '110000078451',
507215: '110000078451',
507213: '110000078451',
507214: '110000078451',
507235: '110000078451',
562139: '110000078451',
672967: '110000078451',
672968: '110000078451',
672969: '110000078451',
910729: '110000078451',
990263: '110000078451',
990265: '110000078451',
990264: '110000078451',
121543: '110000078451',
255129: '110000078451'},
'limit': {379724: 0,
379725: 1,
414510: 1,
414509: 0,
414511: 0,
507215: 0,
507213: 0,
507214: 1,
507235: 0,
562139: 0,
672967: 0,
672968: 0,
672969: 0,
910729: 0,
990263: 0,
990265: 0,
990264: 0,
121543: 0,
255129: 0})
I need to count how many times the value in 'limit' changes within each 'id' group.
The code I came up with is:
count01 = (df.groupby('id')['limit'].rolling(2, min_periods=1)
           .apply(lambda x: ((x[0] != x[-1]) & (x[0] == 1)), raw=True)
           .groupby('id').sum().astype(int).reset_index(name='count01'))
count10 = (df.groupby('id')['limit'].rolling(2, min_periods=1)
           .apply(lambda x: ((x[0] != x[-1]) & (x[0] == 0)), raw=True)
           .groupby('id').sum().astype(int).reset_index(name='count10'))
count_total = count01.merge(count10, on='id')
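Sometimes it gives the correct result and sometimes it does not. I suspect that the first value passed to apply in each group is NaN and that this affects the result, but that may not be the cause.
The result should be: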
id | count01 | count10
-------------------------------
110000078451| 2 | 2
Thanks.
Edit: I updated my example so that it better matches the real data.
In count01, change:
(x[0] == 1)) --> (x[0] == 0))
In count10, change:
(x[0] == 0)) --> (x[0] == 1))
This should work:
import pandas as pd
def limit_change_counter(limits, _from, _to):
    # Count the positions where the previous value is _from and the current one is _to.
    tmp = list(limits)
    counter = 0
    for idx, limit in enumerate(tmp):
        if idx > 0:
            if tmp[idx - 1] == _from and limit == _to:
                counter += 1
    return counter
df = pd.DataFrame.from_dict({'date': {379724: '2017-01-31',
379725: '2017-01-31',
414510: '2017-02-14',
414509: '2017-02-28',
414511: '2017-02-28',
507215: '2017-04-27',
507213: '2017-04-27',
507214: '2017-04-27',
507235: '2017-04-27',
562139: '2017-04-27',
672967: '2017-07-27',
672968: '2017-07-27',
672969: '2017-07-27',
910729: '2017-12-07',
990263: '2018-01-30',
990265: '2018-01-30',
990264: '2018-01-30',
121543: '2018-06-26',
255129: '2018-09-20'},
'id': {379724: '110000078451',
379725: '110000078451',
414510: '110000078451',
414509: '110000078451',
414511: '110000078451',
507215: '110000078451',
507213: '110000078451',
507214: '110000078451',
507235: '110000078451',
562139: '110000078451',
672967: '110000078451',
672968: '110000078451',
672969: '110000078451',
910729: '110000078451',
990263: '110000078451',
990265: '110000078451',
990264: '110000078451',
121543: '110000078451',
255129: '110000078451'},
'limit': {379724: 0,
379725: 1,
414510: 1,
414509: 0,
414511: 0,
507215: 0,
507213: 0,
507214: 1,
507235: 0,
562139: 0,
672967: 0,
672968: 0,
672969: 0,
910729: 0,
990263: 0,
990265: 0,
990264: 0,
121543: 0,
255129: 0}})
df.sort_values(by='date', inplace=True)
print(df)
df['limit_changes_0_to_1'] = df.groupby(['id'])['limit'].transform(limit_change_counter, 0, 1)
df['limit_changes_1_to_0'] = df.groupby(['id'])['limit'].transform(limit_change_counter, 1, 0)
df.drop_duplicates(subset="id", keep="first", inplace=True)
print(df)
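The loop above compares each value with the one before it. The same comparison can be sketched without an explicit loop by shifting 'limit' within each id. This is only an illustration under stated assumptions: the frame `toy` and its values are hypothetical, and the rows are assumed to already be in the intended order.

```python
import pandas as pd

# Hypothetical toy data: two ids, rows already in the intended order.
toy = pd.DataFrame({
    'id':    ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'limit': [0, 1, 1, 0, 1, 1, 0, 0],
})

# Previous value within the same id. The first row of each group gets NaN,
# which equals neither 0 nor 1, so it is never counted as a transition.
prev = toy.groupby('id')['limit'].shift()

counts = pd.DataFrame({
    'count01': ((prev == 0) & (toy['limit'] == 1)).groupby(toy['id']).sum(),
    'count10': ((prev == 1) & (toy['limit'] == 0)).groupby(toy['id']).sum(),
}).reset_index()
print(counts)
```

Because the shift is done per group, the last row of one id is never compared with the first row of the next one.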
You can first create a column that holds the transitions occurring within the same id, and then count those transitions with pivot_table:
df2 = df.shift()
df2['limit'] = df2['limit'].bfill().astype(int) # force limit to type int in shifted df
df.loc[(df.id==df2.id)&(df.limit!=df2.limit),'transition'] = \
df2.limit.astype(str)+df.limit.astype(str)
resul = df.pivot_table(index='id', columns='transition', aggfunc='count',values='date', fill_value=0)
This gives:
transition  01  10
id
111          2   1
22           0   1
You can improve the presentation:
resul = resul.rename(columns=lambda x: 'count'+x).rename_axis('', axis=1).reset_index()
to finally get:
    id  count01  count10
0  111        2        1
1   22        0        1
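A possible variation on the same idea, shown only as a sketch: shift 'limit' per id with groupby, so that no separate id comparison is needed, and tabulate the transition labels with pd.crosstab. The frame `toy` and the column name 'transition' are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical toy data, rows already in the intended order.
toy = pd.DataFrame({
    'id':    ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'limit': [0, 1, 1, 0, 1, 1, 0, 0],
})

# Previous value within the same id (NaN on the first row of each group).
prev = toy.groupby('id')['limit'].shift()

# Keep only rows where a previous value exists and differs from the current one.
changed = prev.notna() & (prev != toy['limit'])

# Build labels such as '01' or '10' from the previous and current values.
labels = (prev[changed].astype(int).astype(str)
          + toy.loc[changed, 'limit'].astype(str)).rename('transition')

table = pd.crosstab(toy.loc[changed, 'id'], labels)
print(table)
```

Note that an id with no transition at all does not appear in `table`; reindexing on the full list of ids with fill_value=0 would be one way to add it back.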
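The goal is to count the limit changes for each unique limit value (0 and 1) within each id group. So a change 0->1 and a change 1->0 must be counted separately.
@gribna Got it. The code has been changed.
I copied your code, but it always gives me zeros in the result.
@gribna Let me double-check. I will copy the code from stackoverflow and let you know what the output is.
@gribna This is the Python code, and it produced this output. It seems to work.
@gribna I forgot the first line when copying from my test... Edited.
I tried it on real data, but it gives wrong results. For example, for this sequence (0 1 1 0 0 0 1 0) for one id, I get count01=1 and count10=1.
@gribna: I cannot reproduce the problem. I get 2 for both. Can you share some data that shows the error?
I updated the example with real data. When I run the code, it gives me id count0.01 count1.00 110000078451 33
@gribna: I could reproduce and fix it. I had simply forgotten to make sure that limit is an int in the shifted DataFrame. Edited.
With the change you suggested, it counts the transitions from 1 to 0 and from 0 to 1 separately. But there are still some other errors in the original code, because the result is not always correct.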
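On the NaN hypothesis raised in the question, one quick check is to look at what rolling(2, min_periods=1) actually hands to the function. The sketch below uses a hypothetical four-value Series and the comparison from the edited question. It only inspects the windows and does not claim to explain the remaining wrong results.

```python
import pandas as pd

s = pd.Series([0, 1, 1, 0])  # hypothetical values

# Record how many values each window passes to the function.
# With min_periods=1 the first window holds a single value, not a NaN.
lengths = s.rolling(2, min_periods=1).apply(lambda x: len(x), raw=True)
print(lengths.tolist())

# The comparison from the edited question, applied to the same windows.
# On a one-element window x[0] and x[-1] are the same element.
flags = s.rolling(2, min_periods=1).apply(
    lambda x: (x[0] != x[-1]) & (x[0] == 0), raw=True)
print(flags.tolist())
```

In this sketch the first window compares a value with itself, so it contributes 0 rather than NaN to the sum.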