Pandas: calculating holding time


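I am new to time-series programming in Python. Consider a file containing orders to buy and sell stocks, together with their statuses. The order file has multiple lines, and each line holds the status of one order.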

Following is sample content of the order file:
{"DATETIME":"20171116 03:46:16.142514", "DATA":
{"MODE":"ORD","INSTR":"INSTR1","TYPE":"New","id":1}}
{"DATETIME":"20171116 03:46:16.243121", "DATA":
{"MODE":"ORD","INSTR":"INSTR2","TYPE":"New","id":2}}
{"DATETIME":"20171116 03:46:16.758292", "DATA":
{"MODE":"ORD","INSTR":"INSTR3","TYPE":"New","id":3}}
{"DATETIME":"20171116 03:46:17.212341", "DATA":
{"MODE":"ORD","INSTR":"INSTR2","TYPE":"TRD","id":2}}
{"DATETIME":"20171116 03:46:17.467893", "DATA":
{"MODE":"ORD","INSTR":"INSTR1","TYPE":"CXL","id":1}}
{"DATETIME":"20171116 03:46:18.924825", "DATA":
{"MODE":"ORD","INSTR":"INSTR3","TYPE":"TRD","id":3}}
Details of each field in a line are as follows:
● DATETIME
○ Timestamp of the order
○ Format
■ YYYYMMDD hh:mm:ss.mi
● MODE
○ Type of the message
○ Always will be ORD
● INSTR
○ Name of the instrument
● TYPE
○ Type of the order
○ Following are the possible values
■ NEW
● Opens a new order
● Order will be active as long as it is in NEW state
■ CXL
● Order got cancelled. Order will be in a closed state after CXL
■ TRD
● Order got traded. Order will be in a closed state after TRD
● ID
○ Unique Id for identifying a particular order
○ Use ID to find state of the same order
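
As a starting point, the file described above could be loaded into a DataFrame along the following lines. This is only a sketch: it assumes one complete JSON object per line, and the helper name `read_orders` and the in-memory `sample` are made up for illustration.

```python
import io
import json

import pandas as pd

# Two records from the sample above, one JSON object per line (assumed layout).
sample = io.StringIO(
    '{"DATETIME":"20171116 03:46:16.142514", "DATA": '
    '{"MODE":"ORD","INSTR":"INSTR1","TYPE":"New","id":1}}\n'
    '{"DATETIME":"20171116 03:46:17.467893", "DATA": '
    '{"MODE":"ORD","INSTR":"INSTR1","TYPE":"CXL","id":1}}\n'
)


def read_orders(handle):
    """Flatten each JSON line into one row: DATETIME plus the DATA fields."""
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        rows.append({"DATETIME": record["DATETIME"], **record["DATA"]})
    orders = pd.DataFrame(rows)
    # Explicit format: date, time, then microseconds after the dot.
    orders["DATETIME"] = pd.to_datetime(
        orders["DATETIME"], format="%Y%m%d %H:%M:%S.%f"
    )
    return orders


orders = read_orders(sample)
print(orders)
```

For a real file, `read_orders(open(path))` would take the place of the `io.StringIO` object.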

We define holding time as the time, in microseconds, for which an order is active. An order is active as long as it is in the NEW state.
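
One way to read this definition: the holding time of an order is the timestamp of its closing message (CXL or TRD) minus the timestamp of its New message. The sketch below applies that reading to the six sample records. It assumes each id has exactly one opening message and at most one closing message, and it compares TYPE case-insensitively because the sample uses "New" while the field description says "NEW".

```python
import pandas as pd

orders = pd.DataFrame({
    "DATETIME": pd.to_datetime(
        [
            "20171116 03:46:16.142514",
            "20171116 03:46:16.243121",
            "20171116 03:46:16.758292",
            "20171116 03:46:17.212341",
            "20171116 03:46:17.467893",
            "20171116 03:46:18.924825",
        ],
        format="%Y%m%d %H:%M:%S.%f",
    ),
    "INSTR": ["INSTR1", "INSTR2", "INSTR3", "INSTR2", "INSTR1", "INSTR3"],
    "TYPE": ["New", "New", "New", "TRD", "CXL", "TRD"],
    "id": [1, 2, 3, 2, 1, 3],
})

order_type = orders["TYPE"].str.upper()
opened = orders[order_type == "NEW"].set_index("id")["DATETIME"]
closed = orders[order_type.isin(["CXL", "TRD"])].set_index("id")["DATETIME"]

# Subtraction aligns on id; orders that never closed come out as NaT
# and are dropped here.
holding = (closed - opened).dropna()

# Integer microseconds per order id.
holding_us = holding // pd.Timedelta(microseconds=1)
print(holding_us)
```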

Given an order file, calculate the following statistics of the holding-time distribution per ticker:
● Mean
● Median
● Max
● 75th percentile
● 90th percentile
● 99th percentile
● Standard deviation
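
Once there is one holding time per order, the statistics listed above can be computed per ticker with a groupby. The numbers below are invented purely to show the shape of the calculation, and the column name `holding_us` is an assumption. Note that pandas interpolates linearly between observations for quantiles and uses the sample standard deviation (ddof=1) by default.

```python
import pandas as pd

# Hypothetical holding times in microseconds, one row per closed order.
holding = pd.DataFrame({
    "INSTR": ["INSTR1", "INSTR1", "INSTR1", "INSTR2", "INSTR2"],
    "holding_us": [100, 200, 300, 10, 30],
})

grouped = holding.groupby("INSTR")["holding_us"]

# Each entry is a Series indexed by INSTR, so they line up as columns.
stats = pd.DataFrame({
    "mean": grouped.mean(),
    "median": grouped.median(),
    "max": grouped.max(),
    "p75": grouped.quantile(0.75),
    "p90": grouped.quantile(0.90),
    "p99": grouped.quantile(0.99),
    "std": grouped.std(),
})
print(stats)
```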

Can anyone help me? Thanks a lot.

Use the shift function to put the datetime of the New state and the datetime of the current state in the same row.

import pandas as pd

data = \
[{"DATETIME":"20171116 03:46:16.142514", 
"MODE":"ORD","INSTR":"INSTR1","TYPE":"New","id":1},
{"DATETIME":"20171116 03:46:16.243121"
,"MODE":"ORD","INSTR":"INSTR2","TYPE":"New","id":2},
{"DATETIME":"20171116 03:46:16.758292"
,"MODE":"ORD","INSTR":"INSTR3","TYPE":"New","id":3},
{"DATETIME":"20171116 03:46:17.212341"
,"MODE":"ORD","INSTR":"INSTR2","TYPE":"TRD","id":2},
{"DATETIME":"20171116 03:46:17.467893"
,"MODE":"ORD","INSTR":"INSTR1","TYPE":"CXL","id":1},
{"DATETIME":"20171116 03:46:18.924825"
,"MODE":"ORD","INSTR":"INSTR3","TYPE":"TRD","id":3}]

df = pd.DataFrame(data)

# Parse the timestamps first so that sorting is chronological
df['DATETIME'] = pd.to_datetime(df['DATETIME'], format='%Y%m%d %H:%M:%S.%f')

df.sort_values(by=['id', 'DATETIME'], inplace=True)

# I am assuming that an id's next state cannot be New again.
# Shift within each id so a row never picks up another order's timestamp.
df['DATETIME_shiftby_1'] = df.groupby('id')['DATETIME'].shift(1)

df['hold_out_time'] = df['DATETIME'] - df['DATETIME_shiftby_1']

def fun(x):
    # x is the hold_out_time Series of one id. shift moves values down by one row,
    # so the second row holds (current DATETIME - DATETIME of the New state).
    if x.shape[0] > 1:
        return x.iloc[1]
    else:
        return pd.NaT  # only a New row exists: the order is still active

# This Series will contain the holdout time for every id
df.groupby('id')['hold_out_time'].agg(fun)

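Could you post the code showing what you have so far? What exactly are you stuck on?

Thanks for your answer, Abhishek. But I get the following error: df2.groupby('id').agg(fun) File "C:\Users\owner\Anaconda2\lib\site-packages\pandas\core\index.py", line 188, in _has_valid_tuple raise IndexingError('Too many indexers') IndexingError: Too many indexers

Please copy and paste the code above and see whether it throws any error. If it does not, post your own code in another question along with the full error log. It is very hard to debug code from such a short error message.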
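
A possible explanation for the error in the comment, offered as a guess since the full traceback is not shown: "Too many indexers" is what pandas raises when a one-dimensional Series is indexed with two positions, as in an expression of the form `x.iloc[1, 6]`. When `agg` is given a plain Python function, that function generally receives one column of each group as a Series rather than the whole sub-DataFrame. The small sketch below reproduces the message on made-up data.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
column = frame["b"]  # a Series, like the pieces a function sees inside agg

# Two positions (row, column) are fine on a DataFrame.
cell = frame.iloc[1, 1]

# The same two-position lookup on a Series has one index too many.
try:
    column.iloc[1, 1]
    message = None
except Exception as exc:  # IndexingError in the pandas versions I know of
    message = str(exc)

print(cell)
print(message)
```

Selecting the column before aggregating and using a single position, for example `x.iloc[1]`, avoids the problem.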