Python: How do I calculate the daily quoted spread in pandas?

I have bond market data that looks like this:

Id   row      Date       BuyPrice    SellPrice    Time
1    1      2017-10-30    94520       0          9:00:00
1    2      2017-10-30    94538       0          9:00:00
1    3      2017-10-30    94609       0          9:00:00
1    4      2017-10-30    94615       0          9:00:00
1    5      2017-10-30    94617       0          9:00:00
1    1      2017-09-20    99100       99159      9:00:10
1    2      2017-09-20    99102       99058      9:00:11
1    3      2017-09-20    99103       99057      9:00:12
1    4      2017-09-20    99104       99056      9:00:10
1    5      2017-09-20    99105       99055      9:00:10
1    1      2017-09-20    98100       99190      9:01:10
1    2      2017-09-20    98099       99091      9:01:10
1    3      2017-09-20    98098       99092      9:01:10
1    4      2017-09-20    98097       99093      9:01:10
1    5      2017-09-20    98096       99094      9:01:10
2    1      2010-11-01    99890       100000     10:00:02
2    2      2010-11-01    99899       100000     10:00:02
2    3      2010-11-01    99901       99899      9:00:02
2    4      2010-11-01    99920       99850      10:00:02
2    5      2010-11-01    99933       99848      10:00:23
Step 1:

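For each Id and each day, I want to calculate the spread (= SellPrice - BuyPrice) for row 1. If BuyPrice or SellPrice is zero, that quote should be excluded (report NaN for such data). After this step the data should look like this: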

id     row      Date         BuyPrice      SellPrice     Spread
1      1        2017-10-30   94520         0             NaN
1      1        2017-09-20   99100         99159         59
1      1        2017-09-20   98100         99190         190
2      1        2010-11-01   99890         100000        110
Step 2:

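Now I want to calculate the average spread per day for each Id, and add an index that numbers the dates within each Id.

In the end, the data should look like this: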

Id    Date        avg.spread(average of spread for each day)   index
1     2017-10-30   NaN                                           1
1     2017-09-20   124.5(=(59+190)/2)                            2
2     2010-11-01   110                                           1

I did my best to understand what you want. Although you did not say so explicitly, I think you want a groupby on Id and Date:

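import numpy as np
# A zero price means "no quote": make it NaN so the spread is NaN too
px = df[['BuyPrice', 'SellPrice']].replace(0, np.nan)
# Use ['diff'], since .diff resolves to a GroupBy method in current pandas
g = df.assign(diff=px.SellPrice.sub(px.BuyPrice))\
                 .groupby(['Id', 'row', 'Date'])['diff'].mean()

# Number the dates 1, 2, ... within each (Id, row) pair
v = g.groupby(level=[0, 1]).cumcount().add(1).values
df = g.reset_index().assign(index=v)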

df

    Id  row        Date   diff  index
0    1    1  2017-09-20  574.5      1
1    1    1  2017-10-30    NaN      2
2    1    2  2017-09-20  474.0      1
3    1    2  2017-10-30    NaN      2
4    1    3  2017-09-20  474.0      1
5    1    3  2017-10-30    NaN      2
6    1    4  2017-09-20  474.0      1
7    1    4  2017-10-30    NaN      2
8    1    5  2017-09-20  474.0      1
9    1    5  2017-10-30    NaN      2
10   2    1  2010-11-01  110.0      1
11   2    2  2010-11-01  101.0      1
12   2    3  2010-11-01   -2.0      1
13   2    4  2010-11-01  -70.0      1
14   2    5  2010-11-01  -85.0      1
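The answer groups by row as well, so every row level gets its own mean. For comparison, here is a minimal sketch that is closer to the layout the question asks for: row 1 only, one average per Id and Date. It assumes the row-1 quotes from the sample data; the names `quotes`, `top`, `daily` and `avg_spread` are chosen here for illustration and do not come from the original post.

```python
import numpy as np
import pandas as pd

# Row-1 quotes only, copied from the sample data in the question
quotes = pd.DataFrame({
    "Id": [1, 1, 1, 2],
    "row": [1, 1, 1, 1],
    "Date": ["2017-10-30", "2017-09-20", "2017-09-20", "2010-11-01"],
    "BuyPrice": [94520, 99100, 98100, 99890],
    "SellPrice": [0, 99159, 99190, 100000],
})

# Step 1: keep row 1, treat a zero price as missing, take SellPrice - BuyPrice
top = quotes[quotes["row"] == 1].copy()
prices = top[["BuyPrice", "SellPrice"]].replace(0, np.nan)
top["Spread"] = prices["SellPrice"] - prices["BuyPrice"]

# Step 2: mean spread per Id and Date; sort=False keeps the order of appearance
daily = (
    top.groupby(["Id", "Date"], sort=False)["Spread"]
       .mean()
       .reset_index(name="avg_spread")
)
print(daily)
```

With these numbers the average for Id 1 on 2017-09-20 comes out as 574.5 rather than the 124.5 shown in the question, because 99190 - 98100 is 1090.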


You seem to be dropping duplicates. Can you confirm that your expected output is correct? Also, for this data I get different averages from yours, and it is not clear how the index is calculated and printed.

The average is calculated like this: (59+90)/2 (NaN values are excluded from the average; if a day has only NaN values, report NaN). In this example, the index is a running count of days: 2017-10-30 is the first day, so it gets index 1, and 2017-09-20 is the second day for id 1, so it gets index 2.

Yes, but 99190 - 98100 is 1090.0, not 90. And in the last row, for id=2, it is 99890, not 99899.
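The index described in the comments numbers the days in the order they appear for each id, not in chronological order. A minimal sketch of that numbering, assuming a small frame of daily averages (the name `daily` and its values are illustrative, taken from the figures discussed in this thread):

```python
import pandas as pd

# Daily averages per Id, in the order the dates appear in the question
daily = pd.DataFrame({
    "Id": [1, 1, 2],
    "Date": ["2017-10-30", "2017-09-20", "2010-11-01"],
    "avg_spread": [float("nan"), 574.5, 110.0],
})

# cumcount numbers rows 0, 1, ... within each Id in their current order
daily["index"] = daily.groupby("Id").cumcount() + 1
print(daily)
```

To number the days chronologically instead, sort the frame by Id and Date before calling `cumcount`.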