kdb: truncating subsequent rows based on a data point

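All, I thought that searching a table, identifying a point, and then truncating or deleting the subsequent rows for a group of data would be a fairly simple task, but I'm struggling to solve it. I believe I need a nested function in my update query, but I haven't managed to write one yet. I also tried creating a delete_me column that lets me flag rows and then run a delete, which might be faster and better for auditing the code.

Ideally, I'd like to wrap this in a callable function, since there are several different truncation methods.

In the example below, I identify the date of the maximum cumulative value, then flag the rows with later dates, by id, for eventual deletion.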

///raw data for copy and paste - `:./Data/sample.csv;
id,idate,a,b,c
AAA,1/31/2014,1000,500,500
AAA,2/28/2014,900,500,50
AAA,3/31/2014,850,500,0
AAA,4/30/2014,800,500,0
AAA,5/31/2014,750,500,0
AAA,6/30/2014,700,500,0
AAA,7/31/2014,650,500,0
AAA,8/31/2014,550,500,0
AAA,9/30/2014,500,500,0
AAA,10/31/2014,450,500,0
BBB,6/30/2012,1000,500,2500
BBB,7/31/2012,950,500,75
BBB,8/31/2012,900,500,0
BBB,9/30/2012,850,500,0
BBB,10/31/2012,800,500,0
BBB,11/30/2012,750,500,0
BBB,12/31/2012,700,500,0
BBB,1/31/2013,650,500,0
BBB,2/28/2013,600,500,0
BBB,3/31/2013,550,500,0
BBB,4/30/2013,500,500,0
BBB,5/31/2013,450,500,0
BBB,6/30/2013,400,500,0
CCC,1/1/2016,1000,500,1200
CCC,2/29/2016,950,500,30
CCC,3/31/2016,900,500,0
CCC,4/30/2016,850,500,0
CCC,5/31/2016,800,500,0
CCC,6/30/2016,750,500,0
CCC,7/31/2016,700,500,0
CCC,8/31/2016,650,500,0
CCC,9/30/2016,600,500,0
CCC,10/31/2016,550,500,0
CCC,11/30/2016,500,500,0
CCC,12/31/2016,450,500,0
CCC,1/31/2017,400,500,0
CCC,2/28/2017,350,500,0
CCC,3/31/2017,300,500,0
CCC,4/30/2017,250,500,0

Load the data and add some calculations:

\c 100 150i
t:("SSFFF";enlist",") 0:`:./Data/sample.csv;
t: update kdbDate: "D"$string idate, d:(a-(b+c)),cum_d: sums (a-(b+c)) from t;
t:![t; (); (enlist`id)!enlist`id; (enlist`maxCum_d)!enlist(max;`cum_d)];
t:![t; enlist(=;`maxCum_d;`cum_d); (enlist`id)!enlist`id; (enlist `date_cutoff)!enlist(*:;`kdbDate)];
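
For readers less used to functional form, the two functional updates above should correspond to the q-sql statements below. This is a minimal sketch on a made-up five-row table (the values are illustrative and not taken from sample.csv):

```q
// toy table (made up): id A peaks on its 2nd row, id B on its 1st
t:([] id:`A`A`A`B`B; kdbDate:2014.01.31 2014.02.28 2014.03.31 2012.06.30 2012.07.31; cum_d:1 3 2 5 4f)

// per-id maximum of the cumulative column, repeated on every row of that id
t:update maxCum_d:max cum_d by id from t

// on the row(s) where the maximum is reached, record the date; other rows get a null date
t:update date_cutoff:first kdbDate by id from t where maxCum_d=cum_d
```

Only the rows where cum_d equals the per-id maximum receive a date_cutoff, which is why the later steps need to spread that value to the rest of each group.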

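Below is where I'm currently stuck. I've also considered using fills to populate the date_cutoff value on the remaining rows of each id, which would avoid creating another column.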

show exec max(date_cutoff) by id from t;
assignDelete:{[t] update del: `delete_me by id from t where max (date_cutoff) > kdbDate}; //<--STUCK--
t: assignDelete over t;
t:![t; enlist (~:;(^:;`del)); 0b; `symbol$()] ; //delete from t where not null `del

[Edit] Using fills on another column seems to work. Note the truncation after the maximum:


t: update del:fills date_cutoff by id from t where kdbDate>date_cutoff;
or in functional form
t: ![t; enlist(>;`kdbDate;`date_cutoff);(enlist`id)!enlist`id;(enlist`del)! enlist (^\;`date_cutoff)];


id  idate      a    b   c    kdbDate    d     cum_d maxCum_d date_cutoff del
----------------------------------------------------------------------------
AAA 1/31/2014  1000 500 500  2014.01.31 0     0     1650
AAA 2/28/2014  900  500 50   2014.02.28 350   350   1650
AAA 3/31/2014  850  500 0    2014.03.31 350   700   1650
AAA 4/30/2014  800  500 0    2014.04.30 300   1000  1650
AAA 5/31/2014  750  500 0    2014.05.31 250   1250  1650
AAA 6/30/2014  700  500 0    2014.06.30 200   1450  1650
AAA 7/31/2014  650  500 0    2014.07.31 150   1600  1650
AAA 8/31/2014  550  500 0    2014.08.31 50    1650  1650     2014.08.31
BBB 6/30/2012  1000 500 2500 2012.06.30 -2000 -400  1775
BBB 7/31/2012  950  500 75   2012.07.31 375   -25   1775
BBB 8/31/2012  900  500 0    2012.08.31 400   375   1775
BBB 9/30/2012  850  500 0    2012.09.30 350   725   1775
BBB 10/31/2012 800  500 0    2012.10.31 300   1025  1775
BBB 11/30/2012 750  500 0    2012.11.30 250   1275  1775
BBB 12/31/2012 700  500 0    2012.12.31 200   1475  1775
BBB 1/31/2013  650  500 0    2013.01.31 150   1625  1775
BBB 2/28/2013  600  500 0    2013.02.28 100   1725  1775
BBB 3/31/2013  550  500 0    2013.03.31 50    1775  1775     2013.03.31
CCC 1/1/2016   1000 500 1200 2016.01.01 -700  925   3145
CCC 2/29/2016  950  500 30   2016.02.29 420   1345  3145
CCC 3/31/2016  900  500 0    2016.03.31 400   1745  3145
CCC 4/30/2016  850  500 0    2016.04.30 350   2095  3145
CCC 5/31/2016  800  500 0    2016.05.31 300   2395  3145
CCC 6/30/2016  750  500 0    2016.06.30 250   2645  3145
CCC 7/31/2016  700  500 0    2016.07.31 200   2845  3145
CCC 8/31/2016  650  500 0    2016.08.31 150   2995  3145
CCC 9/30/2016  600  500 0    2016.09.30 100   3095  3145
CCC 10/31/2016 550  500 0    2016.10.31 50    3145  3145     2016.10.31

For this solution, I left-join the date_cutoff by id back onto the table so that every date_cutoff entry is non-null, then use a vector conditional to decide whether to delete:

q)t:t lj select last date_cutoff by id from t where not null date_cutoff
q)update del:?[date_cutoff<kdbDate;`delete_me;`]from t

This should work as long as there is only one distinct date_cutoff within each id group.
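
To illustrate, here is a minimal sketch on a made-up table with one cutoff per id (the table and its values are assumptions for illustration). It deletes the late rows directly rather than flagging them first, which is a variation on the statements above and not part of them:

```q
// toy table (made up): each id has exactly one non-null date_cutoff
t:([] id:`A`A`A`B`B;
  kdbDate:2014.01.31 2014.02.28 2014.03.31 2012.06.30 2012.07.31;
  date_cutoff:"D"$("";"2014.02.28";"";"2012.06.30";""))

// broadcast each id's single non-null cutoff to all of its rows
t:t lj select last date_cutoff by id from t where not null date_cutoff

// drop the rows dated after the cutoff
t:delete from t where kdbDate>date_cutoff
```

On this toy input the last row of each id is removed, leaving three rows.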

The maxs function computes the running maximum of a given vector. By using fby in the where clause, you can avoid adding these helper columns:

// define the table
q)t:("SSFFF";enlist",") 0:`:./Data/sample.csv;
q)t: update kdbDate: "D"$string idate, d:(a-(b+c)),cum_d: sums (a-(b+c)) from t;

// delete rows with one q-sql statement
q)delete from t where ({prev max[x]=maxs[x]};cum_d) fby id
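
Since the question asks for a callable function that supports several truncation methods, one possible sketch is to wrap the same fby constraint in a functional delete and pass the predicate in. The name truncBy, its argument order, and the toy table are hypothetical; only the afterMax predicate comes from the statement above:

```q
// truncBy (hypothetical helper): delete the rows of t for which predicate f,
// applied to column c within each group g, returns 1b
truncBy:{[t;f;c;g] ![t; enlist(fby;(enlist;f;c);g); 0b; `symbol$()]}

// predicate from the answer: 1b for every row after the first occurrence of the group maximum
afterMax:{prev max[x]=maxs x}

// toy data (made up): A peaks on its 2nd row, B on its last row
t:([] id:`A`A`A`A`B`B`B; cum_d:1 3 2 0 5 4 6f)
r:truncBy[t;afterMax;`cum_d;`id]
// r keeps A's first two rows and all three of B's
```

Other truncation rules could then be supplied as different predicates without changing truncBy.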

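Thanks, Cathal. I just saved an edit that uses the fills command, but it isn't as smooth. Thanks for your help!

Thanks, Jorge! This is certainly more streamlined as well. I've been reading about fby and trying to find places to use it, but hadn't actually managed to implement it. My table is too wide and I'm trying to cut down the number of columns, so this is very useful. Now that I have the full underlying dataset, I plan to use this as I aggregate metrics by id. For example, I'm running various metrics and aggregate functions by id (IRR, breakeven price, ROE, and so on), since each id represents a decision being compared. I'll then look at the same metrics across different id groupings. Thanks again!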