Pivot table in kdb+/q


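I'm trying to pivot some trade data in KDB/q. My data is only slightly different from the working example on the website (see General pivot function: ), but even after trying for several hours I can't get the function to work (I'm very new to KDB).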

Put simply, I want to go from this table:

q)5# trades_agg
date       sym  time  exchange buysell| shares
--------------------------------------| ------
2009.01.05 aaca 09:30 BATS     B      | 484
2009.01.05 aaca 09:30 BATS     S      | 434
2009.01.05 aaca 09:30 NASDAQ   B      | 235
2009.01.05 aaca 09:30 NASDAQ   S      | 429
2009.01.05 aaca 09:30 NYSE     B      | 309

to this:

date       sym  time  | BATSsharesB BATSsharesS NASDAQsharesB    ... 
----------------------| -----------------------------------------------
2009.01.05 aaca 09:30 | 484          434        235              ...
...                   | ...

Here is a working example to illustrate:

// Create data
qpd:5*2*4*"i"$16:00-09:30
date:raze(100*qpd)#'2009.01.05+til 5
sym:(raze/)5#enlist qpd#'100?`4
sym:(neg count sym)?sym
time:"t"$raze 500#enlist 09:30:00+15*til qpd
time+:(count time)?1000
exchange:raze 500#enlist raze(qpd div 3)#enlist`NYSE`NASDAQ`BATS
buysell:raze 500#enlist raze(qpd div 2)#enlist`B`S
shares:(500*qpd)?100
trades:([]date;sym;time;exchange;buysell;shares)
//I then aggregate the data into equal sized buckets
trades_agg: select sum shares by date, sym, time: 15 xbar time.minute, exchange, buysell from trades

// pivot function from the code.kx.com website
piv:{[t;k;p;v;f;g]
 v:(),v;
 G:group flip k!(t:.Q.v t)k;
 F:group flip p!t p;
 count[k]!g[k;P;C]xcols 0!key[G]!flip(C:f[v]P:flip value flip key F)!raze
  {[i;j;k;x;y]
   a:count[x]#x 0N;
   a[y]:x y;
   b:count[x]#0b;
   b[y]:1b;
   c:a i;
   c[k]:first'[a[j]@'where'[b j]];
   c}[I[;0];I J;J:where 1<>count'[I:value G]]/:\:[t v;value F]}

Even with the suggested f and g functions, it doesn't work:

 f:{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]}
 g:{[k;P;c]k,(raze/)flip flip each 5 cut'10 cut raze reverse 10 cut asc c}

I don't understand why it doesn't work, since it is very close to the example on the website.
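Calling the suggested f on its own shows what it contributes. Given the value columns v and P (the list of (exchange;buysell) pairs that piv derives from key F), it returns one column name per combination. The P below is a small hand-made stand-in, not taken from the real data:

```q
/ f as suggested in the question: one column name per (pivot pair, value column) combination
f:{[v;P]`$raze each string raze P[;0],'/:v,/:\:P[;1]}
/ hand-made stand-in for P: one (exchange;buysell) pair per pivot column
P:(`BATS`B;`BATS`S;`NASDAQ`B)
f[enlist`shares;P]   / `BATSsharesB`BATSsharesS`NASDAQsharesB
```

So f already yields the BATSsharesB-style names from the question; arranging the resulting columns is g's job.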

Your table is keyed incorrectly. Note the 0! below, which unkeys the result of the select:

trades_agg:0!select sum shares by date, sym, time: 15 xbar time.minute,exchange,buysell from trades

and define your g as:

g:{[k;P;c]k,c}
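The effect of 0! can be seen on a toy table standing in for trades_agg. A select ... by returns a table keyed on the by columns, and 0! turns those key columns back into ordinary columns, so piv can pull columns out by name (the t k, t p and t v expressions):

```q
/ toy stand-in for trades_agg: the by column k becomes the key
kt:select sum v by k from ([]k:`a`b`a;v:1 2 3)
keys kt      / ,`k
/ 0! unkeys it: k is an ordinary column again
ut:0!kt
keys ut      / `symbol$()
ut`k         / `a`b
ut`v         / 4 2
```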

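The best way to figure out what f and g need to be is to define them with a breakpoint and then inspect the variables. Referencing an undefined name such as break signals an error, which suspends execution inside the function: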

g:{[k;P;c]break}
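If suspending inside the function is inconvenient, a variation on the same idea is a hypothetical g (not from the original answer) that prints its arguments with 0N! and then returns k,c like the g defined above:

```q
/ hypothetical debugging g: print what piv passes in, then keep key columns first
g:{[k;P;c] 0N!`k`P`c!(k;P;c); k,c}
g[`date`sym`time;(`BATS`B;`BATS`S);`BATSsharesB`BATSsharesS]
```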

Here is a self-contained version that is easier to use:

tt:1000#0!trades_agg

piv:{[t;k;p;v]
    / controls the new column names
    f:{[v;P]`${raze " " sv x} each string raze P[;0],'/:v,/:\:P[;1]};
     v:(),v; k:(),k; p:(),p; / make sure args are lists
     G:group flip k!(t:.Q.v t)k;
     F:group flip p!t p;
     key[G]!flip(C:f[v]P:flip value flip key F)!raze
      {[i;j;k;x;y]
       a:count[x]#x 0N;
       a[y]:x y;
       b:count[x]#0b;
       b[y]:1b;
       c:a i;
       c[k]:first'[a[j]@'where'[b j]];
       c}[I[;0];I J;J:where 1<>count'[I:value G]]/:\:[t v;value F]};



q)piv[`tt;`date`sym`time;`exchange`buysell;enlist `shares]
date       sym  time | BATS shares B BATS shares S NASDAQ shares B NASDAQ sha..
---------------------| ------------------------------------------------------..
2009.01.05 adkk 09:30| 577           359           499             452       ..
2009.01.05 adkk 09:45| 882           501           339             467       ..
2009.01.05 adkk 10:00| 620           513           411             128       ..
2009.01.05 adkk 10:15| 501           544           272             544       ..
2009.01.05 adkk 10:30| 291           594           363             331       ..
2009.01.05 adkk 10:45| 867           500           498             536       ..
2009.01.05 adkk 11:00| 624           632           694             493       ..
2009.01.05 adkk 11:15| 99            704           600             299       ..
2009.01.05 adkk 11:30| 269           394           280             392       ..
2009.01.05 adkk 11:45| 635           744           758             597       ..
2009.01.05 adkk 12:00| 562           354           498             405       ..
2009.01.05 adkk 12:15| 416           437           303             492       ..
2009.01.05 adkk 12:30| 447           699           370             302       ..
2009.01.05 adkk 12:45| 336           647           512             245       ..
2009.01.05 adkk 13:00| 692           457           497             553       ..
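For a single key column, a commonly used alternative to the general function is an exec ... by expression. It turns each group into a dictionary and takes a fixed list of keys with #, which fills missing combinations with nulls. Combining exchange, "shares" and buysell into one symbol column reproduces the column names from the question. This is a sketch on a hand-made four-row table (not the generated data), keyed by sym only:

```q
/ hand-made sample rows, mirroring the first rows of trades_agg
t:([]sym:4#`aaca;exchange:`BATS`BATS`NASDAQ`NASDAQ;buysell:`B`S`B`S;shares:484 434 235 429)
/ one pivot-column name per row, e.g. `BATSsharesB
t:update c:`$(string exchange),'"shares",/:string buysell from t
/ all pivot columns, in a fixed order
P:asc exec distinct c from t
/ per sym group: map names to shares, take P so every group has the same columns
pt:exec P#(c!shares) by sym:sym from t
```

This handles one key column. The piv function above is what handles the multi-column date, sym, time key.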


I found the original piv function in Ryan's answer hard to understand, so I updated it with some comments and more readable variable names. HTH
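Both versions rest on group applied to a table. It maps each distinct row of the grouping columns to the list of row indices where that row occurs. Here it is on a toy table whose column names are chosen to mirror the question:

```q
/ toy table: two rows share (date;sym), one differs
toy:([]date:2009.01.05 2009.01.05 2009.01.06;sym:3#`aaca;shares:484 434 235)
/ group on the row-label columns, as rows#table does in the readable version
rg:group `date`sym#toy
key rg     / the distinct (date;sym) rows, as a table
value rg   / (0 1;,2): row indices for each of those rows
```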

piv:{[table; rows; columns; vals]
    
    / make sure args are lists
    vals: (),vals; 
    rows: (),rows;
    columns: (),columns; 

    / Take the row-label columns of the table and group on them
    / group returns a dict whose keys are the unique row labels and whose values are the row indices of each group, e.g. (0 1 3; 2 4; ...)
    rowGroups: group rows#table;
    rowGroupIdxs: value rowGroups;
    rowValues: key[rowGroups];
    
    / Similarly, get columns of table corresponding to those of column labels and calculate groups
    colGroups: group columns#table;
    colGroupIdxs: value colGroups;
    colValues: key colGroups;
    
    getPivotCol: {[rowGroupStartIdx; nonSingleRowGroups; nonSingleRowGroupsIdx; vals; colGroupIdxs]
        / vals: the list of values for this particular value-column combination
        / colGroupIdxs: the list of indices for this particular column group
    
        / We only care about vals that should belong in this pivot column - we need to filter out vals not part of this column group
        filteredValues: count[vals]#vals[0N];
        filteredValues[colGroupIdxs]: vals[colGroupIdxs];
       
        / Equivalent to filteredValues <> 0N
        hasValue: count[vals]#0b;
        hasValue[colGroupIdxs]: 1b;
       
        / Seed the pivot column with the filtered value at the first row of each row group
        / This is already correct for row groups of size 1, as no aggregation is needed
        pivotCol: filteredValues[rowGroupStartIdx];

        / Otherwise, for row groups with more than one row, take the first value that belongs to this column group
        pivotCol[nonSingleRowGroupsIdx]: first'[filteredValues[nonSingleRowGroups]@'where'[hasValue[nonSingleRowGroups]]];
        pivotCol
    }
    
    / Groups with more than 1 row (these are the ones that will need aggregating)
    nonSingleRowGroupsIdx: where 1 <> count'[rowGroupIdxs];
    
    / Get resulting pivot column for each combination of column and value fields
    pivotCols: raze getPivotCol[rowGroupIdxs[;0]; rowGroupIdxs[nonSingleRowGroupsIdx]; nonSingleRowGroupsIdx] /:\: [table[vals]; colGroupIdxs]
    
    / Column names are the cross-product of column-label combinations and value fields
    / (the inner lambda must use its own argument x: it cannot see piv's local vals)
    colNames:`${raze "" sv x} each string raze (flip value flip colValues),'/:vals;
    
    / Finally, stitch together row and column headings with pivot data to obtain final table
    rowValues!flip colNames!pivotCols
};
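The core of getPivotCol can be run on a few hand-made vectors. The sketch below uses toy values and leaves out the shortcut for single-row groups. It builds one pivot column: values outside the current column group are nulled out, and each row group keeps the first value that belongs to the column group. A row group with no such value ends up null, because first of an empty list is null.

```q
/ toy inputs (hand-made): 4 rows forming 2 row groups, one value column
vals:484 434 235 429
rowGroupIdxs:(0 1;2 3)
/ rows belonging to the pivot column being built (one column group)
colGroupIdxs:1 3
/ null out every value that is not in this column group
filteredValues:count[vals]#vals 0N
filteredValues[colGroupIdxs]:vals colGroupIdxs
/ flag the rows that belong to the column group
hasValue:count[vals]#0b
hasValue[colGroupIdxs]:1b
/ per row group, keep the first value that belongs to the column group
pivotCol:first each filteredValues[rowGroupIdxs]@'where each hasValue rowGroupIdxs
pivotCol   / 434 429
```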


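I also made a small change to the column-name format for my own needs.

By the way, I'm not sure if it's just me, but the piv function almost feels deliberately obfuscated. I stared at it for 10 minutes and still had no idea how it works... Nice work, mChen (and Ryan, of course). I just submitted a few typo-correction suggestions. Performance-wise, comparing the two, I get: original: 1128 416209312; yours: 1121 416209312. Also, after adding spaces back into the column names, p1~p2 returns 1b.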
