Kdb splayed table upsert results in error: `cast

I built a prototype data loader that saves CSV files into splayed tables. The workflow is as follows:

  • Create the schema on the first run, e.g. for the volatilitysurface table:

    volatilitysurface::([date:`datetime$(); ccypair:`symbol$()] atm_convention:`symbol$(); premium_included:`boolean$(); smile_type:`symbol$(); vs_type:`symbol$(); delta_ratio:`float$(); delta_setting:`float$(); wing_extrapolation:`float$(); spread_type:`symbol$());
    
  • For each file in the rawdata folder, import it:

    myfiles:@[system;"dir /b /o:gn ",string `$getenv[`KDBRAWDATA],"*.volatilitysurface.csv 2> nul";()];
    if[myfiles~();.lg.o[`load;"no volatilitysurface files found!"];:0N];
    .lg.o[`load;"loading data files ..."];
    / load each file
    {
      mypath:"" sv (string `$getenv[`KDBRAWDATA];x);
      .lg.o[`load;"loading file name '",mypath,"' ..."];
      myfile:hsym`$mypath;
      tmp1:select date,ccypair,atm_convention,premium_included,smile_type,vs_type,delta_ratio,delta_setting,wing_extrapolation,spread_type from update date:x, premium_included:?[premium_included = `$"true";1b;0b] from ("ZSSSSSFFFS";enlist ",")0:myfile;
      `volatilitysurface upsert tmp1;
    } @/: myfiles;
    delete tmp1 from `.;
    .Q.gc[];
    .lg.o[`done;"loading volatilitysurface data done"];
    
    .lg.o[`save;"saving volatilitysurface schema to ",string afolder];
    volatilitysurface::0!volatilitysurface;
    .Q.dpft[afolder;`;`ccypair;`volatilitysurface];
    .lg.o[`cleanup;"removing volatilitysurface from memory"];
    delete volatilitysurface from `.;
    .Q.gc[];
    .lg.o[`done;"saving volatilitysurface schema done"];
    
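  • This works well. I call .Q.gc[] frequently to avoid hitting wsfull. When new CSV files become available, I open the existing schema, upsert into it and save it again, effectively overwriting the existing HDB file system.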

  • Open the schema:

    .lg.o[`open;"tables already exists, opening the schema ..."];
    @[system;"l ",(string afolder) _ 0;{.lg.e[`open;"failed to load hdb directory: ", x]; 'x}];
    / Re-create table index
    volatilitysurface::`date`ccypair xkey select from volatilitysurface;
    
  • Re-run step #2 to append the new CSV files to the existing volatilitysurface table. It upserts the first CSV perfectly, but the second CSV fails with:

    error: `cast
    
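  • I debugged to the point of the error and double-checked that the metadata of tmp1 and volatilitysurface are exactly the same. Any idea why this happens? I have the same issue with other tables. I tried clearing the keys from the table after every upsert, but it does not help: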

    volatilitysurface::0!volatilitysurface;
    volatilitysurface::`date`ccypair xkey volatilitysurface;
    
    And the metadata comparison at the point of the cast error:

    meta tmp1
    c                 | t f a
    ------------------| -----
    date              | z    
    ccypair           | s    
    atm_convention    | s    
    premium_included  | b    
    smile_type        | s    
    vs_type           | s    
    delta_ratio       | f    
    delta_setting     | f    
    wing_extrapolation| f    
    spread_type       | s
    
    meta volatilitysurface
    c                 | t f a
    ------------------| -----
    date              | z    
    ccypair           | s   p
    atm_convention    | s    
    premium_included  | b    
    smile_type        | s    
    vs_type           | s    
    delta_ratio       | f    
    delta_setting     | f    
    wing_extrapolation| f    
    spread_type       | s   
    
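    UPDATE: Using the input from the answer below, I tried TorQ's .loader.loadallfiles function as follows (it does not fail, but nothing happens either: the table is not created in memory and the data is not written to the database):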

    UPDATE 2: This is the output I get from TorQ:

    2017.11.20D08:46:12.550618000|wsp18497wn|dataloader|dataloader1|INF|dataloader|**** LOADING :rawdata/20171102_113420.disccurve.csv ****
    2017.11.20D08:46:12.550618000|wsp18497wn|dataloader|dataloader1|INF|dataloader|reading in data chunk
    2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|Read 10000 rows
    2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|processing data
    2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|Enumerating
    2017.11.20D08:46:12.566218000|wsp18497wn|dataloader|dataloader1|INF|dataloader|writing 4525 rows to :hdb/2017.09.12/volatilitysurface/
    2017.11.20D08:46:12.581819000|wsp18497wn|dataloader|dataloader1|INF|dataloader|writing 4744 rows to :hdb/2017.09.13/volatilitysurface/
    2017.11.20D08:46:12.659823000|wsp18497wn|dataloader|dataloader1|INF|dataloader|writing 731 rows to :hdb/2017.09.14/volatilitysurface/
    2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|init|retrieving sort settings from :C:/Dev/torq//config/sort.csv
    2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|sort|sorting the volatilitysurface table
    2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|sorttab|No sort parameters have been specified for : volatilitysurface. Using default parameters
    2017.11.20D08:46:12.737827000|wsp18497wn|dataloader|dataloader1|INF|sortfunction|sorting :hdb/2017.09.05/volatilitysurface/ by these columns : sym, time
    2017.11.20D08:46:12.753428000|wsp18497wn|dataloader|dataloader1|ERR|sortfunction|failed to sort :hdb/2017.09.05/volatilitysurface/ by these columns : sym, time.  The error was: hdb/2017.09.
    
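    I get the following error:
    sorttab|No sort parameters have been specified for : volatilitysurface. Using default parameters
    Where is this sorttab documented? Does it use the table's primary key by default?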

    UPDATE 3: OK, I fixed UPDATE 2 by providing a non-default sort.csv under my config folder:

    tabname,att,column,sort
    default,p,sym,1
    default,,time,1
    volatilitysurface,,date,1
    volatilitysurface,,ccypair,1
    
    But now I see that if I call the function multiple times on the same file, it just appends duplicate data instead of upserting it.

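    UPDATE 4: Still not there... Assume I can add a check to make sure no file is loaded twice. When I load and start the database, I get a structure that contains some kind of dictionary rather than a table: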

    2017.10.31| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
    2017.11.01| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
    2017.11.02| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
    2017.11.03| (,`volatilitysurface)!,+`date`ccypair`atm_convention`premium_incl..
    sym       | `AUDNOK`AUDCNH`AUDJPY`AUDHKD`AUDCHF`AUDSGD`AUDCAD`AUDDKK`CADSGD`C..
    
    Note that date is actually a datetime (Z), not just a date. My complete and latest version of the function call is:

    target:hsym `$("" sv ("./";getenv[`KDBHDB];"/volatilitysurface"));
    rawdatadir:hsym `$getenv[`KDBRAWDATA];
    .loader.loadallfiles[`headers`types`separator`tablename`dbdir`partitioncol`dataprocessfunc!(`x`ccypair`atm_convention`premium_included`smile_type`vs_type`delta_ratio`delta_setting`wing_extrapolation`spread_type;"ZSSSSSFFFS";enlist ",";`volatilitysurface;target;`date;{[p;t] select date,ccypair,atm_convention,premium_included,smile_type,vs_type,delta_ratio,delta_setting,wing_extrapolation,spread_type from update date:x, premium_included:?[premium_included = `$"true";1b;0b] from t}); rawdatadir];
    

    A `cast error refers to a value that is not in the enumeration.
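
    The behaviour can be reproduced in memory without touching an HDB. The following is a minimal sketch, not the asker's code: the names sym, t and tryUpsert and the sample values are made up for illustration. It assumes a table whose symbol column is enumerated against a domain list, which is the state a splayed table is in once it has been loaded from disk. Rows whose symbols already exist in the domain should upsert cleanly, while a row carrying a new symbol should signal 'cast, which would match the first CSV loading and the second one failing.

```q
/ Hypothetical in-memory illustration; sym, t and tryUpsert are made-up names
sym:`EURUSD`AUDJPY                                         / enumeration domain, as if loaded from an HDB sym file
t:([] ccypair:`sym$`EURUSD`AUDJPY; delta_ratio:0.25 0.5)   / ccypair is enumerated against sym

/ trap errors so that a failure comes back as a string instead of aborting
tryUpsert:{[rows] @[{`t upsert x};rows;{"error: ",x}]}

r1:tryUpsert ([] ccypair:enlist `EURUSD; delta_ratio:enlist 0.1)   / EURUSD is already in sym
r2:tryUpsert ([] ccypair:enlist `GBPUSD; delta_ratio:enlist 0.2)   / GBPUSD is not in sym
```

    Here r1 holds the result of the successful upsert and r2 holds the trapped error text. Note that meta reports s for both a plain symbol column and an enumerated one, so two tables can show identical metadata and still differ in this respect; comparing type t`ccypair on each table is one way to tell them apart.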

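    I don't see any enumeration here. Splayed tables on disk need their symbol columns to be enumerated. This can be done with the following line, for example, before calling .Q.dpft: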

    volatilitysurface:.Q.en[afolder;volatilitysurface];
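
    As a rough sketch of what that line does, the snippet below enumerates a small table against a throwaway database root. The names db, t and e and the sample rows are assumptions made for this example, not part of the asker's loader. .Q.en enumerates every symbol column of the table against the sym file under the given directory, creating or extending that file as needed.

```q
/ Hypothetical sketch: db, t and e are made-up names, not the asker's afolder or table
db:`:exampledb                                   / assumed HDB root, relative to the working directory
t:([] date:2017.11.01 2017.11.02; ccypair:`EURUSD`AUDJPY; delta_ratio:0.25 0.5)

e:.Q.en[db;t]                  / enumerate the symbol columns against db/sym

plain:type t`ccypair           / 11h, a plain symbol vector
enum:type e`ccypair            / an enumeration type rather than 11h
same:t[`ccypair]~value e`ccypair   / the underlying symbols are unchanged
```

    Under the same assumption, enumerating each freshly parsed tmp1 with .Q.en before upserting it into the reloaded table would extend the sym domain first, so that new currency pairs no longer fall outside the enumeration.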
    

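    You might consider using an example CSV loader to load your data. One such example is included in TorQ, a kdb+ framework developed by AquaQ Analytics (as a disclaimer, I work for AquaQ).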

    The framework is available (free of charge) here:

    The specific component you would be interested in is dataloader.q, which is documented here:


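    This script handles everything necessary: loading all files, enumerating, sorting on disk, applying attributes and so on. It also reads the data in chunks to prevent memory exhaustion.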

    I am adding a second answer here to try to address the question about using the TorQ data loader.

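    Just to clarify, what output do you get after running this function? There should be some logging output; can you post it? For example, when I run the function: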

    jmcmurray@homer ~/deploy/TorQ (master) $ q torq.q -procname loader -proctype loader -debug
    <torq startup messages removed>
    q).loader.loadallfiles[`headers`types`separator`tablename`dbdir`partitioncol`dataprocessfunc!(c;"TSSFJFFJJBS";enlist",";`quotes;`:testdb;`date;{[p;t] select date:.z.d,time:TIME,sym:INSTRUMENT,BID,ASK from t});`:csvtest]
    2017.11.17D15:03:20.312336000|homer.aquaq.co.uk|loader|loader|INF|dataloader|**** LOADING :csvtest/tradesandquotes20140421.csv ****
    2017.11.17D15:03:20.319110000|homer.aquaq.co.uk|loader|loader|INF|dataloader|reading in data chunk
    2017.11.17D15:03:20.339414000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Read 11000 rows
    2017.11.17D15:03:20.339463000|homer.aquaq.co.uk|loader|loader|INF|dataloader|processing data
    2017.11.17D15:03:20.339519000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Enumerating
    2017.11.17D15:03:20.340061000|homer.aquaq.co.uk|loader|loader|INF|dataloader|writing 11000 rows to :testdb/2017.11.17/quotes/
    2017.11.17D15:03:20.341669000|homer.aquaq.co.uk|loader|loader|INF|dataloader|**** LOADING :csvtest/tradesandquotes20140422.csv ****
    2017.11.17D15:03:20.349606000|homer.aquaq.co.uk|loader|loader|INF|dataloader|reading in data chunk
    2017.11.17D15:03:20.370793000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Read 11000 rows
    2017.11.17D15:03:20.370858000|homer.aquaq.co.uk|loader|loader|INF|dataloader|processing data
    2017.11.17D15:03:20.370911000|homer.aquaq.co.uk|loader|loader|INF|dataloader|Enumerating
    2017.11.17D15:03:20.371441000|homer.aquaq.co.uk|loader|loader|INF|dataloader|writing 11000 rows to :testdb/2017.11.17/quotes/
    2017.11.17D15:03:20.460118000|homer.aquaq.co.uk|loader|loader|INF|init|retrieving sort settings from :/home/jmcmurray/deploy/TorQ/config/sort.csv
    2017.11.17D15:03:20.466690000|homer.aquaq.co.uk|loader|loader|INF|sort|sorting the quotes table
    2017.11.17D15:03:20.466763000|homer.aquaq.co.uk|loader|loader|INF|sorttab|No sort parameters have been specified for : quotes. Using default parameters
    2017.11.17D15:03:20.466820000|homer.aquaq.co.uk|loader|loader|INF|sortfunction|sorting :testdb/2017.11.17/quotes/ by these columns : sym, time
    2017.11.17D15:03:20.527216000|homer.aquaq.co.uk|loader|loader|INF|applyattr|applying p attr to the sym column in :testdb/2017.11.17/quotes/
    2017.11.17D15:03:20.535095000|homer.aquaq.co.uk|loader|loader|INF|sort|finished sorting the quotes table
    
    After all of this, I can run \l testdb, and there is a table called "quotes" containing the data I loaded.

    If you could post your log messages like this, it would help.