Stanford NLP: what do the overflow_xxxx.bin files mean during training?


I am training a word-embedding model based on GloVe, and the program prints a log like the following:

$ build/cooccur -memory 4.0 -vocab-file vocab.txt -verbose 2 -window-size 8 < /home/ignacio/data/GUsDany/corpus/GUs_regulon_pubMed.txt > cooccurrence.bin
COUNTING COOCCURRENCES
window size: 8
context: symmetric
max product: 13752509
overflow length: 38028356
Reading vocab from file "vocab.txt"...loaded 145223095 words.
Building lookup table...table contains 228170143 elements.
Processing token: 5478600000
The GloVe home directory is filling up with files called overflow_0534.bin. Can someone tell me whether everything is running OK?

Thanks.

Everything is OK.

You can check the source code of the GloVe cooccur program.

At line 57 of that file:

long long overflow_length; // Number of cooccurrence records whose product exceeds max_product to store in memory before writing to disk
If your corpus has too many cooccurrence records, some of them will be written out to temporary .bin files on disk:

while (1) {
    if (ind >= overflow_length - window_size) { // If overflow buffer is (almost) full, sort it and write it to temporary file
        qsort(cr, ind, sizeof(CREC), compare_crec);
        write_chunk(cr,ind,foverflow);
        fclose(foverflow);
        fidcounter++;
        sprintf(filename,"%s_%04d.bin",file_head,fidcounter);
        foverflow = fopen(filename,"w");
        ind = 0;
    }
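
Each time the buffer fills, cooccur closes the current temp file and opens the next one, so a file like overflow_0534.bin is simply the 534th flushed chunk. Here is a minimal sketch (not the actual GloVe code) of what the sprintf pattern above generates, assuming the file prefix (file_head) is "overflow", which matches the file names in the question:

#include <stdio.h>

int main(void) {
    const char *file_head = "overflow";  /* assumed prefix, matching the file names in the question */
    char filename[1024];
    int fidcounter;

    /* Reproduce the "%s_%04d.bin" naming used in the loop above */
    for (fidcounter = 1; fidcounter <= 3; fidcounter++) {
        sprintf(filename, "%s_%04d.bin", file_head, fidcounter);
        printf("%s\n", filename);        /* overflow_0001.bin, overflow_0002.bin, overflow_0003.bin */
    }
    return 0;
}

These are intermediate chunks that get merged into the final cooccurrence.bin output at the end of the run, so seeing hundreds of them only means the in-memory buffer has been flushed that many times.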
The variable overflow_length depends on the memory setting:

Line 463:

if ((i = find_arg((char *)"-memory", argc, argv)) > 0) memory_limit = atof(argv[i + 1]);
Line 467:

rlimit = 0.85 * (real)memory_limit * 1073741824/(sizeof(CREC));
Line 470:

overflow_length = (long long) rlimit/6; // 0.85 + 1/6 ~= 1
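
Putting those three lines together for the run in the question: with -memory 4.0, and assuming GloVe's CREC record is two ints plus one double (16 bytes), the formula reproduces the "overflow length: 38028356" reported in your log. A minimal sketch of the arithmetic:

#include <stdio.h>

typedef double real;

typedef struct cooccur_rec {   /* assumed CREC layout: 4 + 4 + 8 = 16 bytes */
    int word1;
    int word2;
    real val;
} CREC;

int main(void) {
    real memory_limit = 4.0;   /* the value passed with -memory 4.0 */
    real rlimit = 0.85 * (real)memory_limit * 1073741824 / (sizeof(CREC));
    long long overflow_length = (long long) rlimit / 6;
    printf("overflow_length = %lld\n", overflow_length);   /* prints 38028356 when sizeof(CREC) == 16 */
    return 0;
}

Since overflow_length scales linearly with -memory, raising -memory makes each temporary chunk larger, so fewer of them are written.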

Thanks for the reply. So how can I keep this huge number of files from preventing me from training a model with >= 300 dimensions?

@Nacho the overflow_xxx.bin files are cache files, so you can delete them once cooccurrence.bin has been generated. If you want to avoid these files, you probably need more RAM.
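
As a rough illustration of why more RAM helps (the spilled-record count below is invented for illustration, not measured from this corpus, and a 16-byte CREC is assumed): the number of overflow_xxxx.bin chunks is roughly the number of spilled cooccurrence records divided by overflow_length, and overflow_length grows linearly with -memory.

#include <stdio.h>

int main(void) {
    long long spilled = 20000000000LL;   /* hypothetical number of spilled cooccurrence records */
    double memory[] = {4.0, 8.0, 16.0};  /* candidate -memory settings, in GB */
    int i;

    for (i = 0; i < 3; i++) {
        double rlimit = 0.85 * memory[i] * 1073741824 / 16.0;   /* assumes 16-byte CREC */
        long long overflow_length = (long long) rlimit / 6;
        /* each chunk holds overflow_length records, so this is the approximate file count */
        printf("-memory %.1f  ->  overflow_length %lld  ->  ~%lld overflow chunks\n",
               memory[i], overflow_length, spilled / overflow_length + 1);
    }
    return 0;
}

Doubling -memory roughly halves the number of overflow_xxxx.bin files, which is why the suggestion above is simply to give cooccur more RAM.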