Optimization: inserting data into BerkeleyDB JE gets slower and slower


I'm trying to insert ~56,249,000 items into Berkeley DB JE. I ran DbCacheSize to get some statistics about my database:

java -jar je-5.0.34.jar  DbCacheSize -records 56248699 -key 8 -data 20 

=== Environment Cache Overhead ===

3,155,957 minimum bytes

To account for JE daemon operation and record locks,
a significantly larger amount is needed in practice.

=== Database Cache Size ===

 Minimum Bytes    Maximum Bytes   Description
---------------  ---------------  -----------
  1,287,110,736    1,614,375,504  Internal nodes only
  4,330,861,264    4,658,126,032  Internal nodes and leaf nodes

=== Internal Node Usage by Btree Level ===

 Minimum Bytes    Maximum Bytes      Nodes    Level
---------------  ---------------  ----------  -----
  1,269,072,064    1,592,660,160     632,008    1
     17,837,712       21,473,424       7,101    2
        198,448          238,896          79    3
          2,512            3,024           1    4
I asked this question two years ago, but I'm still not sure how I should configure my environment based on these statistics.

While loading the data I will be the only user accessing the database: should I use transactions?

My environment is currently opened as follows:

EnvironmentConfig cfg=(...)
cfg.setTransactional(true);
cfg.setAllowCreate(true);
cfg.setReadOnly(false);
cfg.setCachePercent(80);
cfg.setConfigParam(EnvironmentConfig.LOG_FILE_MAX,"250000000");
The database:

DatabaseConfig cfg = new DatabaseConfig();
cfg.setAllowCreate(true);
cfg.setTransactional(true);
cfg.setReadOnly(false);
I read/insert the items as follows:

Transaction txn= env.beginTransaction(null, null);
//open db with transaction 'txn'
Database db=env.open(...txn)

Transaction txn2=this.getEnvironment().beginTransaction(null, null);
long record_id=0L;
while((item=readNextItem(input))!=null)
    {
    (...)
    ++record_id;

    db.put(...); //insert record_id/item into db
    /** every 100000 items commit and create a new transaction.
       I found it was the only way to avoid an outOfMemory exception */
    if(record_id%100000==0)
        {
        txn2.commit();
        System.gc();
        txn2=this.getEnvironment().beginTransaction(null, null);
        }
    }

txn2.commit();
txn.commit();
But things are getting slower and slower. I run the program from Eclipse without setting anything special for the JVM:

100000 / 56248699 ( 0.2 %).  13694.9 records/seconds.  Time remaining:68.3 m Disk Usage: 23.4 Mb. Expect Disk Usage: 12.8 Gb Free Memory : 318.5 Mb.
200000 / 56248699 ( 0.4 %).  16680.6 records/seconds.  Time remaining:56.0 m Disk Usage: 49.5 Mb. Expect Disk Usage: 13.6 Gb Free Memory : 338.3 Mb.
(...)
6600000 / 56248699 (11.7 %).  9658.2 records/seconds.  Time remaining:85.7 m Disk Usage: 2.9 Gb. Expect Disk Usage: 24.6 Gb Free Memory : 165.0 Mb.
6700000 / 56248699 (11.9 %).  9474.5 records/seconds.  Time remaining:87.2 m Disk Usage: 2.9 Gb. Expect Disk Usage: 24.7 Gb Free Memory : 164.8 Mb.
6800000 / 56248699 (12.1 %).  9322.6 records/seconds.  Time remaining:88.4 m Disk Usage: 3.0 Gb. Expect Disk Usage: 24.8 Gb Free Memory : 164.8 Mb.
(Ctrl-C... abort...)
How can I make things faster?

Update:

MemTotal:        4021708 kB
MemFree:          253580 kB
Buffers:           89360 kB
Cached:          1389272 kB
SwapCached:           56 kB
Active:          2228712 kB
Inactive:        1449096 kB
Active(anon):    1793592 kB
Inactive(anon):   596852 kB
Active(file):     435120 kB
Inactive(file):   852244 kB
Unevictable:           0 kB
Mlocked:               0 kB
HighTotal:       3174028 kB
HighFree:          57412 kB
LowTotal:         847680 kB
LowFree:          196168 kB
SwapTotal:       4085756 kB
SwapFree:        4068224 kB
Dirty:             16320 kB
Writeback:             0 kB
AnonPages:       2199056 kB
Mapped:           111280 kB
Shmem:            191272 kB
Slab:              58664 kB
SReclaimable:      41448 kB
SUnreclaim:        17216 kB
KernelStack:        3792 kB
PageTables:        11328 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     6096608 kB
Committed_AS:    5069728 kB
VmallocTotal:     122880 kB
VmallocUsed:       18476 kB
VmallocChunk:      81572 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       10232 kB
DirectMap2M:      903168 kB
Update 2:

Max. Heap Size (Estimated): 872.94M
Ergonomics Machine Class: server
Using VM: Java HotSpot(TM) Server VM
Update 3:

Following Jerven's suggestions, I now get the following performance:

    (...)
    6800000 / 56248699 (12.1 %).  13144.8 records/seconds.  Time remaining:62.7 m Disk Usage: 1.8 Gb. Expect Disk Usage: 14.6 Gb Free Memory : 95.5 Mb.
    (...)
compared to my previous result:

6800000 / 56248699 (12.1 %).  9322.6 records/seconds.  Time remaining:88.4 m Disk Usage: 3.0 Gb. Expect Disk Usage: 24.8 Gb Free Memory : 164.8 Mb.

First, I would remove your explicit call to System.gc(). If you notice that this helps, consider switching to a different GC algorithm; for example, G1GC performs better when the BDB/JE cache usage is consistently close to 70% of the available heap.
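
For illustration (these flags are not part of the original answer and the jar name is a placeholder), both suggestions can be applied from the command line without touching the code, since -XX:+DisableExplicitGC turns System.gc() calls into no-ops:

java -XX:+UseG1GC -XX:+DisableExplicitGC -jar myloader.jar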

Second, at some point the B+-tree index updates become n log n, which slows insertion down as the database grows.

Not using transactions will be faster, especially since you can simply restart the import from scratch if it fails.
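
As a rough sketch of what that could look like (an illustration, not the poster's code; the environment home, database name and dummy payload are placeholders):

import java.io.File;

import com.sleepycat.bind.tuple.LongBinding;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

public class NonTransactionalLoad {
    public static void main(String[] args) throws Exception {
        EnvironmentConfig envCfg = new EnvironmentConfig();
        envCfg.setAllowCreate(true);
        envCfg.setTransactional(false);   // no transactions for the one-off load
        envCfg.setCachePercent(80);
        Environment env = new Environment(new File(args[0]), envCfg);

        DatabaseConfig dbCfg = new DatabaseConfig();
        dbCfg.setAllowCreate(true);
        dbCfg.setTransactional(false);
        Database db = env.openDatabase(null, "bulk.db", dbCfg);

        DatabaseEntry key = new DatabaseEntry();
        DatabaseEntry data = new DatabaseEntry();
        for (long id = 0; id < 1_000_000L; id++) {  // stand-in for readNextItem(input)
            LongBinding.longToEntry(id, key);       // 8-byte key, as in the DbCacheSize run
            data.setData(new byte[20]);             // 20-byte payload, as in the DbCacheSize run
            db.put(null, key, data);                // null txn: plain non-transactional write
        }

        db.close();
        env.sync();                                 // flush everything before closing (see below)
        env.close();
    }
}

Without transactions JE does not have to write commit records or retain locks across a large batch, which is where most of the speedup comes from.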

Just remember to do an environment.sync() and a checkpoint at the end. While doing this import, you may want to disable the BDB/JE checkpointer and the BDB/JE cleaner ("GC") threads:

config.setConfigParam(EnvironmentConfig.ENV_RUN_CLEANER,  "false");
config.setConfigParam(EnvironmentConfig.ENV_RUN_CHECKPOINTER, "false");
config.setConfigParam(EnvironmentConfig.ENV_RUN_IN_COMPRESSOR, "false");
After loading, you should call a method like this:

public void checkpointAndSync() throws ObjectStoreException
{
    env.sync();
    CheckpointConfig force = new CheckpointConfig();
    force.setForce(true);
    try
    {
        env.checkpoint(force);
    }
    catch (DatabaseException e)
    {
        log.error("Can not checkpoint db " + path.getAbsolutePath(), e);
        throw new ObjectStoreException(e);
    }
}

You might also want to consider turning on key prefixing.
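
A minimal sketch of that setting, assuming env is the open Environment and "mydb" is a placeholder database name (key prefixing is configured per database, when it is opened):

DatabaseConfig dbCfg = new DatabaseConfig();
dbCfg.setAllowCreate(true);
dbCfg.setKeyPrefixing(true);  // store the common prefix of keys only once per internal node
Database db = env.openDatabase(null, "mydb", dbCfg);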

For the rest, your internal-node cache size should be at least 1.6 GB, which means a heap larger than 2 GB to start with.
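
As a quick sanity check on those numbers: with cfg.setCachePercent(80), the ~1.6 GB of internal nodes must fit in 80% of the heap, i.e. roughly 1.6 GB / 0.8 ≈ 2 GB of heap for the cache alone, so something along these lines (the exact size and jar name are placeholders):

java -Xmx2200m -jar myloader.jar

The default heap reported in Update 2 (~873 MB) is far below that, which would explain the slowdown once the internal nodes no longer fit in the cache.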


You can also consider merging records. For example, if your keys increment naturally, you could store 16 values under one key. If you find that an interesting approach, that is where you could start.
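
A rough sketch of that idea (not from the answer; the String item type, class name and group handling are assumptions), writing one record per 16 consecutive items keyed by item-number / 16:

import java.util.List;

import com.sleepycat.bind.tuple.LongBinding;
import com.sleepycat.bind.tuple.TupleOutput;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;

public final class GroupedLoader {
    private static final int GROUP = 16;

    /** Stores the items so that 16 consecutive items share a single key. */
    public static void putGrouped(Database db, List<String> items) {
        DatabaseEntry key = new DatabaseEntry();
        DatabaseEntry data = new DatabaseEntry();
        TupleOutput buffer = new TupleOutput();
        long count = 0L;
        for (String item : items) {
            buffer.writeString(item);          // append this item's bytes to the current group
            if (++count % GROUP == 0) {        // every 16th item, write out one record
                LongBinding.longToEntry(count / GROUP, key);
                data.setData(buffer.toByteArray());
                db.put(null, key, data);
                buffer = new TupleOutput();    // start the next group
            }
        }
        if (count % GROUP != 0) {              // flush the final, partial group
            LongBinding.longToEntry(count / GROUP + 1, key);
            data.setData(buffer.toByteArray());
            db.put(null, key, data);
        }
    }
}

Fewer, larger records mean fewer leaf entries and internal nodes, so the Btree (and the cache needed to hold it) shrinks accordingly.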

Comments:

Could you add your JVM and machine details?

Java(TM) SE Runtime Environment (build 1.7.0_07-b10); Linux 3.2.0-38-generic-pae #60 Ubuntu SMP Wed Feb 13:47:26 UTC 2013 i686 i686 i386 GNU/Linux

Can you also show how much free memory the Java heap has? At a quick glance you should have 1.6 GB of cache, which means you should start with a 2 GB heap. Turning on key prefixing might reduce that, as would record merging (only if the keys increment naturally).

Updated. (And yes, the keys increment.) Thank you very much.