MongoDB Java API slow read performance

java mongodb performance

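We are reading all documents of a collection from a local MongoDB, and the performance is not great.

We need to dump all the data. Don't worry about why; just trust that it is really needed and that there is no workaround.

We have 4 million documents that look like this: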

{
    "_id":"4d094f58c96767d7a0099d49",
    "exchange":"NASDAQ",
    "stock_symbol":"AACC",
    "date":"2008-03-07",
    "open":8.4,
    "high":8.75,
    "low":8.08,
    "close":8.55,
    "volume":275800,
    "adj close":8.55
}

We currently read them with this simple code:

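import com.mongodb.Block;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.apache.commons.lang3.mutable.MutableInt; // assumption: MutableInt comes from Apache Commons Lang
import org.bson.Document;

MongoClient mongoClient = MongoClients.create();
MongoDatabase database = mongoClient.getDatabase("localhost");
MongoCollection<Document> collection = database.getCollection("test");

MutableInt count = new MutableInt();
long start = System.currentTimeMillis();
collection.find().forEach((Block<Document>) document -> count.increment() /* actually something more complicated */ );
long elapsedMillis = System.currentTimeMillis() - start; // total read time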

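We read the whole collection in 16 seconds (250k rows/sec), which is not impressive at all for such small documents. Keep in mind that we want to load 800 million rows. Using aggregation, map-reduce, or anything similar is not possible.

Is this as fast as MongoDB gets, or are there other ways to load documents faster (other techniques, moving to Linux, more RAM, settings, and so on)?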

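collection.find().forEach((Block<Document>) document -> count.increment());
This line may be taking up a lot of the time, since you are iterating over 250k records in memory.
To quickly check whether that is the case, you can try the following: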
// Step 1: time the fetch alone by materializing every document into a list
long start1 = System.currentTimeMillis();
List<Document> documents = collection.find().into(new ArrayList<>());
System.out.println(System.currentTimeMillis() - start1);

// Step 2: time the in-memory iteration alone
long start2 = System.currentTimeMillis();
documents.forEach(document -> count.increment());
System.out.println(System.currentTimeMillis() - start2);

This will show you how much time is actually spent fetching the documents from the database and how much is spent iterating over them.
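You did not specify your use case, so it is hard to tell you how to tune your query. (That is, who would want to load 800 million rows at once just to count them?)
Given your schema, I think your data is almost read-only and your task is related to data aggregation.
What your current code does is read the data (most likely your driver reads it in batches), then stop, then perform some computation (and yes, the int wrapper adds to the processing time), then repeat. That is not a good approach. The database is not magically fast if you do not access it the right way.
If the computation is not too complex, I suggest you use the aggregation framework instead of loading everything into your RAM.
A few things to consider to improve your aggregation:
- Divide your dataset into smaller sets (for example, partition by date or by exchange). Add an index to support that partitioning, run the aggregation on each partition, then combine the results (the typical divide-and-conquer approach).
- Project only the fields you need.
- Filter out unnecessary documents (if possible).
- Allow disk use if the aggregation cannot be performed in memory (that is, if you hit the 100 MB limit per pipeline stage).
- Use built-in pipeline stages to speed up the computation (for example, $count).
If your computation is too complex to express with the aggregation framework, use map-reduce. It runs inside the mongod process, so the data does not need to be transferred over the network into your memory.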
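Update
So it looks like you want to do OLAP processing, and you are stuck at the ETL step.
You do not need to, and should not, load the whole OLTP data set into OLAP every time. Only load the new changes into your data warehouse. It is then normal and acceptable for the first data load/dump to take more time.
For the first load, consider the following points:
- Divide and conquer, again: break your data into smaller data sets (using predicates such as date, exchange, or stock symbol).
- Run the computation in parallel, then combine the results (you have to partition your data set properly).
- Compute on batches instead of processing inside forEach: load a data partition, then compute on it as a whole rather than document by document.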
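The divide, compute in parallel, and merge steps listed above can be sketched roughly as follows. This is only a minimal illustration using the JDK's thread pool, not a definitive implementation: the class name `PartitionedScan`, the method `sumOverPartitions`, and the partition keys are hypothetical, and the per-partition function is a placeholder for whatever query and batch computation a real job would run against one slice of the collection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class PartitionedScan {

    /**
     * Runs processPartition once per partition key on a fixed-size thread pool
     * and sums the per-partition results (the merge step).
     */
    static <K> long sumOverPartitions(List<K> partitionKeys,
                                      Function<K, Long> processPartition,
                                      int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (K key : partitionKeys) {
                Callable<Long> task = () -> processPartition.apply(key);
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : pending) {
                total += result.get(); // blocks until that partition is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a partition", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Hypothetical partition keys; a real job would derive them from the data.
        List<String> exchanges = Arrays.asList("NASDAQ", "NYSE", "AMEX");
        // Placeholder work: a real processPartition would query one exchange
        // (for example with a filter on the "exchange" field) and process that batch.
        long total = sumOverPartitions(exchanges, exchange -> 1L, 3);
        System.out.println("partitions processed: " + total);
    }
}
```

Whether this helps in practice depends on the partitions being of similar size and on the server and disk being able to serve several scans at once, so the thread count is something to measure rather than assume.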
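In your case, I think a simple and at the same time efficient approach would be to maximize the overall throughput by using parallelCollectionScan.

Allows applications to use multiple parallel cursors when reading all the documents from a collection, thereby increasing throughput. The parallelCollectionScan command returns a document that contains an array of cursor information.

Each cursor provides access to the return of a partial set of documents from a collection. Iterating each cursor returns every document in the collection. Cursors do not contain the results of the database command. The result of the database command identifies the cursors, but does not contain or constitute the cursors.

A simple example would look like this: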
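// Ask the server for up to 3 parallel cursors over the collection.
// Note: the parallelCollectionScan command was removed in MongoDB 4.2, so this only applies to older servers.
MongoClient mongoClient = MongoClients.create();
MongoDatabase database = mongoClient.getDatabase("localhost");
Document commandResult = database.runCommand(new Document("parallelCollectionScan", "collectionName").append("numCursors", 3));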

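By my count, you are processing about 50 MiB/s (250k rows/sec * 0.2 KiB/row). That is getting into both disk drive and network bottleneck territory. What kind of storage is MongoDB using? How much bandwidth do you have between the client and the MongoDB server? Have you tried co-locating the server and the client on a high-speed (>= 10 Gib/s) network with minimal (< 1.0 ms) latency? Keep in mind that if you are using a cloud computing provider such as AWS or GCP, there will be virtualization bottlenecks on top of the physical ones.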
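The estimate above can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in this thread (4 million documents in 16 seconds, roughly 0.2 KiB per document, 800 million documents as the target); the class and method names are made up for illustration, and the projection assumes the read rate stays constant as the collection grows, which may not hold.

```java
// Back-of-the-envelope throughput estimate from the figures quoted in the thread.
class ThroughputEstimate {

    /** Documents read per second. */
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    /** Data rate in MiB/s for a given row rate and an average row size in KiB. */
    static double mibPerSecond(double rowsPerSecond, double kibPerRow) {
        return rowsPerSecond * kibPerRow / 1024.0;
    }

    /** Minutes needed to read totalRows at a constant row rate. */
    static double minutesToRead(long totalRows, double rowsPerSecond) {
        return totalRows / rowsPerSecond / 60.0;
    }

    public static void main(String[] args) {
        double rate = rowsPerSecond(4_000_000L, 16.0);      // 4M documents in 16 s
        double mib = mibPerSecond(rate, 0.2);               // assumed 0.2 KiB per document
        double minutes = minutesToRead(800_000_000L, rate); // assumes a constant rate
        System.out.printf("%.0f rows/s, %.1f MiB/s, %.1f minutes for 800M rows%n",
                rate, mib, minutes);
    }
}
```

With these inputs the data rate comes out at a little under 49 MiB/s, and a full read of 800 million documents at the same rate would take a bit under an hour.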
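You asked about settings that might help. You can try changing the compression settings on the connection and on the collection (the options are "none", snappy, and zlib). Even if neither improves on snappy, seeing the difference a setting makes (or does not make) may help you work out which part of the system is under the most stress.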

Java does not perform very well at number crunching.

collection.find().forEach((Block<Document>) document -> count.increment());

// Read through the driver's internal FindOperation instead of MongoCollection.find()
ReadPreference readPref = ReadPreference.primary();
ReadConcern concern = ReadConcern.DEFAULT;
MongoNamespace ns = new MongoNamespace(databaseName,collectionName);
Decoder<Document> codec = new DocumentCodec();
FindOperation<Document> fop = new FindOperation<Document>(ns,codec);
ReadWriteBinding readBinding = new ClusterBinding(getCluster(), readPref, concern);
QueryBatchCursor<Document> cursor = (QueryBatchCursor<Document>) fop.execute(readBinding);
AtomicInteger count = new AtomicInteger(0);
try (MongoBatchCursorAdapter<Document> cursorAdapter = new MongoBatchCursorAdapter<Document>(cursor)) {
    while (cursorAdapter.hasNext()) {
        Document doc = cursorAdapter.next();
        count.incrementAndGet();
    }
}

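// Send a raw find command over a connection and decode the reply with a reflectively created CommandResultCodecProvider
ReadPreference readPref = ReadPreference.primary();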
ReadConcern concern = ReadConcern.DEFAULT;
MongoNamespace ns = new MongoNamespace(databaseName,collectionName);
FieldNameValidator noOpValidator = new NoOpFieldNameValidator();
DocumentCodec payloadDecoder = new DocumentCodec();
Constructor<CodecProvider> providerConstructor = (Constructor<CodecProvider>) Class.forName("com.mongodb.operation.CommandResultCodecProvider").getDeclaredConstructor(Decoder.class, List.class);
providerConstructor.setAccessible(true);
CodecProvider firstBatchProvider = providerConstructor.newInstance(payloadDecoder, Collections.singletonList("firstBatch"));
CodecProvider nextBatchProvider = providerConstructor.newInstance(payloadDecoder, Collections.singletonList("nextBatch"));
Codec<BsonDocument> firstBatchCodec = fromProviders(Collections.singletonList(firstBatchProvider)).get(BsonDocument.class);
Codec<BsonDocument> nextBatchCodec = fromProviders(Collections.singletonList(nextBatchProvider)).get(BsonDocument.class);
ReadWriteBinding readBinding = new ClusterBinding(getCluster(), readPref, concern);
BsonDocument find = new BsonDocument("find", new BsonString(collectionName));
Connection conn = readBinding.getReadConnectionSource().getConnection();

BsonDocument results = conn.command(databaseName,find,noOpValidator,readPref,firstBatchCodec,readBinding.getReadConnectionSource().getSessionContext(), true, null, null);
BsonDocument cursor = results.getDocument("cursor");
long cursorId = cursor.getInt64("id").longValue();

BsonArray firstBatch = cursor.getArray("firstBatch");

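// Extracts the driver's internal Cluster from mongoClient via reflection on private fields (fragile across driver versions)
private Cluster getCluster() {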
    Field cluster, delegate;
    Cluster mongoCluster = null;
    try {
        delegate = mongoClient.getClass().getDeclaredField("delegate");
        delegate.setAccessible(true);
        Object clientDelegate = delegate.get(mongoClient);
        cluster = clientDelegate.getClass().getDeclaredField("cluster");
        cluster.setAccessible(true);
        mongoCluster = (Cluster) cluster.get(clientDelegate);
    } catch (NoSuchFieldException | SecurityException | IllegalArgumentException | IllegalAccessException e) {
        System.err.println(e.getClass().getName()+" "+e.getMessage());
    }
    return mongoCluster;
}