What is the fastest way to write a lot of documents to Firestore in Node.js?


node.js firebase google-cloud-firestore


I need to write a large number of documents to Firestore.

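What is the fastest way to do this in Node.js?

TL;DR: The fastest way to perform bulk data creation on Firestore is to perform parallel individual write operations.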

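Writing 1,000 documents to Firestore takes:

- ~105.4s when using sequential individual write operations
- ~2.8s when using (2) batched write operations
- ~1.5s when using parallel individual write operations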

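There are three common ways to perform a large number of write operations on Firestore:

1. Perform each individual write operation in sequence
2. Use batched write operations
3. Perform individual write operations in parallel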

We'll investigate each in turn below, using an array of randomized document data.
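The answer does not show how the random data was generated. A minimal sketch, assuming each document is a small object of random fields (the function name and the fields `name`, `value` and `createdAt` are hypothetical). Because the sequential and batched test functions below empty their input with `shift()`, each run needs a freshly generated array.

```javascript
// Hypothetical generator for benchmark data; the field names are illustrative only.
function makeRandomDocs(count) {
  const docs = [];
  for (let i = 0; i < count; i++) {
    docs.push({
      name: Math.random().toString(36).substring(2, 15), // random base-36 string
      value: Math.floor(Math.random() * 1000),           // random integer 0..999
      createdAt: Date.now(),                             // millisecond timestamp
    });
  }
  return docs;
}
```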

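Sequential individual write operations

This is the simplest possible solution:

async function testSequentialIndividualWrites(datas) {
  while (datas.length) {
    // Wait for this write to finish before starting the next one
    await collection.add(datas.shift());
  }
}

We write each document in turn, until we've written every document. And we wait for each write operation to complete before starting on the next one.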

Writing 1,000 documents takes about 105 seconds with this approach, so the throughput is roughly 10 document writes per second.
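Why this is so slow: every `await` costs a full round trip, so the total time is roughly N × latency. The sketch below makes that visible without touching Firestore. It uses a hypothetical in-memory stand-in for a collection whose `add()` resolves after a fixed delay and records how many writes were in flight at the same time; the latency value is an arbitrary assumption.

```javascript
// Hypothetical stand-in for a Firestore CollectionReference: add() resolves
// after `latencyMs` and tracks the peak number of concurrent writes.
function makeFakeCollection(latencyMs) {
  const fake = { written: 0, inFlight: 0, maxInFlight: 0 };
  fake.add = (data) => {
    fake.inFlight++;
    fake.maxInFlight = Math.max(fake.maxInFlight, fake.inFlight);
    return new Promise((resolve) => setTimeout(() => {
      fake.inFlight--;
      fake.written++;
      resolve({ id: String(fake.written) });
    }, latencyMs));
  };
  return fake;
}

// Same pattern as testSequentialIndividualWrites, with the collection passed in.
async function sequentialWrites(collection, datas) {
  for (const data of datas) {
    await collection.add(data); // only one write is ever in flight
  }
}

// Returns the elapsed wall-clock time of an async function in milliseconds.
async function timeIt(fn) {
  const start = Date.now();
  await fn();
  return Date.now() - start;
}
```

With 10 writes at 20 ms each, `sequentialWrites` should take at least about 200 ms and `maxInFlight` stays at 1. Replacing the loop with `Promise.all(datas.map((d) => collection.add(d)))` in the same harness should push `maxInFlight` up to the array length and bring the elapsed time down to roughly one latency period.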

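Batched write operations

This is the most complex solution:

async function testBatchedWrites(datas) {
  let batch = admin.firestore().batch();
  let count = 0;
  while (datas.length) {
    // Give each document a random ID that is probably unique
    batch.set(collection.doc(Math.random().toString(36).substring(2, 15)), datas.shift());
    // Commit when the batch is full (500 writes) or when we run out of data
    if (++count >= 500 || !datas.length) {
      await batch.commit();
      batch = admin.firestore().batch();
      count = 0;
    }
  }
}

You can see that we create a WriteBatch object by calling batch(), fill it until its maximum capacity of 500 documents, and then write it to Firestore. We give each document a generated name that is relatively likely to be unique (good enough for this test).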

Writing 1,000 documents takes about 2.8 seconds with this approach, so the throughput is roughly 357 document writes per second.

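That's quite a bit faster than the sequential individual writes. In fact, many developers use this approach because they assume it is the fastest, but as the results above already showed, that is not true. And the code is by far the most complex, due to the size limit on batches.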
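Some of that complexity can be reduced. The sketch below is a variant of the batched approach, assuming the Admin SDK's `db.batch()`, `batch.set()` and `batch.commit()` as used above. It slices the input into groups of at most 500 instead of repeatedly calling `shift()` (which empties the caller's array), and calls `collection.doc()` without an argument to get an auto-generated ID instead of a `Math.random()` name. The function name and parameters are illustrative, not part of the original test.

```javascript
// Sketch: batched writes, at most `batchSize` documents per batch, one batch at a time.
// `db` is assumed to be admin.firestore() and `collection` a CollectionReference.
async function batchedWrites(db, collection, datas, batchSize = 500) {
  let commits = 0;
  for (let i = 0; i < datas.length; i += batchSize) {
    const batch = db.batch();
    for (const data of datas.slice(i, i + batchSize)) {
      batch.set(collection.doc(), data); // doc() with no path = auto-generated ID
    }
    await batch.commit(); // one round trip per batch; batches run one after another
    commits++;
  }
  return commits; // number of batches committed
}
```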

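Parallel individual write operations

The Firestore documentation says this about the performance of bulk data entry:

"For bulk data entry, use a server client library with parallelized individual writes. Batched writes perform better than serialized writes but not better than parallel writes."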

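We can put that to the test with this code:

async function testParallelIndividualWrites(datas) {
  // Start all add() operations at once, then wait for all of them to finish
  await Promise.all(datas.map((data) => collection.add(data)));
}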

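This code kicks off the add operations as fast as it can, and then uses Promise.all() to wait until they're all finished. With this approach the operations can run in parallel.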
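`Promise.all()` over 1,000 `add()` calls starts every write at once. For much larger imports you may want to cap how many writes are in flight. A minimal sketch of a hand-rolled limiter, assuming a collection whose `add()` returns a Promise; the function name and the default `limit` of 100 are arbitrary assumptions, not a Firestore recommendation. Writes still run in parallel, just through a bounded pool of workers.

```javascript
// Sketch: parallel individual writes with at most `limit` writes in flight at once.
async function parallelWritesWithLimit(collection, datas, limit = 100) {
  let next = 0; // index of the next document to write, shared by all workers

  async function worker() {
    while (next < datas.length) {
      const data = datas[next++]; // read and increment happen synchronously, so no two workers take the same item
      await collection.add(data);
    }
  }

  // Start up to `limit` workers and wait until all of them have drained the array.
  const workers = Array.from({ length: Math.min(limit, datas.length) }, () => worker());
  await Promise.all(workers);
}
```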

Writing 1,000 documents takes about 1.5 seconds with this approach, so the throughput is roughly 667 document writes per second.

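The difference isn't nearly as large as between the first two approaches, but this is still over 1.8 times faster than batched writes.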
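The throughput figures and the speed-up follow directly from the measured times. A quick check of the arithmetic (the sequential figure comes out at 9.5 writes per second, which the text rounds to about 10):

```javascript
// Measured times in seconds for 1,000 documents, taken from the results above.
const docCount = 1000;
const seconds = { sequential: 105.4, batched: 2.8, parallel: 1.5 };

// Throughput in document writes per second for each approach.
const throughput = Object.fromEntries(
  Object.entries(seconds).map(([name, s]) => [name, docCount / s])
);

console.log(throughput.sequential.toFixed(1)); // 9.5
console.log(throughput.batched.toFixed(1));    // 357.1
console.log(throughput.parallel.toFixed(1));   // 666.7
console.log((seconds.batched / seconds.parallel).toFixed(2)); // 1.87 (parallel vs batched speed-up)
```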

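A few notes:

- You can find the full code of this test on …
- While the test was done with Node.js, you're likely to get similar results on all platforms that the Admin SDK supports.
- Don't perform bulk inserts using the client SDKs though, as the results may be very different and much less predictable.
- As usual, the actual performance depends on your machine, the bandwidth and latency of your internet connection, and many other factors. Depending on those, the size of the differences may vary too, although I expect the ordering to stay the same.
- If you see any outliers in your own tests, or find completely different results, leave a comment below.
- Batched writes are atomic. So if you have dependencies between documents, and either all of them must be written or none of them may be, you should use a batched write.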
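When documents depend on each other, atomicity rather than speed is the reason to batch. A minimal sketch, assuming the Admin SDK's `db.batch()` API: a hypothetical order document and its line items are written in one batch, so either all of them are stored or none are. The function name and the `orders`/`items` collection names are illustrative.

```javascript
// Sketch: write a parent document and its child documents atomically.
// `db` is assumed to be admin.firestore(); collection names are hypothetical.
async function writeOrderAtomically(db, order, items) {
  const batch = db.batch();
  const orderRef = db.collection('orders').doc(); // auto-generated ID
  batch.set(orderRef, { ...order, itemCount: items.length });
  for (const item of items) {
    batch.set(orderRef.collection('items').doc(), item); // subcollection document
  }
  await batch.commit(); // all writes succeed together, or the whole batch fails
  return orderRef.id;
}
```

The whole group has to fit into a single batch, so this only suits sets of related documents within the batch size limit mentioned above.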

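As noted in a comment on the question, I've had the opposite experience when writing documents to Firestore inside a Cloud Function.

TL;DR: Parallel individual writes are over 5x slower than parallel batched writes when writing 1,200 documents to Firestore.

The only explanation I can think of is some sort of bottleneck or request rate limiting between Google Cloud Functions and Firestore. It's a bit of a mystery.

Here's the code for the two methods I benchmarked: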

const functions = require('firebase-functions');
const admin = require('firebase-admin');


admin.initializeApp();
const db = admin.firestore();


// Parallel Batch Writes
exports.cloneAppBatch = functions.https.onCall((data, context) => {

    return new Promise((resolve, reject) => {

        let fromAppKey = data.appKey;
        let toAppKey = db.collection('/app').doc().id;


        // Clone/copy data from one app subcollection to another
        let startTimeMs = Date.now();
        let docs = 0;

        // Write the app document (and ensure cold start doesn't affect timings below)
        db.collection('/app').doc(toAppKey).set({ desc: 'New App' }).then(() => {

            // Log Benchmark
            functions.logger.info(`[BATCH] 'Write App Config Doc' took ${Date.now() - startTimeMs}ms`);


            // Get all documents in app subcollection
            startTimeMs = Date.now();

            return db.collection(`/app/${fromAppKey}/data`).get();

        }).then(appDataQS => {

            // Log Benchmark
            functions.logger.info(`[BATCH] 'Read App Data' took ${Date.now() - startTimeMs}ms`);


            // Batch up documents and write to new app subcollection
            startTimeMs = Date.now();

            let commits = [];
            let bDocCtr = 0;
            let batch = db.batch();

            appDataQS.forEach(docSnap => {

                let doc = docSnap.data();
                let docKey = docSnap.id;
                docs++;

                let docRef = db.collection(`/app/${toAppKey}/data`).doc(docKey);

                batch.set(docRef, doc);
                bDocCtr++;

                if (bDocCtr >= 500) {
                    commits.push(batch.commit());
                    batch = db.batch();
                    bDocCtr = 0;
                }

            });

            if (bDocCtr > 0) commits.push(batch.commit());

            // Return the promise so that a failed commit reaches the .catch() below
            return Promise.all(commits).then(results => {
                // Log Benchmark
                functions.logger.info(`[BATCH] 'Write App Data - ${docs} docs / ${commits.length} batches' took ${Date.now() - startTimeMs}ms`);
                resolve(results);
            });
         
        }).catch(err => {
            reject(err);
        });

    });

});


// Parallel Individual Writes
exports.cloneAppNoBatch = functions.https.onCall((data, context) => {

    return new Promise((resolve, reject) => {

        let fromAppKey = data.appKey;
        let toAppKey = db.collection('/app').doc().id;


        // Clone/copy data from one app subcollection to another
        let startTimeMs = Date.now();
        let docs = 0;

        // Write the app document (and ensure cold start doesn't affect timings below)
        db.collection('/app').doc(toAppKey).set({ desc: 'New App' }).then(() => {

            // Log Benchmark
            functions.logger.info(`[INDIVIDUAL] 'Write App Config Doc' took ${Date.now() - startTimeMs}ms`);


            // Get all documents in app subcollection
            startTimeMs = Date.now();

            return db.collection(`/app/${fromAppKey}/data`).get();

        }).then(appDataQS => {

            // Log Benchmark
            functions.logger.info(`[INDIVIDUAL] 'Read App Data' took ${Date.now() - startTimeMs}ms`);


            // Gather up documents and write to new app subcollection
            startTimeMs = Date.now();

            let commits = [];

            appDataQS.forEach(docSnap => {

                let doc = docSnap.data();
                let docKey = docSnap.id;
                docs++;
                    
                // Parallel individual writes
                commits.push(db.collection(`/app/${toAppKey}/data`).doc(docKey).set(doc));
        
            });

            // Return the promise so that a failed write reaches the .catch() below
            return Promise.all(commits).then(results => {
                // Log Benchmark
                functions.logger.info(`[INDIVIDUAL] 'Write App Data - ${docs} docs' took ${Date.now() - startTimeMs}ms`);
                resolve(results);
            });
         
        }).catch(err => {
            reject(err);
        });

    });

});
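The same parallel-batch clone can be written more compactly with `async`/`await`, which also removes the hand-built `new Promise` wrapper so that any failed read or commit rejects the returned promise. A sketch, assuming the same `/app/{appKey}/data` layout and the Admin SDK calls used above; the helper name is hypothetical and the timing/logging is omitted.

```javascript
// Sketch: copy every document in one collection to another using batches of
// up to `batchSize` writes, committing all batches in parallel.
async function cloneCollectionInBatches(db, fromPath, toPath, batchSize = 500) {
  const snapshot = await db.collection(fromPath).get();
  const commits = [];
  let batch = db.batch();
  let count = 0;

  snapshot.forEach((docSnap) => {
    batch.set(db.collection(toPath).doc(docSnap.id), docSnap.data());
    if (++count === batchSize) {
      commits.push(batch.commit()); // start this batch without waiting for it
      batch = db.batch();
      count = 0;
    }
  });
  if (count > 0) commits.push(batch.commit());

  await Promise.all(commits); // wait for every batch; rejects if any commit fails
  return { docs: snapshot.size, batches: commits.length };
}
```

An `async` callable could write the new app document, then `await` this helper with the source and destination paths and return its result.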

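Here are the specific results (averaged over 3 runs each):

Batched writes:

Read 1,200 docs: 2.4 s / Write 1,200 docs: 1.8 s

Individual writes:

Read 1,200 docs: 2.4 s / Write 1,200 docs: 10.5 s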
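Expressed as write throughput, these timings work out as follows, which is where the "over 5x" in the TL;DR comes from:

```javascript
// Write times in seconds for 1,200 documents, averaged over 3 runs (from above).
const docCount = 1200;
const batchedSeconds = 1.8;
const individualSeconds = 10.5;

console.log((docCount / batchedSeconds).toFixed(1));          // 666.7 writes/s
console.log((docCount / individualSeconds).toFixed(1));       // 114.3 writes/s
console.log((individualSeconds / batchedSeconds).toFixed(2)); // 5.83 (slowdown of individual writes)
```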

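Note: These results are a lot better than what I was getting the other day (maybe Google was having a bad day), but the relative performance of batched and individual writes stays the same. It would be good to hear whether anyone else has had a similar experience.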

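This is so interesting, thanks for doing the work! OOC, did you test running the batched writes in parallel? Obviously in that case you'd need to be even more careful to avoid any document appearing in two batches.

I was about to test parallel batched writes, but ran out of quota (it's a free project and I'm too lazy to upgrade). Today is a new day...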
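On the point about parallel batches: one way to make sure no document lands in two concurrently committed batches is to de-duplicate by document ID before splitting the writes into batches. A sketch, assuming each input item carries a hypothetical `id` field and that the last write for a given ID should win; the resulting groups could then be committed in parallel, as `cloneAppBatch` does.

```javascript
// Sketch: group writes into batches of at most `batchSize` so that no
// document ID appears in more than one batch. Later items win over earlier ones.
function toDisjointBatches(items, batchSize = 500) {
  const byId = new Map();
  for (const item of items) {
    byId.set(item.id, item); // keeps the first position of an ID, overwrites its value
  }
  const unique = [...byId.values()];
  const batches = [];
  for (let i = 0; i < unique.length; i += batchSize) {
    batches.push(unique.slice(i, i + batchSize));
  }
  return batches;
}
```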