Node.js bulkWrite to a collection causes a timeout error, but works when writing to an empty test collection

Tags: node.js, mongodb, mongodb-query, azure-cosmosdb, azure-cosmosdb-mongoapi

tl/dr

I'm using bulkWrite against Azure Cosmos DB with the Mongo API to perform batched replaceOne operations with upsert: true. This works fine on an empty test collection, but causes a timeout on the collection I intend to use, which already holds about 7,300 documents. Why do I hit a timeout on one collection but not the other, and how can I avoid the timeout without using a very small batch size, which increases execution time?

Background

I have an Azure Function that retrieves data from an API and saves it in a database collection. I'm using Azure Cosmos DB with the Mongo API adapter as my database.

The function retrieves data from the API and builds an array of operations for bulkWrite. The operations are all of type replaceOne with upsert set to true, and each one carries a filter for the upsert. The operations are split into batches of 500 and passed to bulkWrite to write to the database. Each batch is sent only after the previous bulk write has finished.

Problem

While implementing this feature I used a new, empty collection to test the functionality, and with that test collection everything works as expected. However, after switching to the collection I intend to use in production, I get a "Request timed out" error when calling bulkWrite:

Error

[2020-12-09T18:29:41.761] (node:18988) UnhandledPromiseRejectionWarning: BulkWriteError: Request timed out.
[2020-12-09T18:29:41.764]     at OrderedBulkOperation.handleWriteError (C:\Users\Liam\Desktop\Work\myApp\node_modules\mongodb\lib\bulk\common.js:1257:9)
[2020-12-09T18:29:41.767]     at resultHandler (C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\bulk\common.js:521:23)
[2020-12-09T18:29:41.768]     at handler (C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\core\sdam\topology.js:942:24)
[2020-12-09T18:29:41.769]     at C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\cmap\connection_pool.js:356:13
[2020-12-09T18:29:41.770]     at handleOperationResult (C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\core\sdam\server.js:558:5)
[2020-12-09T18:29:41.771]     at MessageStream.messageHandler (C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\cmap\connection.js:275:5)
[2020-12-09T18:29:41.772]     at MessageStream.emit (events.js:315:20)
[2020-12-09T18:29:41.772]     at MessageStream.EventEmitter.emit (domain.js:482:12)
[2020-12-09T18:29:41.773]     at processIncomingData (C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\cmap\message_stream.js:144:12)
[2020-12-09T18:29:41.774]     at MessageStream._write (C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\cmap\message_stream.js:42:5)
[2020-12-09T18:29:41.775]     at doWrite (_stream_writable.js:403:12)
[2020-12-09T18:29:41.775]     at writeOrBuffer (_stream_writable.js:387:5)
[2020-12-09T18:29:41.814]     at MessageStream.Writable.write (_stream_writable.js:318:11)
[2020-12-09T18:29:41.856]     at TLSSocket.ondata (_stream_readable.js:717:22)
[2020-12-09T18:29:41.857]     at TLSSocket.emit (events.js:315:20)
[2020-12-09T18:29:41.857]     at TLSSocket.EventEmitter.emit (domain.js:482:12)
Questions

What is causing this timeout error when I can upload to another collection without any problems?

In this example I'm uploading about 650 records. The existing collection already contains roughly 7,300 documents, while the test collection contains 0.

Code

Azure Function body (using azure-function-express)
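The original function body isn't reproduced below, so the following is only a minimal sketch of the flow described above, written as a plain Azure Functions handler rather than through the azure-function-express wrapper. The fetchFromYahooApi helper, the COSMOS_CONNECTION_STRING setting, and the database/collection names are assumptions, not code from the post.

// Sketch only -- not the original post's code. The API helper, connection
// string setting, and database/collection names below are assumptions.
const { MongoClient } = require("mongodb");

// Hypothetical placeholder for the API call; it is assumed to return an
// array of objects, each carrying the objectType field used by upsertData.
const fetchFromYahooApi = async () => [];

module.exports = async function (context, req) {
  // Cosmos DB (Mongo API) connection string supplied via app settings
  const client = await MongoClient.connect(process.env.COSMOS_CONNECTION_STRING, {
    useUnifiedTopology: true
  });

  try {
    const collection = client.db("myDatabase").collection("DataSources");
    const data = await fetchFromYahooApi();
    await upsertData(data, collection); // upsertData is defined below
    context.res = { status: 200, body: "Bulk upsert complete" };
  } catch (e) {
    context.log.error(e);
    context.res = { status: 500, body: "Bulk upsert failed" };
  } finally {
    await client.close();
  }
};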

The upsertData function

upsertData = async (data, collection) => {

  const bulkReplaceOps = data.map(item => {

    // Define filters (Unique identifiers for each object record)
    const filters = {
      YahooNFLPlayer: {
        objectType: "YahooNFLPlayer",
        league_key: item.league_key,
        player_key: item.player_key,
        guid: item.guid
      },
      YahooNFLStanding: {
        objectType: "YahooNFLStanding",
        team_key: item.team_key,
        guid: item.guid
      },
      YahooNFLDraftResult: {
        objectType: "YahooNFLDraftResult",
        league_key: item.league_key,
        pick: item.pick,
        round: item.round,
        guid: item.guid
      },
      YahooNFLTransaction: {
        objectType: "YahooNFLTransaction",
        transaction_key: item.transaction_key,
        guid: item.guid
      },
      YahooNFLTeamRoster: {
        objectType: "YahooNFLTeamRoster",
        league_key: item.league_key,
        team_key: item.team_key,
        player_key: item.player_key,
        guid: item.guid
      },
      YahooNFLTeam: {
        objectType: "YahooNFLTeam",
        guid: item.guid,
        team_key: item.team_key
      }
    }

    // Map data to array of replace operations
    return {
      replaceOne: {
        filter: filters[item.objectType], // Select filter based on type of data
        replacement: item,                // Data to be uploaded
        upsert: true                      // Create new doc if not existing, replace otherwise
      }
    }

  });

  // Batch in groups of 500 (Best practice)
  while (bulkReplaceOps.length) {
    try {
      await collection.bulkWrite(bulkReplaceOps.splice(0, 500));
      console.log("YahooUploadUserSelectedData: Successful bulk upsert");
    } catch (e) {
      throw e;
    }
  }

}
What I've tried

  • Reducing the batch size
    • By reducing the batch size we can avoid the timeout. However, each batch then takes a long time to complete, which increases the total execution time. This is a problem because it risks exceeding the Azure Function's 5-minute runtime limit and hurts the user experience. (A sketch with the batch size pulled out into a parameter follows this list.)
  • Reducing the complexity of the filters
    • I was worried that the number of fields in the upsert filters (up to 5 for some data types) was too high and might add overhead at query time. As a test, I used only a single field in each filter. I still hit the timeout.
  • Test queries against the test collection after adding extra data
    • I thought the cause might be that the collection I want to use in production already contains data (~7,300 documents), while the test collection started out empty. My idea was that this increases the time needed to run the upsert query, since it has to filter the existing data to find the right document. As a test, I copied the data from the DataSources collection into the test collection using Studio 3T. I was still able to bulk write data to the test collection without running into the timeout.
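For reference, this is roughly what the batch-size experiment looked like: the same loop as in upsertData, but with the batch size taken as a parameter so it can be lowered from 500. The function name and the default value of 100 are illustrative, not the exact values used.

// Sketch only: the batching loop from upsertData with the batch size
// parameterized so it can be reduced (e.g. from 500 to 100) when timeouts occur.
const writeInBatches = async (collection, bulkReplaceOps, batchSize = 100) => {
  while (bulkReplaceOps.length) {
    const batch = bulkReplaceOps.splice(0, batchSize);
    await collection.bulkWrite(batch);
    console.log(`Bulk upsert of ${batch.length} operations succeeded`);
  }
};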
Any help or insight into this problem is greatly appreciated.
