Node.js bulkWrite to a collection causes timeout errors, but works when writing to an empty test collection
tl;dr

Using bulkWrite to perform batched replace operations with upsert: true against Azure Cosmos DB via the Mongo API. It works fine on an empty test collection, but times out on the collection I intend to use, which already holds ~7,300 documents. Why can I write to one collection but not the other? And how can I avoid the timeout without using a very small batch size, which increases execution time?

Background

I have an Azure Function that retrieves data from an API and saves it to a database collection. I use Azure Cosmos DB with the Mongo API adapter as my database.

The function retrieves data from the API and builds an array of operations for bulkWrite. Each operation is a replaceOne with upsert set to true, and a filter is supplied for the upsert. The operations are split into batches of 500 and passed to bulkWrite to write to the database; each batch is sent only after the previous bulkWrite has completed.

Problem

While implementing this feature, I tested against a new, empty collection, and everything worked as expected. After switching to the collection I intend to use in production, however, attempting to bulkWrite fails with a "Request timed out" error:
[2020-12-09T18:29:41.761] (node:18988) UnhandledPromiseRejectionWarning: BulkWriteError: Request timed out.
[2020-12-09T18:29:41.764] at OrderedBulkOperation.handleWriteError (C:\Users\Liam\Desktop\Work\myApp\node_modules\mongodb\lib\bulk\common.js:1257:9)
[2020-12-09T18:29:41.767] at resultHandler (C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\bulk\common.js:521:23)
[2020-12-09T18:29:41.768] at handler (C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\core\sdam\topology.js:942:24)
[2020-12-09T18:29:41.769] at C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\cmap\connection_pool.js:356:13
[2020-12-09T18:29:41.770] at handleOperationResult (C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\core\sdam\server.js:558:5)
[2020-12-09T18:29:41.771] at MessageStream.messageHandler (C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\cmap\connection.js:275:5)
[2020-12-09T18:29:41.772] at MessageStream.emit (events.js:315:20)
[2020-12-09T18:29:41.772] at MessageStream.EventEmitter.emit (domain.js:482:12)
[2020-12-09T18:29:41.773] at processIncomingData (C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\cmap\message_stream.js:144:12)
[2020-12-09T18:29:41.774] at MessageStream._write (C:\Users\Liam\Desktop\Work\myApp\api\node_modules\mongodb\lib\cmap\message_stream.js:42:5)
[2020-12-09T18:29:41.775] at doWrite (_stream_writable.js:403:12)
[2020-12-09T18:29:41.775] at writeOrBuffer (_stream_writable.js:387:5)
[2020-12-09T18:29:41.814] at MessageStream.Writable.write (_stream_writable.js:318:11)
[2020-12-09T18:29:41.856] at TLSSocket.ondata (_stream_readable.js:717:22)
[2020-12-09T18:29:41.857] at TLSSocket.emit (events.js:315:20)
[2020-12-09T18:29:41.857] at TLSSocket.EventEmitter.emit (domain.js:482:12)
Question

What is causing this timeout error when I can upload to the other collection without issue?

In this instance I was uploading roughly 650 records. The existing collection already holds about 7,300 documents, while the test collection held 0.
Code

Azure Function body (using Azure Function express)

upsertData function
upsertData = async (data, collection) => {
  const bulkReplaceOps = data.map(item => {
    // Define filters (Unique identifiers for each object record)
    const filters = {
      YahooNFLPlayer: {
        objectType: "YahooNFLPlayer",
        league_key: item.league_key,
        player_key: item.player_key,
        guid: item.guid
      },
      YahooNFLStanding: {
        objectType: "YahooNFLStanding",
        team_key: item.team_key,
        guid: item.guid
      },
      YahooNFLDraftResult: {
        objectType: "YahooNFLDraftResult",
        league_key: item.league_key,
        pick: item.pick,
        round: item.round,
        guid: item.guid
      },
      YahooNFLTransaction: {
        objectType: "YahooNFLTransaction",
        transaction_key: item.transaction_key,
        guid: item.guid
      },
      YahooNFLTeamRoster: {
        objectType: "YahooNFLTeamRoster",
        league_key: item.league_key,
        team_key: item.team_key,
        player_key: item.player_key,
        guid: item.guid
      },
      YahooNFLTeam: {
        objectType: "YahooNFLTeam",
        guid: item.guid,
        team_key: item.team_key
      }
    };
    // Map data to array of replace operations
    return {
      replaceOne: {
        filter: filters[item.objectType], // Select filter based on type of data
        replacement: item, // Data to be uploaded
        upsert: true // Create new doc if not existing, replace otherwise
      }
    };
  });
  // Batch in groups of 500 (Best practice)
  while (bulkReplaceOps.length) {
    try {
      await collection.bulkWrite(bulkReplaceOps.splice(0, 500));
      console.log("YahooUploadUserSelectedData: Successful bulk upsert");
    } catch (e) {
      throw e;
    }
  }
};
What I've tried

- Reducing the batch size
- By reducing the batch size we can avoid the timeout. However, each batch then takes a long time to complete, which increases overall execution time. This is bad because it could exceed the Azure Function's 5-minute runtime limit and hurt the user experience.
- Reducing filter complexity
- I worried that the number of fields in the upsert filters (up to 5 for some data types) was too high and might add overhead at query time. As a test, I used only a single field in each filter. I still hit the timeout.
- Test writes against the test collection after adding extra data
- I thought the issue might be that the container I intend to use in production already holds data (7,300 documents) while the test collection had no initial data. My thinking was that this increases the time needed to run the upsert query, since it must filter the existing data to find the correct document. As a test, I used Studio 3T to copy the data from the DataSources collection into the test collection. I was still able to bulk-write to the test collection without hitting the timeout.