Java: Why does the time to count X entities in the GAE datastore increase as the total number of entities grows?

Tags: java, google-app-engine, google-cloud-datastore, objectify

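I ran some tests that count entities of kind X in the Google App Engine datastore, with the count limited to 5000. To my surprise, the time this operation takes increases as the total number of kind X entities in the datastore grows.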

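If the count operation simply walks the index on entity keys, shouldn't the time be constant (as long as the total count is > 5000), regardless of the total number of kind X entities in the datastore?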
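The expectation can be made concrete with a small simulation. This is only a sketch under stated assumptions: a `TreeSet` stands in for the sorted key index, and `buildIndex` and `countUpTo` are hypothetical helpers, not datastore API calls. It shows that a walk which stops at the limit visits the same number of index entries no matter how large the index is.

```java
import java.util.Iterator;
import java.util.TreeSet;

public class LimitedCountSketch {
    /** Outcome of a limited count: entities counted and index entries visited. */
    static final class Result {
        final int count;
        final int visited;
        Result(int count, int visited) { this.count = count; this.visited = visited; }
    }

    /** Builds a sorted key "index" of the given size (hypothetical stand-in for the key index). */
    static TreeSet<Long> buildIndex(int totalEntities) {
        TreeSet<Long> index = new TreeSet<>();
        for (long id = 1; id <= totalEntities; id++) {
            index.add(id);
        }
        return index;
    }

    /** Walks the index in key order and stops as soon as the limit is reached. */
    static Result countUpTo(TreeSet<Long> index, int limit) {
        int count = 0;
        int visited = 0;
        Iterator<Long> it = index.iterator();
        while (count < limit && it.hasNext()) {
            it.next();
            visited++;
            count++;
        }
        return new Result(count, visited);
    }

    public static void main(String[] args) {
        for (int total : new int[] {10_000, 50_000, 100_000}) {
            Result r = countUpTo(buildIndex(total), 5_000);
            System.out.println("total=" + total + " count=" + r.count + " visited=" + r.visited);
        }
    }
}
```

In this model `visited` is 5000 for all three totals, which is the constant-work behaviour the question expects from an index walk with a limit.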

[NB: This is not about whether to use sharded counters or datastore statistics; it is about whether my test results are counterintuitive.]

Update 1: Tested on the devserver.

Here is some data:

Time to create & save 100000 entities: 35.92 s
Using Objectify:
Individual times of 10 runs: 14795, 9521, 9300, 9117, 9848, 9391, 8378, 8525, 8593, 8706
Average time to count 5000 entities over 10 runs: 9.617 seconds
--------------------------------------------------------------------------------
Using Datastore:
Individual times of 10 runs: 8984, 8827, 9062, 9160, 8768, 8737, 8488, 8523, 8828, 8956
Average time to count 5000 entities over 10 runs: 8.833 seconds
--------------------------------------------------------------------------------

Time to create & save 50000 entities: 20.03 s
Using Objectify:
Individual times of 10 runs: 5877, 4736, 4162, 4252, 4126, 4203, 4153, 4168, 4051, 4110
Average time to count 5000 entities over 10 runs: 4.384 seconds
--------------------------------------------------------------------------------
Using Datastore:
Dec 16, 2015 10:00:36 AM in.co.amebatechnologies.empireapp.test.DatastoreTests tearDown
INFO: Closing this session
Individual times of 10 runs: 4409, 4380, 4577, 4414, 4121, 4050, 4076, 4050, 4089, 4148
Average time to count 5000 entities over 10 runs: 4.231 seconds
--------------------------------------------------------------------------------

Time to create & save 10000 entities: 8.989 s
Using Objectify:
Individual times of 10 runs: 1893, 802, 713, 678, 679, 657, 648, 654, 659, 654
Average time to count 5000 entities over 10 runs: 0.804 seconds
--------------------------------------------------------------------------------
Using Datastore:
Individual times of 10 runs: 923, 789, 871, 680, 677, 694, 680, 682, 728, 682
Average time to count 5000 entities over 10 runs: 0.741 seconds
--------------------------------------------------------------------------------
Using:

  • GAE SDK 1.9.30
  • Objectify 5.1.7

Code that counts the entities directly against the datastore (i.e., without Objectify):


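The development server provides a local simulation of the production environment, including the datastore (this is documented). However, the datastore emulation does not have the same performance characteristics for large datasets.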
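To illustrate how an emulation could show time that grows with the total, here is a minimal sketch. It assumes a hypothetical emulator that examines every stored entity and applies the limit only at the end. This is an assumption made for illustration, not a description of how the devserver is actually implemented; `countByFullScan` and `storeOf` are invented names.

```java
import java.util.ArrayList;
import java.util.List;

public class FullScanCountSketch {
    /** Outcome of a count: the capped count and how many stored entities were examined. */
    static final class Result {
        final int count;
        final int examined;
        Result(int count, int examined) { this.count = count; this.examined = examined; }
    }

    /**
     * Hypothetical emulator behaviour: examine the kind of every stored entity,
     * collect the matches, and apply the limit only after the full scan.
     */
    static Result countByFullScan(List<String> storedKinds, String kind, int limit) {
        List<String> matches = new ArrayList<>();
        int examined = 0;
        for (String storedKind : storedKinds) {
            examined++;
            if (storedKind.equals(kind)) {
                matches.add(storedKind);
            }
        }
        return new Result(Math.min(matches.size(), limit), examined);
    }

    /** Builds a store containing only entities of the given kind. */
    static List<String> storeOf(int totalEntities, String kind) {
        List<String> store = new ArrayList<>(totalEntities);
        for (int i = 0; i < totalEntities; i++) {
            store.add(kind);
        }
        return store;
    }

    public static void main(String[] args) {
        for (int total : new int[] {10_000, 50_000, 100_000}) {
            Result r = countByFullScan(storeOf(total, "X"), "X", 5_000);
            System.out.println("total=" + total + " count=" + r.count + " examined=" + r.examined);
        }
    }
}
```

Here the returned count is always 5000, but the number of entities examined equals the total. Under this assumption, cost would scale roughly linearly with the size of the store, which is the shape of the devserver timings in the question.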

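When I ran the code to count entities in the production datastore environment (with 10,000 to 100,000 entities), the average time over 10 runs was consistent: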

Total Entities: 20000
Run 0: Time to count 5000 entities: 242
Run 1: Time to count 5000 entities: 352
Run 2: Time to count 5000 entities: 215
Run 3: Time to count 5000 entities: 244
Run 4: Time to count 5000 entities: 241
Run 5: Time to count 5000 entities: 221
Run 6: Time to count 5000 entities: 258
Run 7: Time to count 5000 entities: 219
Run 8: Time to count 5000 entities: 260
Run 9: Time to count 5000 entities: 219
Average: 247.1

Total Entities: 50000
Run 0: Time to count 5000 entities: 346
Run 1: Time to count 5000 entities: 236
Run 2: Time to count 5000 entities: 214
Run 3: Time to count 5000 entities: 353
Run 4: Time to count 5000 entities: 244
Run 5: Time to count 5000 entities: 229
Run 6: Time to count 5000 entities: 244
Run 7: Time to count 5000 entities: 257
Run 8: Time to count 5000 entities: 216
Run 9: Time to count 5000 entities: 224
Average: 256.3

Total Entities: 100000
Run 0: Time to count 5000 entities: 215
Run 1: Time to count 5000 entities: 212
Run 2: Time to count 5000 entities: 329
Run 3: Time to count 5000 entities: 217
Run 4: Time to count 5000 entities: 230
Run 5: Time to count 5000 entities: 231
Run 6: Time to count 5000 entities: 225
Run 7: Time to count 5000 entities: 222
Run 8: Time to count 5000 entities: 273
Run 9: Time to count 5000 entities: 306
Average: 246.0
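For reference, timings in this shape can be collected with a small harness along the following lines. This is a sketch, not the code used for the measurements above: `timeRuns` and `average` are hypothetical helpers, and the workload in `main` is a placeholder where a real test would issue the datastore count.

```java
public class CountTimingSketch {
    /** Runs the operation the given number of times and returns each elapsed time in milliseconds. */
    static long[] timeRuns(int runs, Runnable operation) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            operation.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    /** Arithmetic mean of the run times; 0.0 for an empty array. */
    static double average(long[] millis) {
        if (millis.length == 0) {
            return 0.0;
        }
        long sum = 0;
        for (long m : millis) {
            sum += m;
        }
        return (double) sum / millis.length;
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): a real test would call the datastore count here.
        Runnable operation = () -> {
            long counted = 0;
            for (int i = 0; i < 5_000; i++) {
                counted++;
            }
            if (counted != 5_000) {
                throw new IllegalStateException("unexpected count");
            }
        };

        long[] millis = timeRuns(10, operation);
        for (int i = 0; i < millis.length; i++) {
            System.out.println("Run " + i + ": Time to count 5000 entities: " + millis[i]);
        }
        System.out.println("Average: " + average(millis));
    }
}
```

Reporting the individual runs alongside the average is useful because the first run can include warm-up cost, as the first Objectify run in each devserver series suggests.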

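Comments:

This question isn't really about programming; it's about why Google's infrastructure works the way it does. I suggest you carefully read some of the architecture documents Google has published.

With the datastore, counting more records takes more work, which increases the time. Not sure why you would think that is counterintuitive.

I am asking the datastore for an entity count, but capped at 5000 (i.e., "stop when the counter reaches 5000"). The time to do that should not vary with the total number of records.

Are you testing on the development server or in production?

Testing on the development server.

I don't think the development server can give relevant results for this kind of test.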