Azure Service Fabric: loading millions of rows into a partitioned stateful service


azure-service-fabric


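I am trying to load 20 million rows into the reliable dictionary of a partitioned stateful service. I have split the stateful service into 10 partitions. Based on the MSDN documentation, I understand that I need to use some hashing algorithm to find the correct partition and send the data to it to be loaded into the IReliableDictionary. So I derive a partition number from the value. All I am storing is a list in an IReliableDictionary.

So I created a stateless service as a wrapper, which:
fetches the rows (20 million) from SQL Server,
gets the partition number for each row,
groups the rows by partition number,
calls the stateful service for each partition using ServiceRemoting.

However, if I send 1 million rows per request I get a "fabric message too large" exception, so I split each request into chunks of 100,000 rows.
The whole load takes 74 minutes to complete, which is far too long. Please advise. The upload code is below: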
            // The service URI is the same for every partition, so it is built once.
            var dataStoreUri = new Uri("fabric:/TestApp/dataservice");
            const int pageSize = 100000;

            foreach (var itemKvp in ItemsDictionary)
            {
                // Route to the correct shard: the key is the partition number produced by the hash algorithm
                var dataService = _serviceProxyFactory.CreateServiceProxy<IDataService>(
                    dataStoreUri,
                    new ServicePartitionKey(itemKvp.Key), TargetReplicaSelector.PrimaryReplica, "dataServiceRemotingListener");

                var itemsShard = itemKvp.Value;
                // If the shard holds more than 1,000,000 records, send it in chunks of 100,000
                if (itemsShard.Count > 1_000_000)
                {
                    for (var skip = 0; skip < itemsShard.Count; skip += pageSize)
                    {
                        await dataService.InsertData(itemsShard.Skip(skip).Take(pageSize).ToList());
                    }
                }
                else
                {
                    // Otherwise send everything in a single request
                    await dataService.InsertData(itemsShard);
                }
            }
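
The routing step described in the question (compute a partition number per row, then group rows by that number) could be sketched roughly as follows. This is only an illustration under assumptions: the class name PartitionRouting, the constant PartitionCount, the choice of FNV-1a as the hash, and the keySelector parameter are all hypothetical and not taken from the original post. A stable hash is used because string.GetHashCode() is not guaranteed to return the same value across processes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class PartitionRouting
{
    // Assumption: 10 partitions, matching the setup described in the question.
    public const int PartitionCount = 10;

    // FNV-1a (64-bit): a simple hash that is stable across processes and machines.
    public static long GetPartitionNumber(string rowKey)
    {
        unchecked
        {
            ulong hash = 14695981039346656037UL;   // FNV offset basis
            foreach (byte b in Encoding.UTF8.GetBytes(rowKey))
            {
                hash ^= b;
                hash *= 1099511628211UL;           // FNV prime
            }
            // Map the hash onto the range 0..PartitionCount-1.
            return (long)(hash % (ulong)PartitionCount);
        }
    }

    // Groups rows by partition number so each group can be sent to one partition.
    public static Dictionary<long, List<T>> GroupByPartition<T>(
        IEnumerable<T> rows, Func<T, string> keySelector)
    {
        return rows
            .GroupBy(row => GetPartitionNumber(keySelector(row)))
            .ToDictionary(group => group.Key, group => group.ToList());
    }
}
```

The resulting dictionary has the same shape as ItemsDictionary in the code above: the key can be passed to ServicePartitionKey, provided the service is declared with a matching Int64 range of 0 to 9.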

You can probably save some time here by uploading to all partitions in parallel.
So create 10 service proxies (one per partition) and use them concurrently.
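
A minimal sketch of that idea, under stated assumptions: partitions are uploaded concurrently, while the chunks for any single partition are still sent one after another (the asker reports below that parallel calls within one partition led to timeouts). The names ParallelUploader, UploadAllAsync and sendChunkAsync are hypothetical; sendChunkAsync stands in for the remoting call, so the sketch runs without the Service Fabric SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelUploader
{
    // Starts one upload task per partition and waits for all of them.
    public static Task UploadAllAsync<T>(
        IReadOnlyDictionary<long, List<T>> shards,
        Func<long, List<T>, Task> sendChunkAsync,   // hypothetical stand-in for the proxy call
        int pageSize = 100_000)
    {
        var partitionTasks = shards.Select(
            shard => UploadShardAsync(shard.Key, shard.Value, sendChunkAsync, pageSize));
        return Task.WhenAll(partitionTasks);
    }

    // Sends one partition's rows sequentially in chunks of at most pageSize.
    private static async Task UploadShardAsync<T>(
        long partitionKey, List<T> items,
        Func<long, List<T>, Task> sendChunkAsync, int pageSize)
    {
        for (var offset = 0; offset < items.Count; offset += pageSize)
        {
            var count = Math.Min(pageSize, items.Count - offset);
            // GetRange copies only this chunk, avoiding a repeated Skip over the list.
            await sendChunkAsync(partitionKey, items.GetRange(offset, count));
        }
    }
}
```

In the question's setup, sendChunkAsync would presumably wrap the existing call, for example (key, chunk) => proxies[key].InsertData(chunk), where proxies is an assumed dictionary of service proxies created once per partition with CreateServiceProxy.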
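I tried parallel calls inside the itemsShard.Count > 1M branch, but that ended in timeout errors. I will try parallel calls at the partition level and let you know.

After adding the parallel calls, the load is now down to 30 minutes. Is that a typical time for uploading this much data into a Service Fabric reliable dictionary?

What does your IReliableDictionary look like? When you say you are storing a list in an IReliableDictionary, do you mean that the list is the value type of the IReliableDictionary, or are you storing something else along those lines?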