Multithreaded C# application with SQL Server database calls
I have a SQL Server database with 500,000 records in a table called main. There are three other tables, called child1, child2, and child3. The many-to-many relationships between child1, child2, child3 and main are implemented through three relationship tables: main_child1_relationship, main_child2_relationship, and main_child3_relationship. I need to read the records in main, update main, insert new rows into the relationship tables, and insert new records into the child tables. The records in the child tables have uniqueness constraints, so the pseudocode for the real calculation (CalculateDetails) looks something like this:
for each record in main
{
    find its child1 like qualities
    for each one of its child1 qualities
    {
        find the record in child1 that matches that quality
        if found
        {
            add a record to main_child1_relationship to connect the two records
        }
        else
        {
            create a new record in child1 for the quality mentioned
            add a record to main_child1_relationship to connect the two records
        }
    }
    ...repeat the above for child2
    ...repeat the above for child3
}
This works as a single-threaded application, but it is too slow. The processing in C# is quite heavy and takes too long, so I want to turn it into a multithreaded application.
What is the best way to do this? We are using LINQ to SQL.
So far my approach has been to create a new DataContext object for each batch of records from main and to use ThreadPool.QueueUserWorkItem to process it. However, these batches are stepping on each other's toes: one thread adds a record, then the next thread tries to add the same one, and ... I get all kinds of interesting SQL Server deadlocks.
Here is the code:
int skip = 0;
List<int> thisBatch;
Queue<List<int>> allBatches = new Queue<List<int>>();
do
{
    thisBatch = allIds
        .Skip(skip)
        .Take(numberOfRecordsToPullFromDBAtATime).ToList();
    allBatches.Enqueue(thisBatch);
    skip += numberOfRecordsToPullFromDBAtATime;
} while (thisBatch.Count() > 0);

while (allBatches.Count() > 0)
{
    RRDataContext rrdc = new RRDataContext();
    var currentBatch = allBatches.Dequeue();
    lock (locker)
    {
        runningTasks++;
    }
    System.Threading.ThreadPool.QueueUserWorkItem(x =>
        ProcessBatch(currentBatch, rrdc));
    lock (locker)
    {
        while (runningTasks > MAX_NUMBER_OF_THREADS)
        {
            Monitor.Wait(locker);
            UpdateGUI();
        }
    }
}
Here is ProcessBatch:

private static void ProcessBatch(
    List<int> currentBatch, RRDataContext rrdc)
{
    var topRecords = GetTopRecords(rrdc, currentBatch);
    CalculateDetails(rrdc, topRecords);
    rrdc.Dispose();
    lock (locker)
    {
        runningTasks--;
        Monitor.Pulse(locker);
    }
}
And GetTopRecords:

private static List<Record> GetTopRecords(RecipeRelationshipsDataContext rrdc,
    List<int> thisBatch)
{
    List<Record> topRecords;
    topRecords = rrdc.Records
        .Where(x => thisBatch.Contains(x.Id))
        .OrderBy(x => x.OrderByMe).ToList();
    return topRecords;
}
CalculateDetails is best explained by the pseudocode at the top.
I figure there must be a better way to do this. Please help. Many thanks!

Overview
The root of your problem is that the L2S DataContext, like Entity Framework's ObjectContext, is not thread-safe. As noted elsewhere, support for asynchronous operations in .NET ORM solutions was still pending as of .NET 4.0; you will have to roll your own solution, which, as you have discovered, is not always easy to do when your framework assumes single-threadedness.
I will take this opportunity to note that L2S is built on top of ADO.NET, which itself fully supports asynchronous operation; personally, I would rather deal directly with that underlying layer and write the SQL myself, just to make sure I fully understand what is happening over the network.
A SQL Server solution?
All that said, I have to ask: does this have to be a C# solution? If you can compose your solution out of a set of insert/update statements, you can just send the SQL over directly, and your threading and performance problems vanish.* It seems to me that your problem is not really about the actual data transformations, but centers around making them performant from .NET. If .NET is removed from the equation, your task becomes simpler. After all, the best solution is often the one that has you writing the smallest amount of code, right? ;)
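To make that concrete, here is a set-based sketch of the child1 step from the pseudocode at the top. It assumes a hypothetical main_quality(main_id, quality) view or table exposing each main record's child1-like qualities, and a quality column on child1; every name here is illustrative, not taken from the real schema.

```sql
-- Sketch only: main_quality, quality, main_id, and child1_id are
-- illustrative names, not part of the original schema.

-- 1) Create any child1 rows that don't exist yet, in one statement.
INSERT INTO child1 (quality)
SELECT DISTINCT mq.quality
FROM main_quality mq
WHERE NOT EXISTS (SELECT 1 FROM child1 c WHERE c.quality = mq.quality);

-- 2) Connect main rows to the (now guaranteed to exist) child1 rows.
INSERT INTO main_child1_relationship (main_id, child1_id)
SELECT mq.main_id, c.id
FROM main_quality mq
JOIN child1 c ON c.quality = mq.quality
WHERE NOT EXISTS (
    SELECT 1 FROM main_child1_relationship r
    WHERE r.main_id = mq.main_id AND r.child1_id = c.id);
```

Repeated for child2 and child3, this replaces the entire per-record loop with six statements, and the uniqueness constraints on the child tables are never racing against concurrent threads.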
Even if your update/insert logic cannot be expressed in a strictly set-relational manner, SQL Server does have a built-in mechanism for iterating over records and performing logic: although they are justly maligned for many use cases, cursors may in fact be an appropriate fit for your task.
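For logic that genuinely must go row by row, a cursor-based sketch of the same child1 step could look like the following (again, main_quality and the column names are illustrative stand-ins):

```sql
-- Sketch only; table and column names are illustrative.
DECLARE @main_id int, @quality nvarchar(20), @child1_id int;

DECLARE quality_cursor CURSOR FAST_FORWARD FOR
    SELECT main_id, quality FROM main_quality;

OPEN quality_cursor;
FETCH NEXT FROM quality_cursor INTO @main_id, @quality;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- Find the matching child1 record, creating it if necessary.
    SET @child1_id = NULL;
    SELECT @child1_id = id FROM child1 WHERE quality = @quality;
    IF @child1_id IS NULL
    BEGIN
        INSERT INTO child1 (quality) VALUES (@quality);
        SET @child1_id = SCOPE_IDENTITY();
    END

    -- Connect the two records.
    INSERT INTO main_child1_relationship (main_id, child1_id)
    VALUES (@main_id, @child1_id);

    FETCH NEXT FROM quality_cursor INTO @main_id, @quality;
END
CLOSE quality_cursor;
DEALLOCATE quality_cursor;
```

Because everything runs inside the server, there is no per-row network round trip and no cross-thread contention.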
If this is a task you will have to perform repeatedly, you would benefit greatly from coding it as a stored procedure.

*Of course, long-running SQL brings its own problems, like lock escalation and index usage, that you will have to contend with.
A C# solution

Of course, it may be impossible to do this in SQL: maybe your code's decisions depend on data that comes from elsewhere, for example, or your project has a strict "no SQL allowed" convention. You mention some typical multithreading bugs, but without seeing the code I can't really help with them specifically.

Doing this from C# is obviously viable, but you need to deal with the fact that a fixed amount of latency will exist for each and every call you make. You can mitigate the effects of network latency by using pooled connections, enabling multiple active result sets (MARS), and using the asynchronous Begin/End methods to execute your queries. Even with all of those, you will still have to accept that there is a cost to shipping data from SQL Server to your application.
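As a rough illustration of those three mitigations together, here is a sketch using classic ADO.NET: a pooled connection with MARS enabled in the connection string, and a query executed via the asynchronous Begin/End methods. The connection string and query are placeholders, not real values from the question.

```csharp
using System;
using System.Data.SqlClient;

class AsyncQueryExample
{
    // Illustrative connection string. "MultipleActiveResultSets" enables MARS;
    // "Asynchronous Processing" is required for Begin/End on pre-4.5 .NET.
    const string ConnStr =
        "Data Source=.;Initial Catalog=MyDb;Integrated Security=true;" +
        "MultipleActiveResultSets=true;" +
        "Asynchronous Processing=true";

    static void Main()
    {
        using (var conn = new SqlConnection(ConnStr)) // pooled by default
        using (var cmd = new SqlCommand("SELECT id FROM main", conn))
        {
            conn.Open();
            IAsyncResult ar = cmd.BeginExecuteReader();
            // ... do other useful work while the query runs on the server ...
            using (SqlDataReader reader = cmd.EndExecuteReader(ar))
            {
                while (reader.Read())
                {
                    // process each row
                }
            }
        }
    }
}
```

None of this removes the per-call latency; it only overlaps it with other work, which is why the set-based SQL approach above is still worth considering first.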
private IList<int> GetMainIds()
{
    using (var context = new MyDataContext())
        return context.Main.Select(m => m.Id).ToList();
}

private void FixUpSingleRecord(int mainRecordId)
{
    using (var localContext = new MyDataContext())
    {
        var main = localContext.Main.FirstOrDefault(m => m.Id == mainRecordId);
        if (main == null)
            return;
        foreach (var childOneQuality in main.ChildOneQualities)
        {
            // If child one is not found, create it
            // Create the relationship if needed
        }
        // Repeat for ChildTwo and ChildThree
        localContext.SubmitChanges();
    }
}

public void FixUpMain()
{
    var ids = GetMainIds();
    foreach (var id in ids)
    {
        var localId = id; // Avoid closing over the iteration variable
        ThreadPool.QueueUserWorkItem(delegate { FixUpSingleRecord(localId); });
    }
}
BEGIN TRAN

DECLARE @mutex_result int;
EXEC @mutex_result = sp_getapplock @Resource = 'CheckSetFileTransferLock',
    @LockMode = 'Exclusive';

IF (@mutex_result < 0)
BEGIN
    ROLLBACK TRAN
    RETURN -- bail out; the lock could not be acquired
END

-- do some stuff

EXEC @mutex_result = sp_releaseapplock @Resource = 'CheckSetFileTransferLock'

COMMIT TRAN
using (var dc = new TestDataContext())
{
    // Get all the ids of interest.
    // I assume you mark successfully updated rows in some way
    // in the update transaction.
    List<int> ids = dc.TestItems.Where(...).Select(item => item.Id).ToList();
    var problematicIds = new List<ErrorType>();
    // Either allow the Task Parallel Library to select what it considers
    // the optimum degree of parallelism by omitting the
    // ParallelOptions parameter, or specify what you want.
    Parallel.ForEach(ids, new ParallelOptions {MaxDegreeOfParallelism = 8},
        id => CalculateDetails(id, problematicIds));
}
private static void CalculateDetails(int id, List<ErrorType> problematicIds)
{
    try
    {
        // Handle deadlocks
        DeadlockRetryHelper.Execute(() => CalculateDetails(id));
    }
    catch (Exception e)
    {
        // Too many deadlock retries (or other exception).
        // Record so we can diagnose the problem or retry later.
        // List<T> is not thread-safe, so guard the Add with a lock
        // (or use a concurrent collection instead).
        lock (problematicIds)
        {
            problematicIds.Add(new ErrorType(id, e));
        }
    }
}
private static void CalculateDetails(int id)
{
    // Creating a new DataContext is not expensive.
    // No need to create it outside of this method.
    using (var dc = new TestDataContext())
    {
        // TODO: adjust the IsolationLevel to minimize deadlocks.
        // If you don't need to change the isolation level
        // then you can remove the TransactionScope altogether.
        using (var scope = new TransactionScope(
            TransactionScopeOption.Required,
            new TransactionOptions {IsolationLevel = IsolationLevel.Serializable}))
        {
            TestItem item = dc.TestItems.Single(i => i.Id == id);
            // work done here
            dc.SubmitChanges();
            scope.Complete();
        }
    }
}
public static class DeadlockRetryHelper
{
    private const int MaxRetries = 4;
    private const int SqlDeadlock = 1205;

    public static void Execute(Action action, int maxRetries = MaxRetries)
    {
        if (HasAmbientTransaction())
        {
            // A deadlock blows away the containing transaction,
            // so there is no point retrying if we're already in one.
            action();
            return;
        }
        int retries = 0;
        while (retries < maxRetries)
        {
            try
            {
                action();
                return;
            }
            catch (Exception e)
            {
                if (IsSqlDeadlock(e))
                {
                    retries++;
                    // Delay subsequent retries - not sure if this helps or not
                    Thread.Sleep(100 * retries);
                }
                else
                {
                    throw;
                }
            }
        }
        // Final attempt: let any remaining deadlock propagate to the caller.
        action();
    }

    private static bool HasAmbientTransaction()
    {
        return Transaction.Current != null;
    }

    private static bool IsSqlDeadlock(Exception exception)
    {
        if (exception == null)
        {
            return false;
        }
        var sqlException = exception as SqlException;
        if (sqlException != null && sqlException.Number == SqlDeadlock)
        {
            return true;
        }
        if (exception.InnerException != null)
        {
            return IsSqlDeadlock(exception.InnerException);
        }
        return false;
    }
}
CREATE TABLE closet (id int PRIMARY KEY, xmldoc ntext)
CREATE TABLE shoe(id int PRIMARY KEY IDENTITY, color nvarchar(20))
CREATE TABLE closet_shoe_relationship (
closet_id int REFERENCES closet(id),
shoe_id int REFERENCES shoe(id)
)
INSERT INTO closet(id, xmldoc) VALUES (1, '<ROOT><shoe><color>blue</color></shoe></ROOT>')
INSERT INTO closet(id, xmldoc) VALUES (2, '<ROOT><shoe><color>red</color></shoe></ROOT>')
INSERT INTO shoe(color)
SELECT DISTINCT CAST(CAST(xmldoc AS xml).query('//shoe/color/text()') AS nvarchar) AS color
FROM closet

INSERT INTO closet_shoe_relationship(closet_id, shoe_id)
SELECT closet.id, shoe.id
FROM shoe JOIN closet
ON CAST(CAST(closet.xmldoc AS xml).query('//shoe/color/text()') AS nvarchar) = shoe.color
INSERT INTO shoe(color)
SELECT DISTINCT CAST(xmldoc.query('//shoe/color/text()') AS nvarchar)
FROM closet
INSERT INTO closet_shoe_relationship(closet_id, shoe_id)
SELECT closet.id, shoe.id
FROM shoe JOIN closet
ON CAST(xmldoc.query('//shoe/color/text()') AS nvarchar) = shoe.color