C#: How can I avoid losing the database connection when reading millions of rows with a DataReader? [c#, ado.net, oracle-sqldeveloper]

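I have a class library that reads data from a database table. The table lives in a client's database, and my application has only a connection string and a SQL query with which to open the connection, execute the query, read the data, and perform some operation on it. What that operation does is somewhat complex (basically business rules).

Users submit the SQL query in a specific format, so my class library knows which columns to pick from the query result.

I don't know how many records my class library will have to process. It could be 100, 200, or millions of rows.

Currently the class library is processing 90 million rows that reside in Oracle. I am reading this data with a SqlDataReader.

To avoid out-of-memory exceptions I read the data with a data reader, but reading 90 million rows one by one and then performing some operation on each record keeps the connection open the whole time, and I am currently facing a connection-lost problem: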

ORA-03135: connection lost contact

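One solution could be to read the data in chunks, but as I said, I don't know how many records I may have to process, and the SQL query is not in my hands: it is submitted by the user and picked up by my class library.

Is there anything I can do to avoid the connection problem?

Update: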

public class LongRunningTask : IDisposable
{
        public void Start(DbConnection connection, string sql)
        {
            using (var cmd = connection.CreateCommand())
            {
                cmd.CommandText = sql;
                cmd.CommandTimeout = 0;
                connection.Open();
                using (var dr = cmd.ExecuteReader(CommandBehavior.CloseConnection))
                {
                    //read 1 by 1 record and pass it to algorithm to do some complex processing
                }
            }
        }
}

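The algorithm is not slow, and that is not the problem. The main problem is that the reading part is slow when there are 90 million rows coming from Oracle.

I tested with 100 million rows on SQL Server and did not run into this problem (although I sometimes got transport-layer errors), even though the process took a long time. I only face this problem with Oracle.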

You can set the command timeout like this:

command.CommandTimeout = 60; //The time in seconds to wait for the command to execute. The default is 30 seconds.

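Keeping a data reader open for hours is not a good idea. Even if everything is configured correctly, there can be a transient error somewhere on the wire (like the transport-layer error you mentioned).

You can add retry logic to the client code to make it more robust. One approach is to keep track of the last processed record, try to reconnect, and "resume" from that position when the connection fails.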

private const int MAX_RETRY = 10;
private const int RETRY_INTERVAL_MS = 1000;
private string lastProcessedPosition = null;

public void Start(string connectionString, string sql)
{
    var exceptions = new List<Exception>();
    for (var i = 0; i < MAX_RETRY; i++)
    {
        try
        {
            // Process returns true once every row has been handled
            if (Process(connectionString, sql, lastProcessedPosition)) return;
        }
        catch(Exception ex)
        {
            exceptions.Add(ex);
        }
        System.Threading.Thread.Sleep(RETRY_INTERVAL_MS);
    }
    throw new AggregateException(exceptions);
}

dr.ToPositionString() is an extension method that you can create to make a row unique based on the table schema.
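
The Process method that Start calls is not shown above. A minimal sketch of the resume idea follows, assuming the query returns rows in the same deterministic order on every attempt (for example, it has an ORDER BY on a unique key). ResumableProcessor and its members are hypothetical names, and this ToPositionString simply joins all column values, which is only one possible way to build a position. It takes an IDataReader instead of a connection string so that it is not tied to a provider.

```csharp
using System;
using System.Data;
using System.Linq;

public static class DataRecordExtensions
{
    // Hypothetical helper: builds a position string from every column of the
    // current row. With a real table you would probably use only the key columns.
    public static string ToPositionString(this IDataRecord record)
    {
        var values = new object[record.FieldCount];
        record.GetValues(values);
        return string.Join("|", values.Select(v => Convert.ToString(v)));
    }
}

public class ResumableProcessor
{
    // Position of the last row that was processed successfully.
    // It survives a failed attempt, so the next attempt can resume after it.
    public string LastProcessedPosition { get; private set; }

    public void Process(IDataReader dr, Action<IDataRecord> processRow)
    {
        if (LastProcessedPosition != null)
        {
            // Fast-forward: discard rows up to and including the last processed one.
            while (dr.Read() && dr.ToPositionString() != LastProcessedPosition) { }
        }

        while (dr.Read())
        {
            processRow(dr);
            LastProcessedPosition = dr.ToPositionString();
        }
    }
}
```

On a retry you would open a new connection, execute the same query again, and pass the new reader to the same ResumableProcessor instance. Note that fast-forwarding still reads the skipped rows over the wire, so each retry costs more the further the previous attempt got.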

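Short answer:

I have run into this before, and it was caused by firewall rules on my company's network.

Long answer and unsolicited advice:

I think your main problem is the application design. If you are going to process millions of records, it can take a long time, a very long time, depending on what you have to do with them.
I developed an application to encrypt 100 million card numbers at rest in a database, and it took 3 weeks to complete. Dealing with really big data is tricky; I ran into all sorts of problems. Here are some of my recommendations:

1) You will hear that your problem lies in the timeout settings. It may not. Where I work we have firewall rules that kill database connections after a certain period of time (I don't remember whether it was 15 or 30 minutes), and it took us weeks to figure out why our connections were suddenly dropping.

2) Pulling back millions of records in one go is not a good idea.

3) You should put some SQL injection prevention into your code.

4) I recommend using an ORM such as Entity Framework, which makes looping and chunking easier.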

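Can't you fetch all the data and store it in some in-memory object, then release the connection to the database, apply your complex business rules, and once that is done, when you need to write the data back to the database, open the connection again and do a bulk update?

Hope I am making some sense.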
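
A minimal sketch of the "load first, process later" idea is shown here. It is only viable when the whole result set fits in memory, which the question suggests may not hold for 90 million rows. BufferedLoad and ReadAll are hypothetical names; the method does nothing but copy rows, so the reader and its connection can be closed before any slow processing starts.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class BufferedLoad
{
    // Copies every row into memory as quickly as possible. No business logic
    // runs here, so the time the connection stays open is bounded by the read.
    public static List<object[]> ReadAll(IDataReader reader)
    {
        var rows = new List<object[]>();
        while (reader.Read())
        {
            var values = new object[reader.FieldCount];
            reader.GetValues(values);
            rows.Add(values);
        }
        return rows;
    }
}
```

You would call ReadAll inside the using block around cmd.ExecuteReader(CommandBehavior.CloseConnection) and run the business rules on the returned list after that block has ended.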

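This is the solution I have used in the past to read large datasets from a database while processing them in chunks:

First, I chose to implement a method for getting the database connection. Note that I set ConnectionTimeout to 0 because I know this process will be long-running.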

private static OracleConnection GetConnection()
{
    return new OracleConnection(new OracleConnectionStringBuilder
    {
        //TODO: Set other connection string properties
        ConnectionTimeout = 0
    }.ConnectionString);
}

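Next, I wanted a generic "GetData" method to read the data from the database. Note that its return type is explicitly "IEnumerable<T>". You could strongly type it instead of making it generic, but it needs to keep returning IEnumerable in order to take advantage of "yield return".

Also note that I set CommandTimeout to 0 because I know this process will be long-running.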

public static IEnumerable<T> GetData<T>(string sql)
{
    using (var conn = GetConnection())
    {
        if (ConnectionState.Closed == conn.State) conn.Open();

        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandTimeout = 0;
            cmd.CommandType = CommandType.Text;
            cmd.CommandText = sql; //TODO: Make sure you do standard sql injection prevention

            using (var reader = cmd.ExecuteReader())
            {
                //We want to optimize the number of round trips to the DB our reader makes.
                //Setting the FetchSize this way will make the reader bring back 5000 records
                //with every trip to the DB
                reader.FetchSize = reader.RowSize * 5000;

                while (reader.Read())
                {
                    var values = new object[reader.FieldCount];
                    reader.GetValues(values);
                    //This assumes that type T has a constructor that takes in an object[]
                    //and the mappings of object[] to properties is done in that constructor
                    yield return (T)Activator.CreateInstance(typeof(T), new object[] { values });
                }
            }
        }
    }
}
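
The comments in GetData assume that T has a constructor taking an object[]. As an illustration, such a type might look like the sketch below. CustomerRow, its two columns, and RowFactory are hypothetical; the column order is an assumption and has to match the SELECT list of the query. Convert.ToInt64 is used because Oracle NUMBER columns commonly come back as decimal.

```csharp
using System;

// Hypothetical row type: the column order (Id, Name) is an assumption
// and must match the SELECT list of the query passed to GetData<T>.
public class CustomerRow
{
    public long Id { get; }
    public string Name { get; }

    public CustomerRow(object[] values)
    {
        Id = Convert.ToInt64(values[0]);
        Name = values[1] == DBNull.Value ? null : Convert.ToString(values[1]);
    }
}

public static class RowFactory
{
    // The same call GetData<T> makes for each row: the values array is wrapped
    // in an outer object[] so it is passed as a single constructor argument.
    public static T Create<T>(object[] values)
    {
        return (T)Activator.CreateInstance(typeof(T), new object[] { values });
    }
}
```

With a type like this, the call in Main would become GetData<CustomerRow>(sql) instead of GetData<string>.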

Next, I wanted some method to implement the fancy business logic, called once per batch:

public static void Main(string[] args)
{
    foreach (var batch in GetData<string>("hello world").Batch(50000))
    {
        ProcessBusinessLogic(batch);
    }
}

Here is the complete program:

using System;
using System.Collections.Generic;
using System.Data;
using MoreLinq;
using Oracle.ManagedDataAccess.Client;

namespace ReadLargeDataset
{
    public class Program
    {
        public static void Main(string[] args)
        {
            foreach (var batch in GetData<string>("hello world").Batch(50000))
            {
                ProcessBusinessLogic(batch);
            }
        }

        public static void ProcessBusinessLogic<T>(IEnumerable<T> data)
        {
            //TODO Implement fancy business logic here
        }

        public static IEnumerable<T> GetData<T>(string sql)
        {
            using (var conn = GetConnection())
            {
                if (ConnectionState.Closed == conn.State) conn.Open();

                using (var cmd = conn.CreateCommand())
                {
                    cmd.CommandTimeout = 0;
                    cmd.CommandType = CommandType.Text;
                    cmd.CommandText = sql; //TODO: Make sure you do standard sql injection prevention

                    using (var reader = cmd.ExecuteReader())
                    {
                        //We want to optimize the number of round trips to the DB our reader makes.
                        //Setting the FetchSize this way will make the reader bring back 5000 records
                        //with every trip to the DB
                        reader.FetchSize = reader.RowSize * 5000;

                        while (reader.Read())
                        {
                            var values = new object[reader.FieldCount];
                            reader.GetValues(values);
                            //This assumes that type T has a constructor that takes in an object[]
                            //and the mappings of object[] to properties is done in that constructor
                            yield return (T)Activator.CreateInstance(typeof(T), new object[] { values });
                        }
                    }
                }
            }
        }

        private static OracleConnection GetConnection()
        {
            return new OracleConnection(new OracleConnectionStringBuilder
            {
                //TODO: Set other connection string properties
                ConnectionTimeout = 0
            }.ConnectionString);
        }
    }
}

The Batch extension method comes from MoreLINQ:

#region License and Terms
// MoreLINQ - Extensions to LINQ to Objects
// Copyright (c) 2009 Atif Aziz. All rights reserved.
// 
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
// 
//     http://www.apache.org/licenses/LICENSE-2.0
// 
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#endregion

// ReSharper disable CheckNamespace
namespace MoreLinq
{
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    public static class MoreEnumerable
    {
        /// <summary>
        /// Batches the source sequence into sized buckets.
        /// </summary>
        /// <typeparam name="TSource">Type of elements in <paramref name="source"/> sequence.</typeparam>
        /// <param name="source">The source sequence.</param>
        /// <param name="size">Size of buckets.</param>
        /// <returns>A sequence of equally sized buckets containing elements of the source collection.</returns>
        /// <remarks> This operator uses deferred execution and streams its results (buckets and bucket content).</remarks>

        public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(this IEnumerable<TSource> source, int size)
        {
            return Batch(source, size, x => x);
        }

        /// <summary>
        /// Batches the source sequence into sized buckets and applies a projection to each bucket.
        /// </summary>
        /// <typeparam name="TSource">Type of elements in <paramref name="source"/> sequence.</typeparam>
        /// <typeparam name="TResult">Type of result returned by <paramref name="resultSelector"/>.</typeparam>
        /// <param name="source">The source sequence.</param>
        /// <param name="size">Size of buckets.</param>
        /// <param name="resultSelector">The projection to apply to each bucket.</param>
        /// <returns>A sequence of projections on equally sized buckets containing elements of the source collection.</returns>
        /// <remarks> This operator uses deferred execution and streams its results (buckets and bucket content).</remarks>

        public static IEnumerable<TResult> Batch<TSource, TResult>(this IEnumerable<TSource> source, int size,
            Func<IEnumerable<TSource>, TResult> resultSelector)
        {
            if (source == null) throw new ArgumentNullException(nameof(source));
            if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
            if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
            return BatchImpl(source, size, resultSelector);
        }

        private static IEnumerable<TResult> BatchImpl<TSource, TResult>(this IEnumerable<TSource> source, int size,
            Func<IEnumerable<TSource>, TResult> resultSelector)
        {
            Debug.Assert(source != null);
            Debug.Assert(size > 0);
            Debug.Assert(resultSelector != null);

            TSource[] bucket = null;
            var count = 0;

            foreach (var item in source)
            {
                if (bucket == null)
                {
                    bucket = new TSource[size];
                }

                bucket[count++] = item;

                // The bucket is fully buffered before it's yielded
                if (count != size)
                {
                    continue;
                }

                // Select is necessary so bucket contents are streamed too
                yield return resultSelector(bucket.Select(x => x));

                bucket = null;
                count = 0;
            }

            // Return the last bucket with all remaining elements
            if (bucket != null && count > 0)
            {
                yield return resultSelector(bucket.Take(count));
            }
        }
    }
}
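
If you are on .NET 6 or later and would rather not depend on MoreLINQ, the built-in Enumerable.Chunk method does a similar job: it lazily splits a sequence into arrays of at most the given size. The sketch below uses Enumerable.Range as a stand-in for GetData<T>(sql); ChunkDemo and BatchSizes are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Returns the size of each batch produced from a lazily generated sequence.
    public static List<int> BatchSizes(int totalRows, int batchSize)
    {
        // Stand-in for GetData<T>(sql): rows are produced one at a time.
        IEnumerable<int> rows = Enumerable.Range(1, totalRows);

        var sizes = new List<int>();
        foreach (int[] batch in rows.Chunk(batchSize))
        {
            // Each batch is fully buffered; the last one may be smaller.
            sizes.Add(batch.Length);
        }
        return sizes;
    }
}
```

In the program above that would mean replacing .Batch(50000) with .Chunk(50000); ProcessBusinessLogic accepts the resulting arrays unchanged because T[] implements IEnumerable<T>.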

select 
  columns... 
from 
  data sources...
where 
  some conditions...
order by
  some unique key... -- paging is only repeatable with a deterministic order
offset @offset rows
fetch next @pageSize rows only

public class LongRunningTask
{
  const long pageSize = 100000L; //--> ...or whatever the market will bear
  const int retryLimit = 3;
  public void Start( ConnectionFactory factory, string sql )
  {
    var done = false;
    var page = 0L;
    var index = 0L;
    var retries = 0;
    var retrying = false;
    while ( !done )
    {
      try
      {
        using ( var connection = factory.CreateConnection( ) )
        {
          using ( var cmd = connection.CreateCommand( ) )
          {
            cmd.CommandType = CommandType.Text;
            cmd.CommandText = sql;
            cmd.Parameters.Add( factory.CreateParameter( "@pageSize", SqlDbType.BigInt ) );
            cmd.Parameters.Add( factory.CreateParameter( "@offset", SqlDbType.BigInt ) );
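            // On a retry, fetch only the rows of the current page that are still unprocessed
            cmd.Parameters[ "@pageSize" ].Value = pageSize - ( retrying ? index : 0 );
            // The offset is counted in rows, so convert the page number to a row count first
            cmd.Parameters[ "@offset" ].Value = page * pageSize + ( retrying ? index : 0 );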
            connection.Open( );
            using ( var dr = cmd.ExecuteReader( ) )
            {
              index = retrying ? index : 0;
              retrying = false;
              done = !dr.HasRows; //--> didn't get anything, we're done!
              while ( dr.Read( ) )
              {
                //read 1 by 1 record and pass it to algorithm to do some complex processing
                index++;
              }
            }
          }
        }
        page++;
      }
      catch ( Exception ex )
      {
        Console.WriteLine( ex );
        if ( retryLimit < retries++ ) throw;
        retrying = true;
      }
    }
  }
}

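public class ConnectionFactory
{
  public DbConnection CreateConnection( )
  {
    //... return a provider-specific DbConnection here
    throw new NotImplementedException( );
  }
  public DbParameter CreateParameter( string parameterName, SqlDbType type, int length = 0 )
  {
    //... return a provider-specific DbParameter here
    throw new NotImplementedException( );
  }
}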
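
The offset and page-size values in the paging loop are easy to get wrong, so here is the arithmetic in isolation. This is a sketch under the assumption that the offset is counted in rows and that a retry should skip the rows of the current page that were already processed. Paging and Window are hypothetical names.

```csharp
using System;

public static class Paging
{
    // Computes the OFFSET and FETCH values for a zero-based page, skipping the
    // rows of that page that were already processed before a failure.
    public static (long Offset, long Fetch) Window(long page, long pageSize, long alreadyProcessed)
    {
        if (page < 0) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        if (alreadyProcessed < 0 || alreadyProcessed >= pageSize)
            throw new ArgumentOutOfRangeException(nameof(alreadyProcessed));

        long offset = page * pageSize + alreadyProcessed; // rows to skip in total
        long fetch = pageSize - alreadyProcessed;         // rows left in this page
        return (offset, fetch);
    }
}
```

For example, with a page size of 100000, page 2 normally starts at row offset 200000; if 250 rows of that page were processed before the connection dropped, the retry starts at 200250 and fetches the remaining 99750 rows.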