How can I avoid losing the database connection when reading millions of rows with a DataReader in C#?
I have a class library that reads data from a database table. The table lives in the client's database; my application only has a connection string and an SQL query, which it uses to open a connection, execute the query, read the data, and perform some operations. What those operations are is somewhat complex (they are basically business rules).
Users submit the SQL query in a specific format, and my class library knows which columns to pick from the query results. I don't know how many records my class library will have to process: it could be 100, 200, or millions.
Currently the class library is processing 90 million rows residing in Oracle. I am reading this data with a SqlDataReader.
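The problem is that, to avoid out-of-memory exceptions, I read the data with a SQL data reader one record at a time and perform some operations on each record. Reading 90 million rows this way keeps the connection open the whole time, and I am now facing a lost-connection error: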
ORA-03135: connection lost contact
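One solution might be to read the data in chunks, but as I said, I don't know how many records I will have to process, and the SQL query is not under my control: it is submitted by the user and picked up by my class library.
Is there anything I can do to avoid the connection problem?
Update: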
using System.Data;
using System.Data.Common;
public class LongRunningTask
{
public void Start(DbConnection connection, string sql)
{
using (var cmd = connection.CreateCommand())
{
cmd.CommandText = sql;
cmd.CommandTimeout = 0; // 0 = no command timeout
connection.Open();
using (var dr = cmd.ExecuteReader(CommandBehavior.CloseConnection))
{
while (dr.Read())
{
//read 1 by 1 record and pass it to algorithm to do some complex processing
}
}
}
}
}
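The algorithm is not slow, and that is not the issue. The main problem is that reading the 90 million rows currently coming from Oracle is slow.
I tested 100 million rows on SQL Server and did not run into this problem (although transport-level errors occurred occasionally), even though the process took a long time. I only face this problem on Oracle.
You can set the command timeout limit like this: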
command.CommandTimeout = 60; // Seconds to wait for the command to execute. SqlCommand defaults to 30 seconds; 0 means no limit.
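Keeping a data reader open for hours is not a good idea. Even if everything is configured correctly, there may be a transient error somewhere along the wire (like the transport-level errors you mentioned). You can add retry logic to your client code to make it more robust. One way is to keep track of the last processed record and, when the connection fails, reconnect and "resume" from that position.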
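The track-and-resume idea can be sketched without any database at all, which makes the control flow easier to see. In this sketch (all names are hypothetical), openFrom(lastKey) is assumed to re-run the query and return only the rows that come after lastKey (null means "from the start"), and keyOf extracts a row's unique key. Progress is recorded only after a row has been processed, so a row that was in flight when the connection dropped is processed again (at-least-once). In real code you would catch only transient errors such as DbException rather than every Exception.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: retry a streaming read, resuming after the last processed key.
public static class ResumableReader
{
    public static int ProcessWithResume<TRow>(
        Func<string, IEnumerable<TRow>> openFrom, // re-opens the result set after a given key
        Func<TRow, string> keyOf,                 // unique, orderable key of a row
        Action<TRow> process,                     // the business logic
        int maxRetry = 10,
        int retryIntervalMs = 1000)
    {
        string lastKey = null;                    // null = nothing processed yet
        var failures = new List<Exception>();
        var processed = 0;
        while (true)
        {
            try
            {
                foreach (var row in openFrom(lastKey))
                {
                    process(row);
                    lastKey = keyOf(row);         // remember progress only after the row is handled
                    processed++;
                }
                return processed;                 // reached the end of the result set
            }
            catch (Exception ex)
            {
                failures.Add(ex);
                if (failures.Count >= maxRetry) throw new AggregateException(failures);
                Thread.Sleep(retryIntervalMs);    // back off, then reconnect and resume
            }
        }
    }
}
```

Keeping the data access behind a delegate means the same loop works for an OracleDataReader, a SqlDataReader, or a fake source in a unit test.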
private const int MAX_RETRY = 10;
private const int RETRY_INTERVAL_MS = 1000;
private string lastProcessedPosition = null;
public void Start(string connectionString, string sql)
{
var exceptions = new List<Exception>();
for (var i = 0; i < MAX_RETRY; i++)
{
try
{
if (Process(connectionString, sql, lastProcessedPosition)) return;
}
catch(Exception ex)
{
exceptions.Add(ex);
}
System.Threading.Thread.Sleep(RETRY_INTERVAL_MS);
}
throw new AggregateException(exceptions);
}
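The Process method called by this loop is not shown, and neither is the dr.ToPositionString() extension it relies on. As a hypothetical sketch, such an extension could join the values of the columns that uniquely identify a row (for example its primary key columns; which columns those are has to come from your schema). The resumed query would then need to filter for rows after that position and order by the same columns.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Linq;

// Hypothetical extension: builds a string that identifies the current row's position.
public static class DataRecordExtensions
{
    public static string ToPositionString(this IDataRecord record, params string[] keyColumns)
    {
        if (keyColumns == null || keyColumns.Length == 0)
            throw new ArgumentException("At least one key column is required.", nameof(keyColumns));
        // Invariant culture so numbers and dates format the same on every machine.
        return string.Join("|", keyColumns.Select(name =>
        {
            var value = record[name]; // IDataRecord indexer by column name
            return value is DBNull ? "<null>" : Convert.ToString(value, CultureInfo.InvariantCulture);
        }));
    }
}
```

The "|" separator is an arbitrary choice; if key values can contain it, store the key values individually instead of as one string.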
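dr.ToPositionString() (used inside the Process method, which is not shown) is an extension method you can create yourself to make each row's position unique, based on the table schema.
Short answer: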
I've run into this before, and it was caused by firewall rules on my company's network.
Long answer and unsolicited advice:
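I think your main problem is the application design. If you are processing millions of records, it can take a long time... a very long time, depending on what you have to do. I built an application to encrypt 100 million card numbers at rest in a database, and it took 3 weeks to finish. Processing really big data is tricky; I ran into all kinds of problems. Here are some of my suggestions:
1) You will hear that your problem is the timeout settings. It may not be. Where I work, we have firewall rules that kill database connections after a certain amount of time (I don't remember whether it was 15 or 30 minutes), and it took us weeks to figure out why our connections were suddenly dropping.
2) Pulling back millions of records at once is not a good idea.
3) You should build some SQL injection prevention into your code.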
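4) I'd suggest using an ORM such as Entity Framework, which makes looping and chunking easier.
Can't you fetch all the data into some in-memory object and then release the connection to the database? Apply your complex business rules, and once you're done, if you need to write the data back to the database, open the connection again and do a bulk update.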
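I hope that makes some sense.
This solution is what I have used in the past to read large datasets from a database, processing them in chunks. First, I chose to implement a method that gets a database connection. Note that I set ConnectionTimeout to 0 because I know this process will be long-running. (ConnectionTimeout only limits how long opening the connection may take; it does not keep an already open connection alive.)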
private static OracleConnection GetConnection()
{
return new OracleConnection(new OracleConnectionStringBuilder
{
//TODO: Set other connection string properties
ConnectionTimeout = 0
}.ConnectionString);
}
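Next, I want a generic GetData method to read the data from the database. Note that its return type is explicitly IEnumerable<T>. You could strongly type it instead of making it generic, but it has to keep returning IEnumerable to take advantage of yield return.
Also note that I set CommandTimeout to 0 because I know this process will be long-running.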
public static IEnumerable<T> GetData<T>(string sql)
{
using (var conn = GetConnection())
{
if (ConnectionState.Closed == conn.State) conn.Open();
using (var cmd = conn.CreateCommand())
{
cmd.CommandTimeout = 0;
cmd.CommandType = CommandType.Text;
cmd.CommandText = sql; //TODO: Make sure you do standard sql injection prevention
using (var reader = cmd.ExecuteReader())
{
//We want to optimize the number of round trips to the DB our reader makes.
//Setting the FetchSize this way will make the reader bring back 5000 records
//with every trip to the DB
reader.FetchSize = reader.RowSize * 5000;
while (reader.Read())
{
var values = new object[reader.FieldCount];
reader.GetValues(values);
//This assumes that type T has a constructor that takes in an object[]
//and the mappings of object[] to properties is done in that constructor
yield return (T)Activator.CreateInstance(typeof(T), new object[] { values });
}
}
}
}
}
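Because GetData wraps its using blocks around a yield return loop, the connection is disposed only when the caller finishes (or abandons) the enumeration, so it stays open while each batch is being processed. The generic sketch below shows that lifetime with a hypothetical FakeConnection standing in for OracleConnection; it illustrates C# iterator behavior, not ODP.NET itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a DbConnection that records when it is disposed.
public sealed class FakeConnection : IDisposable
{
    public bool Disposed { get; private set; }
    public void Dispose() => Disposed = true;
}

public static class IteratorLifetime
{
    // Same shape as GetData: a using block around a yield return loop.
    public static IEnumerable<int> Rows(FakeConnection conn, int count)
    {
        using (conn)
        {
            for (var i = 0; i < count; i++)
                yield return i; // the using block is still "open" at every yield
        }
    }
}
```

Disposal happens when MoveNext runs past the last row, or when a foreach loop exits early (for example via break), because foreach disposes the enumerator.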
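Next, I want a way to drive this fancy business logic:
public static void Main(string[] args)
{
var sql = args[0]; // the user-submitted query
// Row is any type with a public Row(object[] values) constructor, as GetData requires
foreach (var batch in GetData<Row>(sql).Batch(50000))
{
ProcessBusinessLogic(batch);
}
}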
using System;
using System.Collections.Generic;
using System.Data;
using MoreLinq;
using Oracle.ManagedDataAccess.Client;
namespace ReadLargeDataset
{
public class Program
{
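public static void Main(string[] args)
{
var sql = args[0]; // the user-submitted query
foreach (var batch in GetData<Row>(sql).Batch(50000))
{
ProcessBusinessLogic(batch);
}
}
// Simple row type: GetData creates one per record through this constructor
public class Row
{
public Row(object[] values) { Values = values; }
public object[] Values { get; }
}
public static void ProcessBusinessLogic<T>(IEnumerable<T> data)
{
//TODO Implement fancy business logic here
}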
public static IEnumerable<T> GetData<T>(string sql)
{
using (var conn = GetConnection())
{
if (ConnectionState.Closed == conn.State) conn.Open();
using (var cmd = conn.CreateCommand())
{
cmd.CommandTimeout = 0;
cmd.CommandType = CommandType.Text;
cmd.CommandText = sql; //TODO: Make sure you do standard sql injection prevention
using (var reader = cmd.ExecuteReader())
{
//We want to optimize the number of round trips to the DB our reader makes.
//Setting the FetchSize this way will make the reader bring back 5000 records
//with every trip to the DB
reader.FetchSize = reader.RowSize * 5000;
while (reader.Read())
{
var values = new object[reader.FieldCount];
reader.GetValues(values);
//This assumes that type T has a constructor that takes in an object[]
//and the mappings of object[] to properties is done in that constructor
yield return (T)Activator.CreateInstance(typeof(T), new object[] { values });
}
}
}
}
}
private static OracleConnection GetConnection()
{
return new OracleConnection(new OracleConnectionStringBuilder
{
//TODO: Set other connection string properties
ConnectionTimeout = 0
}.ConnectionString);
}
}
}
#region License and Terms
// MoreLINQ - Extensions to LINQ to Objects
// Copyright (c) 2009 Atif Aziz. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
#endregion
// ReSharper disable CheckNamespace
namespace MoreLinq
{
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
public static class MoreEnumerable
{
/// <summary>
/// Batches the source sequence into sized buckets.
/// </summary>
/// <typeparam name="TSource">Type of elements in <paramref name="source"/> sequence.</typeparam>
/// <param name="source">The source sequence.</param>
/// <param name="size">Size of buckets.</param>
/// <returns>A sequence of equally sized buckets containing elements of the source collection.</returns>
/// <remarks> This operator uses deferred execution and streams its results (buckets and bucket content).</remarks>
public static IEnumerable<IEnumerable<TSource>> Batch<TSource>(this IEnumerable<TSource> source, int size)
{
return Batch(source, size, x => x);
}
/// <summary>
/// Batches the source sequence into sized buckets and applies a projection to each bucket.
/// </summary>
/// <typeparam name="TSource">Type of elements in <paramref name="source"/> sequence.</typeparam>
/// <typeparam name="TResult">Type of result returned by <paramref name="resultSelector"/>.</typeparam>
/// <param name="source">The source sequence.</param>
/// <param name="size">Size of buckets.</param>
/// <param name="resultSelector">The projection to apply to each bucket.</param>
/// <returns>A sequence of projections on equally sized buckets containing elements of the source collection.</returns>
/// <remarks> This operator uses deferred execution and streams its results (buckets and bucket content).</remarks>
public static IEnumerable<TResult> Batch<TSource, TResult>(this IEnumerable<TSource> source, int size,
Func<IEnumerable<TSource>, TResult> resultSelector)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
return BatchImpl(source, size, resultSelector);
}
private static IEnumerable<TResult> BatchImpl<TSource, TResult>(this IEnumerable<TSource> source, int size,
Func<IEnumerable<TSource>, TResult> resultSelector)
{
Debug.Assert(source != null);
Debug.Assert(size > 0);
Debug.Assert(resultSelector != null);
TSource[] bucket = null;
var count = 0;
foreach (var item in source)
{
if (bucket == null)
{
bucket = new TSource[size];
}
bucket[count++] = item;
// The bucket is fully buffered before it's yielded
if (count != size)
{
continue;
}
// Select is necessary so bucket contents are streamed too
yield return resultSelector(bucket.Select(x => x));
bucket = null;
count = 0;
}
// Return the last bucket with all remaining elements
if (bucket != null && count > 0)
{
yield return resultSelector(bucket.Take(count));
}
}
}
}
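select
columns...
from
data sources...
where
some conditions...
order by
some unique key...
offset @offset rows
fetch next @pageSize rows only
-- On Oracle 12c and later, bind variables use ':' instead of '@'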
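Since the query is user-submitted, one way to page it without editing it is to wrap it as a derived table and apply the paging clause outside. This is a hypothetical helper, not part of the answer: the caller must supply column(s) from the inner query that give a unique, stable order, and the parameter names rowOffset and pageSize are illustrative. On SQL Server the inner query must not contain its own ORDER BY unless it also uses TOP or OFFSET.

```csharp
using System;

// Hypothetical helper: wraps an arbitrary SELECT so that it returns a single page.
public static class PagedQuery
{
    // paramPrefix: ":" for Oracle bind variables, "@" for SQL Server parameters.
    public static string Wrap(string userSql, string orderByColumns, string paramPrefix)
    {
        if (string.IsNullOrWhiteSpace(userSql))
            throw new ArgumentException("Query is empty.", nameof(userSql));
        if (string.IsNullOrWhiteSpace(orderByColumns))
            throw new ArgumentException("A stable ORDER BY is required for paging.", nameof(orderByColumns));
        // A trailing semicolon would be a syntax error inside the derived table.
        var inner = userSql.Trim().TrimEnd(';');
        return "select * from (" + Environment.NewLine + inner + Environment.NewLine + ") paged_src"
            + " order by " + orderByColumns
            + " offset " + paramPrefix + "rowOffset rows"
            + " fetch next " + paramPrefix + "pageSize rows only";
    }
}
```

Note that OFFSET paging makes the database skip all earlier rows on every page, so later pages of a 90-million-row result get progressively more expensive.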
using System;
using System.Data;
using System.Data.Common;
public class LongRunningTask
{
const long pageSize = 100000L; //--> ...or whatever the market will bear
const int retryLimit = 3;
public void Start( ConnectionFactory factory, string sql )
{
var done = false;
var page = 0L;
var index = 0L;
var retries = 0;
var retrying = false;
while ( !done )
{
try
{
using ( var connection = factory.CreateConnection( ) )
{
using ( var cmd = connection.CreateCommand( ) )
{
cmd.CommandType = CommandType.Text;
cmd.CommandText = sql;
cmd.Parameters.Add( factory.CreateParameter( "@pageSize", SqlDbType.BigInt ) );
cmd.Parameters.Add( factory.CreateParameter( "@offset", SqlDbType.BigInt ) );
cmd.Parameters[ "@pageSize" ].Value = pageSize - ( retrying ? index : 0 );
cmd.Parameters[ "@offset" ].Value = page * pageSize + ( retrying ? index : 0 ); //--> rows in earlier pages, plus rows already read before a failure
connection.Open( );
using ( var dr = cmd.ExecuteReader( ) )
{
index = retrying ? index : 0;
retrying = false;
done = !dr.HasRows; //--> didn't get anything, we're done!
while ( dr.Read( ) )
{
//read 1 by 1 record and pass it to algorithm to do some complex processing
index++;
}
}
}
}
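page++;
index = 0; //--> the next page starts at its first row, even if its first attempt fails before reading
retries = 0; //--> a page succeeded, so reset the retry budget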
}
catch ( Exception ex )
{
Console.WriteLine( ex );
if ( retryLimit < retries++ ) throw;
retrying = true;
}
}
}
}
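public abstract class ConnectionFactory
{
// Provider-specific, e.g. return a new OracleConnection or SqlConnection for your connection string
public abstract DbConnection CreateConnection( );
// Provider-specific, e.g. return a new OracleParameter or SqlParameter with the given name and type
public abstract DbParameter CreateParameter( string parameterName, SqlDbType type, int length = 0 );
}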
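The offset arithmetic the paging loop relies on can be isolated into a small pure function, which makes the retry case easy to check: a fresh page skips page * pageSize rows, and a retried page additionally skips the rows already read and asks only for the remainder of that page. The names PageMath and ForPage are hypothetical.

```csharp
using System;

// Hypothetical helper: where to restart when (re)issuing the query for a page.
public static class PageMath
{
    public static (long Offset, long Count) ForPage(long page, long pageSize, long rowsAlreadyRead)
    {
        if (page < 0) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        if (rowsAlreadyRead < 0 || rowsAlreadyRead > pageSize)
            throw new ArgumentOutOfRangeException(nameof(rowsAlreadyRead));
        // Skip all earlier pages plus the part of this page already processed,
        // then fetch only what is left of this page.
        return (page * pageSize + rowsAlreadyRead, pageSize - rowsAlreadyRead);
    }
}
```

For example, with 100,000-row pages, a failure after 37,000 rows of page 2 restarts at offset 237,000 and fetches the remaining 63,000 rows.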