C#: Processing an IQueryable with a new Entity Framework DbContext
I have been trying to improve a library built with EF5, and I want to use the parallel library to improve performance. I know that DbContext is not thread-safe and that a new DbContext needs to be created for each transaction. I think the problem I'm facing is how to get a new context each time my IQueryable is iterated. Here is a truncated version of my implementation:
public virtual void ProcessAttachments(IQueryable<File> results)
{
    var uniqueOrderKeys = results.Select(r => r.ForeignKey).Distinct();
    //process each order
    Parallel.ForEach(uniqueOrderKeys, key =>
    {
        var key1 = key;
        var resultsForKey = results.Where(result => result.ForeignKey == key1);
        //process File objects for the order
        Parallel.ForEach(resultsForKey, result =>
        {
            string orderNum;
            using (var da = new DataAccess()) //DataAccess creates the DbContext and is implementing IDisposable
            {
                orderNum = da.GetOrderNumberByOrderKey(key);
            }
        });
    });
}
Is there a way to specify a new DbContext to use when looping over and retrieving my IQueryable results?

I just put this together; I think it might help you:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions; // required for Expression<Func<T, int>>
using System.Text;
using WhereverYourObjectContextLives;

/// <summary>
/// Provides an iterator pattern over a collection such that the results may be processed in parallel.
/// </summary>
public abstract class ParallelSkipTakeIterator<T>
{
    private int currentIndex = 0;
    private int batchSize;
    private Expression<Func<T, int>> orderBy;
    private ParallelQuery<T> currentBatch;

    /// <summary>
    /// Build the iterator, specifying an Order By function, and optionally a <code>batchSize</code>.
    /// </summary>
    /// <param name="orderBy">Function which selects the id to sort by</param>
    /// <param name="batchSize">number of rows to return at once - defaults to 1000</param>
    /// <remarks>
    /// <code>batchSize</code> balances overhead with cost of parallelizing and instantiating
    /// new database contexts. This should be scaled based on observed performance.
    /// </remarks>
    public ParallelSkipTakeIterator(Expression<Func<T, int>> orderBy, int batchSize = 1000)
    {
        this.batchSize = batchSize;
        this.orderBy = orderBy;
    }

    /// <summary>
    /// Accesses the materialized result of the most recent iteration (execution of the query).
    /// </summary>
    public ParallelQuery<T> CurrentBatch
    {
        get
        {
            if (this.currentBatch == null)
            {
                throw new InvalidOperationException("Must call HasNext at least once before accessing the CurrentBatch.");
            }
            return this.currentBatch;
        }
    }

    /// <summary>
    /// Does the current iterator have another batch of data to process?
    /// </summary>
    /// <returns>true if more data can be accessed via <code>CurrentBatch</code></returns>
    /// <remarks>
    /// Creates a new database context, issues a query, and places a materialized collection in <code>CurrentBatch</code>.
    /// Context is disposed once the query is issued.
    /// Materialized collection is specified by <code>BuildIQueryable</code>. Use of any associated navigation properties
    /// must be accounted for by using the appropriate <code>.Include</code> operator where the query is
    /// built in <code>BuildIQueryable</code>.
    /// </remarks>
    public bool HasNext()
    {
        using (YourObjectContextHere db = new YourObjectContextHere())
        {
            // ToList() runs the query while the context is still alive;
            // AsParallel() then works on the in-memory list only.
            this.currentBatch = this.BuildIQueryable(db)
                .OrderBy(this.orderBy)
                .Skip(this.currentIndex)
                .Take(this.batchSize)
                .ToList()
                .AsParallel();
            this.currentIndex += this.batchSize;
            return this.currentBatch.Any();
        }
    }

    /// <summary>
    /// Given a Database Context, builds a query which can be executed in batches.
    /// </summary>
    /// <param name="db">context on which to build and execute the query</param>
    /// <returns>a query which will be executed and materialized</returns>
    /// <remarks>Context will be disposed as soon as HasNext has been executed.</remarks>
    protected abstract IQueryable<T> BuildIQueryable(YourObjectContextHere db);
}
You can then subclass it and implement BuildIQueryable, like this:
class MyObjectIterator : ParallelSkipTakeIterator<MyObject>
{
    private List<int> instanceIds;

    // The constructor name must match the class name.
    public MyObjectIterator(List<int> someExtraInfoNeededByQuery)
        : base(f => f.InstanceId)
    {
        this.instanceIds = someExtraInfoNeededByQuery;
    }

    protected override IQueryable<MyObject> BuildIQueryable(YourObjectContextHere db)
    {
        // Eager-load the navigation property, then keep only the requested ids.
        // Filtering on x.InstanceId is an assumption; the original snippet was incomplete here.
        IQueryable<MyObject> myObjects = db.SomeCollection
            .Include("SomethingImportant")
            .Where(x => this.instanceIds.Contains(x.InstanceId));
        return myObjects;
    }
}
Finally, you can loop over the data set like this:
MyObjectIterator myIterator = new MyObjectIterator(someExtraInfoNeededByQuery);
while (myIterator.HasNext())
{
    ParallelQuery<MyObject> query = myIterator.CurrentBatch;
    query.ForAll(item =>
        doSomethingCool(item));
}
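The batching idea above can also be tried without a database. The following is a minimal sketch, not the answer's actual implementation: `FakeContext` and `BatchReader.ReadBatches` are hypothetical names introduced here, and `FakeContext` merely stands in for a real DbContext by exposing an in-memory `IQueryable<int>` and counting how many instances were created. It shows the same technique: each batch is read with Skip/Take through its own short-lived context and fully materialized before any parallel work starts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for a DbContext: it only counts how many instances
// were created and exposes an in-memory IQueryable.
public sealed class FakeContext : IDisposable
{
    public static int Created;
    private static readonly int[] Rows = Enumerable.Range(1, 25).ToArray();

    public FakeContext() { Interlocked.Increment(ref Created); }
    public IQueryable<int> Numbers { get { return Rows.AsQueryable(); } }
    public void Dispose() { }
}

public static class BatchReader
{
    // Yields fully materialized batches. Each batch is read through its own
    // context, which is disposed before the batch is handed to the caller,
    // so no context is ever shared between worker threads.
    public static IEnumerable<List<T>> ReadBatches<TContext, T>(
        Func<TContext> contextFactory,
        Func<TContext, IOrderedQueryable<T>> orderedQuery,
        int batchSize)
        where TContext : IDisposable
    {
        for (int skip = 0; ; skip += batchSize)
        {
            List<T> batch;
            using (TContext db = contextFactory())
            {
                // A stable ordering is required for Skip/Take paging.
                batch = orderedQuery(db).Skip(skip).Take(batchSize).ToList();
            }
            if (batch.Count == 0)
            {
                yield break;
            }
            yield return batch;
        }
    }
}
```

A caller would iterate `BatchReader.ReadBatches(() => new FakeContext(), db => db.Numbers.OrderBy(n => n), 10)` and call `batch.AsParallel().ForAll(...)` on each batch. Note that, like the iterator above, this issues one extra query at the end to discover that no rows are left.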
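I wonder how much benefit you get from making the queries parallel compared with using a single query that fetches the data you need. What you have seems to send a large number of queries to the database, and it seems to me that it should be possible to get all the information in one query.

Processing in parallel, as I did in my example, takes about 2 minutes for a given data set. Previously, processing the same data set took about 6 minutes. The difference is very noticeable. Am I creating hundreds of additional connections to the database? Probably. Does it really matter? I don't think so.

Do you know what takes the most time? 6 minutes seems very long... You don't seem to care about change tracking; have you tried fetching the entities with .AsNoTracking()? How much data are we talking about here?

I'll have to look up what .AsNoTracking() does. Most of those 6 minutes is spent retrieving the data. I ran a test with .ToList() on my IQueryable results to force all the data to be fetched at once, and it took about 6 minutes. The data we're talking about is roughly 1000 .jpg blobs. The rest of the code converts the File objects back to .jpg and renames them as needed.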
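For comparison, the single-query alternative raised in the comments could look roughly like the sketch below. It assumes the rows have already been materialized on one thread (for example with `ToList()`), then groups them by order key in memory and processes the groups in parallel, so no context is touched by worker threads. `FileRow`, `GroupedProcessor`, and `CountPerOrder` are hypothetical names, and counting rows per order is only a placeholder for the real conversion and renaming work. Whether this is faster depends on where the time goes; as noted above, retrieval reportedly dominates here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical shape of a fetched row; the real File entity is not shown in the question.
public sealed class FileRow
{
    public int ForeignKey { get; set; }
    public string Name { get; set; }
}

public static class GroupedProcessor
{
    // Groups already-materialized rows by order key and processes each group
    // in parallel. Results go into a thread-safe dictionary.
    public static IDictionary<int, int> CountPerOrder(IEnumerable<FileRow> materializedRows)
    {
        var counts = new ConcurrentDictionary<int, int>();
        var groups = materializedRows.GroupBy(r => r.ForeignKey).ToList();

        Parallel.ForEach(groups, group =>
        {
            // Placeholder for the real per-order work.
            counts[group.Key] = group.Count();
        });

        return counts;
    }
}
```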