C#: Handling duplicate records when calling EF SaveChanges
Tags: c#, sql-server, entity-framework-6

I have a web application in which I parse a CSV file that can contain more than 200,000 records. For each line I parse the fields, check that the key does not already exist in the database, and then add the record to the context. When the count reaches 10,000 records, it calls SaveChanges. The problem is that the context itself can contain duplicates, and when it does the save fails. This runs on an Azure VM talking to an Azure SQL server.

Two questions: how do I handle the duplicates, and is there any way to speed this up, since it takes several hours to run?
        using (LoanFileEntities db = new LoanFileEntities())
        {
            db.Configuration.AutoDetectChangesEnabled = false; // 1. this is a huge time saver
            db.Configuration.ValidateOnSaveEnabled = false; // 2. this can also save time
            while (parser.Read())
            {
                counter++;
                int loan_code = 0;
                string loan_code_string = parser["LoanId"];
                string dateToParse = parser["PullDate"].Trim();
                DateTime date_pulled;
                try
                {
                    date_pulled = DateTime.Parse(dateToParse, CultureInfo.InvariantCulture);
                }
                catch (Exception)
                {
                    throw new Exception("No Pull Date for line " + counter);
                }
                string originationdate = parser["OriginationDate"].Trim();
                DateTime date_originated;
                try
                {
                    date_originated = DateTime.Parse(originationdate, CultureInfo.InvariantCulture);
                }
                catch (Exception)
                {
                    throw new Exception("No Origination Date for line " + counter);
                }
                dateToParse = parser["DueDate"].Trim();
                DateTime date_due;
                try
                {
                    date_due = DateTime.Parse(dateToParse, CultureInfo.InvariantCulture);
                }
                catch (Exception)
                {
                    throw new Exception("No Due Date for line " + counter);
                }
                string region = parser["Region"].Trim();
                string source = parser["Channel"].Trim();
                string password = parser["FilePass"].Trim();
                decimal principalAmt = Convert.ToDecimal(parser["Principal"].Trim());
                decimal totalDue = Convert.ToDecimal(parser["TotalDue"].Trim());
                string vitaLoanId = parser["VitaLoanId"];
                var toAdd =
                    db.dfc_LoanRecords.Any(
                        x => x.loan_code_string == loan_code_string);
                if (!toAdd)
                {
                    dfc_LoanRecords loan = new dfc_LoanRecords();
                    loan.loan_code = loan_code;
                    loan.loan_code_string = loan_code_string;
                    loan.loan_principal_amt = principalAmt;
                    loan.loan_due_date = date_due;
                    loan.date_pulled = date_pulled;
                    loan.date_originated = date_originated;
                    loan.region = region;
                    loan.source = source;
                    loan.password = password;
                    loan.loan_amt_due = totalDue;
                    loan.vitaLoanId = vitaLoanId;
                    loan.load_file = fileName;
                    loan.load_date = DateTime.Now;
                    switch (loan.region)
                    {
                        case "UK":
                            if (location.Equals("UK"))
                            {
                                //db.dfc_LoanRecords.Add(loan);
                                if (loan.source == "Online")
                                {
                                    counter_new_uk_online++;
                                }
                                else
                                {
                                    counter_new_uk_retail++;
                                }
                            }
                            break;
                        case "US":
                            if (location.Equals("US"))
                            {
                                db.dfc_LoanRecords.Add(loan);
                                if (loan.source == "Online")
                                {
                                    counter_new_us_online++;
                                }
                                else
                                {
                                    counter_new_us_retail++;
                                }
                            }
                            break;
                        case "Canada":
                            if (location.Equals("US"))
                            {
                                db.dfc_LoanRecords.Add(loan);
                                if (loan.source == "Online")
                                {
                                    counter_new_cn_online++;
                                }
                                else
                                {
                                    counter_new_cn_retail++;
                                }
                            }
                            break;
                    }
                    // delay save to speed up load. 3. also saves transactional time
                    if (counter % 10000 == 0)
                    {
                        db.SaveChanges();
                    }
                }
            } // end of parser read
            db.SaveChanges();
        }
    }
}
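
I would suggest removing the duplicates in your code before anything is sent to .SaveChanges().

Rather than go into de-duplication in detail here, I have put together a list of links to existing StackOverflow questions and answers that may help.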
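
To make the in-code de-duplication suggested above concrete, here is a minimal sketch, not the answerer's own code. It tracks every key it has seen in a HashSet and keeps only the first row for each key. The names LoanDeduplicator, FilterNew, keySelector and existingKeys are hypothetical and chosen for illustration. The case-insensitive comparison is also an assumption: it is meant to mimic a case-insensitive database collation, and should be swapped for StringComparer.Ordinal if the key column is case-sensitive.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original code.
public static class LoanDeduplicator
{
    // Returns the rows whose key has not been seen before, in input order.
    // existingKeys: keys assumed to be already stored in the database,
    // loaded once before the import starts.
    public static List<T> FilterNew<T>(
        IEnumerable<T> rows,
        Func<T, string> keySelector,
        IEnumerable<string> existingKeys)
    {
        // Assumption: keys compare case-insensitively, as they would under
        // a case-insensitive SQL Server collation.
        var seen = new HashSet<string>(existingKeys, StringComparer.OrdinalIgnoreCase);
        var result = new List<T>();

        foreach (var row in rows)
        {
            // HashSet.Add returns false when the key is already present,
            // so duplicates within the file and keys already in the
            // database are both skipped.
            if (seen.Add(keySelector(row)))
            {
                result.Add(row);
            }
        }

        return result;
    }
}
```

In the loop from the question, the same idea could be applied row by row: keep one HashSet for the whole import and call db.dfc_LoanRecords.Add(loan) only when seen.Add(loan_code_string) returns true. If the set is seeded once with the loan_code_string values already in the table, the per-row Any(...) query is no longer needed, which may also help with the run time, since that check currently costs one database round trip per CSV line. Whether 200,000 or more keys fit comfortably in memory is something to verify for your data.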