This article is reposted from: https://weblog.west-wind.com/posts/2013/Dec/22/Entity-Framework-and-slow-bulk-INSERTs

I’ve been working on an internal project where I needed to load up a large amount of records into a database for testing. On this project I’m using Entity Framework Code First 6.02 and since I’m in initial testing mode I decided to create a DbInitializer that adds about 30,000 records to the database. I figured this was a reasonable idea for the moment as I’m testing out a few different scenarios for handling a large amount of data.

I ran into a major problem with the code I initially used however, which is simply using Context.Table.Add() to add these records to two tables in my DbContext:

private void SeedInitialData(WebMonitorContext context)
{
    for (int i = 0; i < 10000; i++)
    {
        var user = new User()
        {
            Email = DataUtils.GenerateUniqueId() + "@.test.com",
            FirstName = DataUtils.GenerateUniqueId(),
            LastName = DataUtils.GenerateUniqueId(15),
            Company = DataUtils.GenerateUniqueId(12),
            StreetAddress = DataUtils.GenerateUniqueId(50),
            City = DataUtils.GenerateUniqueId(),
            State = "CA"
        };
        user.Password = AppUtils.EncodePassword("password", user.Id);
        context.Users.Add(user);

        var site = new WebMonitorSite
        {
            UserId = user.Id,
            SiteName = "West Wind Technologies",
            Url = "http://west-wind.com/",
            SearchFor = "West Wind Technologies"
        };
        context.Sites.Add(site);

        site = new WebMonitorSite
        {
            UserId = user.Id,
            SiteName = "GeoCrumbs",
            Url = "http://geocrumbs.net/",
            SearchFor = "GeoCrumbs"
        };
        context.Sites.Add(site);

        context.SaveChanges();
    }

    // context.SaveChanges();  // also tried saving once at the end - even slower
}

So then, in order to get the database created, I ran a test that simply returned a few records from the users table and waited… and waited and waited…

At first I thought something was wrong, but I checked the DB and saw the database created, and records getting added, very, very slowly. At some point later I let this run out completely and found that it took 57 minutes to complete adding the 30,000 records. Yikes…

Now granted, in a production application I wouldn’t recommend doing this kind of INSERT batch in a DbInitializer, but the problem that occurred here is something that can crop up in other places as well, as you’ll see in a second.

Replacing with ADO.NET Code

Just to make sure it wasn’t an issue with my local SQL Server instance, I re-ran the same operations using my own DbDataAccess::SaveEntity() helper (part of the West Wind Utilities library in the West Wind Toolkit), which is a generics-based SQL mapper that maps properties to database fields in a flat table using Reflection and ADO.NET – IOW, it’s not super fast, but it is easy to use on simple one-dimensional objects like this.

So instead of the Context.Table.Add() and context.SaveChanges() I simply replaced the Add() calls with the SaveEntity method:

context.Db.SaveEntity(user, "Users", "id");

and

context.Db.SaveEntity(site, "WebMonitorSites", "id");

Running this takes a mere 25 *seconds* for those same 30,000 records.
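For comparison, here is roughly what such a helper boils down to under the hood: a plain parameterized ADO.NET INSERT. This is only a minimal sketch, not the actual SaveEntity() implementation; the connection string variable and the reduced column list are assumptions for illustration:

// using System.Data.SqlClient;
using (var conn = new SqlConnection(connectionString))   // connectionString is assumed
using (var cmd = new SqlCommand(
    "INSERT INTO Users (Id, Email, FirstName, LastName) " +
    "VALUES (@Id, @Email, @FirstName, @LastName)", conn))
{
    // map object properties to parameters explicitly instead of via Reflection
    cmd.Parameters.AddWithValue("@Id", user.Id);
    cmd.Parameters.AddWithValue("@Email", user.Email);
    cmd.Parameters.AddWithValue("@FirstName", user.FirstName);
    cmd.Parameters.AddWithValue("@LastName", user.LastName);

    conn.Open();
    cmd.ExecuteNonQuery();
}

There is no context and no change tracking involved, which is why this path stays fast no matter how many rows are written.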

Context Bloat and Large Lists

My first thought was that this can’t be right, and as it turns out there are a few ways to make EF work a lot faster in this scenario.

The issue here is that I’m creating a lot of records and I keep adding them to the same context instance. This means that by the time I’m done, this DbContext will end up tracking 30,000 records. So the first few inserts probably run pretty fast, but as the list of records grows it gets slower and slower as EF tries to track an ever larger object graph. I’m not exactly sure why this should happen in this case, as the data being added are very simple records inserted into a two-table schema with no relations, but clearly EF does more than just property change tracking as items are added to the context.
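You can watch the tracked graph grow by peeking at the change tracker inside the seeding loop. This is just a hypothetical diagnostic snippet, not part of the original code:

// inside the seeding loop - requires using System.Linq and System.Diagnostics
// ChangeTracker.Entries() enumerates every entity the context currently tracks
if (i % 1000 == 0)
    Debug.WriteLine("Tracked entities: " + context.ChangeTracker.Entries().Count());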

There are two easy fixes to this problem:

  • Keep the active Context Graph small by using a new context for each Unit of Work
  • Turn off AutoDetectChangesEnabled

Use a Unit of Work and a new Context for each set of Records

This approach never lets the Context get very large, because the context is recreated inside of the loop. It looks like this:

for (int i = 0; i < 10000; i++)
{
    var context = new WebMonitorContext();

    // … add items to the context tables

    context.SaveChanges();
}

Making this little change (in a test, not the initializer!) the performance now comes down to something reasonable: the same 30,000 inserts take 33 seconds.
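Since DbContext is disposable, it's also worth wrapping each short-lived context in a using block so connections and tracked state are released promptly. A minimal sketch of the same idea follows; the batch size of 100 and the CreateUserAndSites() helper are made up for illustration:

for (int batch = 0; batch < 100; batch++)
{
    // each batch gets its own small, short-lived context
    using (var context = new WebMonitorContext())
    {
        for (int i = 0; i < 100; i++)
        {
            // hypothetical helper that builds a user plus its sites
            // and calls context.Users.Add() / context.Sites.Add()
            CreateUserAndSites(context);
        }

        context.SaveChanges();   // one SaveChanges() per unit of work
    }
}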

This is a good practice in general. Building up large object graphs inside of an application is a bad idea, as it makes lock conflicts more likely and lets the data get stale. In most situations, unit of work saves are more efficient.

This is one way to solve the Context graph size problem, but unfortunately this doesn’t work for my DbInitializer, because I don’t control the instantiation of the context when the initializer is called – EF passes the context in so I can’t create a new one.

Turn off AutoDetectChangesEnabled

Another solution, suggested by @julielerman, is to turn off AutoDetectChangesEnabled. This confirms that the problem really is EF's change tracking on large sets of tracked entities. Turning off this flag on the context also brings the time down to about 20 seconds and, more importantly, it also works for my DbInitializer, since I can simply apply the flag on the existing, passed-in context:

context.Configuration.AutoDetectChangesEnabled = false;

// EF Core equivalent: _dbContext.ChangeTracker.AutoDetectChangesEnabled = false;

for (int i = 0; i < 10000; i++)
{
    // … do the Context.Add() / SaveChanges() here as before
}

This is the easiest solution in this case and it actually turns out to also be quite a bit faster.

Turning off AutoDetectChangesEnabled should be done very selectively, if at all. There are other mechanisms, such as AsNoTracking(), that can be a better solution to avoid change tracking of lists, but in this case it solves a specific problem. Just make sure that when you’re done with the operations that don’t require change tracking, you turn the flag back on again – or, probably even better, kill and recreate your context altogether (ideally per unit of work, but even within the current UOW scope if necessary).
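One simple way to make sure the flag can't accidentally stay off is to restore it in a finally block. A minimal sketch of that pattern, with the surrounding bulk work assumed:

bool originalValue = context.Configuration.AutoDetectChangesEnabled;
try
{
    context.Configuration.AutoDetectChangesEnabled = false;

    // … bulk Add() / SaveChanges() work goes here
}
finally
{
    // restore the previous setting even if an exception is thrown
    context.Configuration.AutoDetectChangesEnabled = originalValue;
}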

Batching

Several people on Twitter also suggested that, instead of calling SaveChanges() on each unit of work, it’d be more efficient to batch a number of commands together, which I also added at the bottom of the loop:

context.Configuration.AutoDetectChangesEnabled = false;

for (int i = 0; i < 10000; i++)
{
    // … do the Context.Add() calls here as before (SaveChanges() is now batched below)

    if (i % 50 == 0)
        context.SaveChanges();
}

context.SaveChanges();  // save whatever is left over

And this knocks off another 4 seconds – down to 16 seconds which is good enough for me.

Another few people suggested that for large data imports, SQL Bulk Insert is probably a better choice, but you have to use tools other than Entity Framework to implement this functionality. Personally I think that’s overkill unless the ultimate performance is required AND there’s actually a convenient way to get the data into a format that SQL Bulk Insert can import from, such as CSV files. Creating those files in the first place can be slow, for example, and might negate any perf benefit you’d get from the bulk insert.
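For reference, ADO.NET’s SqlBulkCopy can bulk insert straight from an in-memory DataTable or an IDataReader, so a CSV intermediate isn’t strictly required. The sketch below is only an illustration under assumed table and column names, not code from the original post:

// using System.Data; using System.Data.SqlClient;
var table = new DataTable();
table.Columns.Add("Id", typeof(string));
table.Columns.Add("Email", typeof(string));

foreach (var user in users)                              // 'users' is an assumed in-memory list
    table.Rows.Add(user.Id, user.Email);

using (var bulk = new SqlBulkCopy(connectionString))     // connectionString is assumed
{
    bulk.DestinationTableName = "Users";
    // column mappings must match the destination table's schema
    bulk.ColumnMappings.Add("Id", "Id");
    bulk.ColumnMappings.Add("Email", "Email");
    bulk.WriteToServer(table);
}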

DbSet.AddRange()

Another option for adding records is to use DbSet.AddRange() to build up the list of records in memory first and then submit the entire list at once in a single call. In this case change detection doesn’t kick in until after all the records have been added.

To do this I can change the code to dump all the records to a List<T> first, then add the lists at the end:

[TestMethod]
public void InsertEfAddRangeTest()
{
    var context = new WebMonitorContext();

    var users = new List<User>();
    var sites = new List<WebMonitorSite>();

    for (int i = 0; i < 10000; i++)
    {
        // … create the user and site instances as before
        users.Add(user);
        sites.Add(site);
    }

    context.Users.AddRange(users);
    context.Sites.AddRange(sites);

    context.SaveChanges();
}

Note that here I’m not turning off AutoDetectChangesEnabled, and the performance is in the 12 second range, which is almost identical to the performance with AutoDetectChangesEnabled turned off.

Realistically this only works if the set doesn’t get too large, since it requires that the whole set of data is loaded into memory first. If you need to do multiple batches (combined with code similar to the Batching section), you would also still need to set AutoDetectChangesEnabled = false.
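Combining the two ideas might look roughly like the sketch below: chunks of records are accumulated in memory, added via AddRange(), and saved per chunk with change detection turned off. The chunk size of 1,000 and the CreateUser() helper are assumptions, not code from the post:

context.Configuration.AutoDetectChangesEnabled = false;

var users = new List<User>();
for (int i = 0; i < 10000; i++)
{
    users.Add(CreateUser());          // hypothetical factory for a test user

    if (users.Count == 1000)          // flush every 1,000 records
    {
        context.Users.AddRange(users);
        context.SaveChanges();
        users.Clear();
    }
}

if (users.Count > 0)                  // save whatever is left over
{
    context.Users.AddRange(users);
    context.SaveChanges();
}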

Entity Framework

I’ve been slow to come around to accepting Entity Framework, since my early experiences with Version 1.0 really left a bad taste in my mouth. But since Version 4.0 and Code First, I’ve been using EF more and more. It’s come a long way, and performance and stability are at least close to raw ADO.NET perf in most cases. And for those few things that don’t work or are too slow, I tend to fall back to ADO.NET as necessary, using the data tools I’ve grafted onto the DbContext for easy access. This seems to be working well for me these days, and I’ve had good results on the last 3 projects I’ve worked on as well as a few in-house projects.

But Entity Framework has its quirks, as any ORM tool does, and there are lots of things that are found and learned the hard way by doing it wrong first, then finding a solution that doesn’t quite feel satisfying because the original use case should have worked. This is one of them, but it’s minor. Hopefully this post will save somebody some grief when they search for this particular problem.
