Now don’t get me wrong, I’m a HUGE fan of Entity Framework. What I don’t get though, is why Lazy Loading is on by default.
As one of my friends likes to quote, “with great power comes great responsibility,” and he’s absolutely right! Lazy loading is very powerful and should be used selectively. I’m not saying that you should never use it, but you should really think long and hard about how you use it.
If you forget that Lazy Loading is on by default, it’s quite easy to create N+1 situations, where what could have been a simple read from the database turns into a nightmare!
N+1 situations are caused by reading a collection of entities from the database and iterating over them to read something from a Navigation Property. Since it’s not loaded when we try to access it, Entity Framework will try to load it from the database. Consequently, we are hitting the database once for the initial list and once for every entity in that list. This is why we call this N+1.
Now imagine that you have a Worker Role whose job is to query the database to generate graphs for reports. A single instance of this Role might work out fine, but the second you start scaling out… let’s just say that you might find the results interesting.
But since you’re here let’s dive a bit deeper into this.
For the sake of this example, for each chart that we want to generate we are going to query Entity Framework for 1000 entities. Then we iterate over them in order to read a property from a Navigation Property.
Essentially we are going to make 1001 requests instead of 1.
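To make the count concrete, here is a minimal sketch that simulates the situation without Entity Framework at all. Everything in it is hypothetical: FakeDatabase is a stand-in that only counts round trips, and Order and Customer are made-up entities. It shows the shape of the problem, not how EF is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Customer
{
    public string Name { get; set; }
}

public class Order
{
    private readonly Lazy<Customer> _customer;

    public Order(int id, Func<Customer> loadCustomer)
    {
        Id = id;
        _customer = new Lazy<Customer>(loadCustomer);
    }

    public int Id { get; }

    // Stand-in for a navigation property: resolved on first access.
    public Customer Customer => _customer.Value;
}

// Hypothetical database stand-in that only counts round trips.
public class FakeDatabase
{
    public int QueryCount { get; private set; }

    // One round trip for the list; each Customer costs one more on first access.
    public List<Order> LoadOrdersLazily(int count)
    {
        QueryCount++;
        return Enumerable.Range(1, count)
            .Select(id => new Order(id, () => LoadCustomer(id)))
            .ToList();
    }

    // One round trip in total: customers arrive together with the orders.
    public List<Order> LoadOrdersEagerly(int count)
    {
        QueryCount++;
        return Enumerable.Range(1, count)
            .Select(id =>
            {
                var customer = new Customer { Name = "Customer " + id };
                return new Order(id, () => customer);
            })
            .ToList();
    }

    private Customer LoadCustomer(int orderId)
    {
        QueryCount++;
        return new Customer { Name = "Customer " + orderId };
    }
}

public static class NPlusOneDemo
{
    // Returns how many round trips it takes to read every order's customer name.
    public static int CountQueries(bool eager, int entityCount)
    {
        var db = new FakeDatabase();
        var orders = eager
            ? db.LoadOrdersEagerly(entityCount)
            : db.LoadOrdersLazily(entityCount);

        var names = new List<string>();
        foreach (var order in orders)
        {
            names.Add(order.Customer.Name); // touches the navigation property
        }
        return db.QueryCount;
    }
}
```

Calling NPlusOneDemo.CountQueries(false, 1000) walks the lazy path and NPlusOneDemo.CountQueries(true, 1000) walks the eager one, so the two counts can be compared side by side.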
Scaling out means that we duplicate our processes over multiple Virtual Machines. So when we start to scale the Worker Role, we start to multiply the number of requests hitting our Windows Azure SQL Database. This means that if we scale out to 5 instances and each instance generates only one chart at a time (which should never be the case), our Windows Azure SQL Database will receive 5 × 1001 = 5005 requests instead of 5.
This situation could easily have been avoided by deactivating lazy loading. Doing so would force the developer to choose between Eager Loading, Explicit Loading or activating Lazy Loading.
Eager Loading is the process whereby a query for one type of entity also loads related entities as part of the query. Eager loading is achieved by use of the Include method.
Explicit Loading is the process whereby related entities are loaded with an explicit call.
Lazy Loading is the process whereby an entity or collection of entities is automatically loaded from the database the first time that a property referring to the entity/entities is accessed.
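The three patterns above could look roughly like this with the DbContext API. This is a sketch that assumes Entity Framework 6 is referenced and a database is reachable; ReportContext, Order and Customer are hypothetical names that I made up for the illustration.

```csharp
using System.Collections.Generic;
using System.Data.Entity; // assumes EF 6; provides DbContext and the lambda Include
using System.Linq;

// Hypothetical entities, for illustration only.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }

    // virtual lets EF create a lazy-loading proxy for this navigation property.
    public virtual Customer Customer { get; set; }
}

public class ReportContext : DbContext
{
    public DbSet<Order> Orders { get; set; }
    public DbSet<Customer> Customers { get; set; }
}

public static class LoadingPatterns
{
    // Eager Loading: the customers come back with the orders in a single query.
    public static List<Order> Eager(ReportContext context)
    {
        return context.Orders.Include(o => o.Customer).ToList();
    }

    // Explicit Loading: the related entity is fetched only when the code asks for it.
    public static Order Explicit(ReportContext context, int orderId)
    {
        var order = context.Orders.Find(orderId);
        if (order != null)
        {
            context.Entry(order).Reference(o => o.Customer).Load();
        }
        return order;
    }

    // Lazy Loading: the first access to order.Customer triggers a query.
    // If lazy loading is disabled, Customer stays null here unless it was loaded earlier.
    public static string Lazy(ReportContext context, int orderId)
    {
        var order = context.Orders.Find(orderId);
        return order?.Customer?.Name;
    }
}
```

For a collection navigation property, the explicit variant would use Collection(...) instead of Reference(...).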
When you choose a pattern for loading related entities, consider the behavior of each approach with regard to the number and timing of connections made to the data source versus the amount of data returned by and the complexity of using a single query. Eager loading returns all related entities together with the queried entities in a single query. This means that, while there is only one connection made to the data source, a larger amount of data is returned in the initial query. Also, query paths result in a more complex query because of the additional joins that are required in the query that is executed against the data source.
Explicit and lazy loading enable you to postpone the request for related object data until that data is actually needed. This yields a less complex initial query that returns less total data. However, each successive loading of a related object makes a connection to the data source and executes a query. In the case of lazy loading, this connection occurs whenever a navigation property is accessed and the related entity is not already loaded. If you are concerned about which related entities are returned by the initial query or with managing the timing of when related entities are loaded from the data source, you should consider disabling lazy loading. Lazy loading is enabled in the constructor of the Entity Framework-generated object context.
Find examples and more information about Loading Related Objects.
Disabling Lazy Loading
Disabling Lazy Loading in Entity Framework Code First can be achieved by configuring the DbContext, typically in its constructor:
this.Configuration.LazyLoadingEnabled = false;
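In context, that line could sit in a context class like the one below. This is only a sketch assuming Entity Framework 6; ReportContext and Chart are hypothetical names.

```csharp
using System.Data.Entity; // assumes EF 6

// Hypothetical entity, for illustration only.
public class Chart
{
    public int Id { get; set; }
    public string Title { get; set; }
}

public class ReportContext : DbContext
{
    public ReportContext()
    {
        // Navigation properties are no longer loaded on first access;
        // callers have to use Include or an explicit Load instead.
        this.Configuration.LazyLoadingEnabled = false;
    }

    public DbSet<Chart> Charts { get; set; }
}
```

Because the setting lives on the context instance, it can still be switched back on for one specific unit of work if that ever turns out to be the right call.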
Disabling Lazy Loading using the Entity Framework Designer can be achieved by opening your model’s properties and setting Lazy Loading Enabled to False.
Wrapping it up
When I start new projects using ORMs like Entity Framework, I disable Lazy Loading because it allows my team to ask the right questions at the right moment. If at some point we really need Lazy Loading and the other options aren’t acceptable, then I happily turn it on.
Isn’t the right answer to leave lazy loading on and know when you are going to need child rows and fetch them eagerly?
If you disable lazy loading altogether, how do you avoid always winding up fetching everything, which is inefficient?
I think you could also make your property non-virtual which will prevent lazy loading?
The virtual property method works in Code First. I agree that it’s a good way to control it.
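To illustrate the virtual property approach, here is a small sketch with hypothetical Code First entities. Only the navigation property declared virtual can be overridden by a lazy-loading proxy; the non-virtual one has to be loaded eagerly or explicitly.

```csharp
using System.Collections.Generic;

// Hypothetical entities, for illustration only.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class OrderLine
{
    public int Id { get; set; }
    public decimal Amount { get; set; }
}

public class Order
{
    public int Id { get; set; }

    // virtual: a proxy can override this, so it can be lazy loaded.
    public virtual Customer Customer { get; set; }

    // Not virtual: never lazy loaded, even when lazy loading is enabled on the context.
    public ICollection<OrderLine> Lines { get; set; }
}
```

This gives control per property, whereas the context-level switch applies to every navigation property at once.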
If you disable lazy loading, you don’t have to load everything. You can selectively load objects using explicit loading or by returning multiple record sets. EF is great at rebuilding relationships in memory.
Leaving lazy loading on by default usually ends up bringing applications to a crawl. Working effectively with EF requires some level of knowledge about how it works and not everyone is curious enough to push it that far.
Turning lazy loading off by default is protecting yourself against yourself. Turn it on when you really need it =)
N+1 has always been the problem with EF, and with most ORMs for that matter. The problem isn’t lazy loading, but the way it’s done. Consider this: suppose you eagerly load N parents, each having M’ children for a total of M children (I’m coming to lazy loading, bear with me). Now suppose your ORM does that in these steps. First it builds the SQL query that loads the parents (select a, b, c from parent [join bla bla bla] where x = y). Then it executes the query and does one pass through the reader, top to bottom, constructing a Parent for each row, and *puts the parent in a Dictionary*, where the key of the dictionary is the key of Parent as defined by your config. Now for the magic: the ORM builds exactly one SQL query for ALL the children, using an easy trick: it propagates the WHERE clause of the Parent’s query into the Child’s query! So it becomes: “select d, e, f, foreignkey from child, parent [join bla bla bla] where parent.key = child.foreignkey AND x = y”. By propagating the where clause of the parent to the child, linking on their pk/fk relationship, the ORM knows that it won’t get ANY orphaned child. Now the ORM executes the query and does, again, exactly one pass through the reader, instantiating a child at each row. This time, instead of (or in addition to, if there are grandchildren in the picture) putting the child in a dictionary, it retrieves the parent from the parent dictionary it just built, using the foreign key of the child it just loaded, and adds the child to the parent’s collection! BAM!!! You just did eager loading using exactly 2 SQL statements, and did it in O(N+M).
Lazy loading can be done exactly like that by remembering the where clause; when you access a Parent’s child collection, you could say “oh and by the way, go and load the children of all my sibling Parents while you’re at it”
Oh and yeah, I’ve done it in an in-house ORM back in 2004, and it was able to load complex graphs of objects in exactly the same number of passes as there were objects ;-) It was also pretty good at deleting, inserting and updating objects using the same techniques, even updating the foreign key values in case you “move” a child around (not that it happened that often… but it was there). Thanks for the opportunity to remember that and talk about that very nice project :-)
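The dictionary step described in the comment above can be sketched in memory as follows. This is not the commenter's ORM and not what Entity Framework does internally. The two input sequences stand in for the rows returned by the parent query and the child query, and Parent, Child and TwoPassLoader are hypothetical names.

```csharp
using System.Collections.Generic;

public class Child
{
    public int Id { get; set; }
    public int ForeignKey { get; set; }
}

public class Parent
{
    public int Key { get; set; }
    public List<Child> Children { get; } = new List<Child>();
}

public static class TwoPassLoader
{
    // parentKeys: rows from the parent query; childRows: rows from the child query.
    public static List<Parent> Stitch(IEnumerable<int> parentKeys, IEnumerable<Child> childRows)
    {
        var parents = new List<Parent>();
        var parentsByKey = new Dictionary<int, Parent>();

        // Pass 1: one Parent per row, indexed by its key. O(N)
        foreach (var key in parentKeys)
        {
            var parent = new Parent { Key = key };
            parents.Add(parent);
            parentsByKey.Add(key, parent);
        }

        // Pass 2: attach each child to its parent via the foreign key. O(M)
        foreach (var child in childRows)
        {
            Parent parent;
            if (parentsByKey.TryGetValue(child.ForeignKey, out parent))
            {
                parent.Children.Add(child);
            }
        }

        return parents;
    }
}
```

Each dictionary lookup is constant time on average, which is where the O(N+M) total comes from.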
NHibernate first had eager loading enabled, which a lot of people didn’t know or understand, resulting in complete databases getting fetched each time a single item was displayed on their website. They switched to lazy loading being the default because of this. Usually you want to optimize the query anyway when you want to access large sets of data in one transaction.