There are two more features I want to get into Simple.Data before releasing 1.0beta1: upserts, and eager loading. Upserts are pretty straight-forward, but there are a few different approaches to eager loading and I want to get some input from the community before I pick one.

In case you don’t know, eager loading is a pattern for getting data from the database in an optimal fashion, with the minimum number of requests or least-possible server load. It’s the opposite of lazy loading, which Simple.Data already does.

The implementation syntax will look like this:

    IEnumerable<Customer> customers = _db.Customers.FindAllByRegion("South")  
                .With(_db.Customers.Orders)
.Cast<Customer>();

This will create the customer objects and populate an ICollection<Order> Orders property in each one, assuming there is one.

There are two directions in which a class may have relationships with other classes: it may have “parent” (or “lookup”) classes, where there is at most a single instance of that parent class within the instance of the class in question; and it may have “child” classes, where there is a collection of instances of the child class. Eager loading of parent classes is trivial; you just add the necessary join(s) to the master table(s) and marshal the data into the right places in code.

Eager loading of child classes is similarly easy when there is only a single child collection to be loaded; you join to the detail table and then process the results so that a single instance of the main table row is created, and instances of the detail table rows are grouped into collections.

For example, loading user objects with a list of phone numbers:

SELECT User.Id, User.Name, Phone.Number
FROM User
LEFT JOIN Phone ON User.Id = Phone.UserId

Returned rows:

User.Id User.Name Phone.Number
1 Alice 0123456789
1 Alice 0738473284
2 Bob 0129934845
2 Bob 0729478594

From this we create a User object for Alice with her two phone numbers, and an object for Bob with his. Even if there are a lot of rows in the detail table, this is still the preferred way of handling eager loading for this scenario as it involves the fewest round-trips to the database. Often, you can handle a parent-child-child relationship using this algorithm, too, although beyond that point you’re getting a bit scary.

The complexity arises when you want to load two unrelated child collections, say, load Customers with their Invoices and Orders:

SELECT Customer.*, Invoice.*, Order.*
FROM Customer
LEFT JOIN Invoice ON Customer.Id = Invoice.CustomerId
LEFT JOIN Order ON Customer.Id = Order.CustomerId

This creates what is technically called an “outer Cartesian product”, where a row is returned for every possible combination of the rows from the two outer-joined tables. So if you’re grabbing the details for a single valuable customer who has placed a thousand orders and generated a thousand invoices, you get a million rows back from the database. I have a hunch this might not be optimal.

So this is my Request for Comments: what would you like to see done here?

The options are:

  1. Only allow one detail-table With clause to be specified per query. Throw an exception if there is more than one.
  2. Allow multiple detail-table With clauses, and use additional SELECT operations for all except the first.
  3. Add a parameter to the With method allowing the developer to specify the technique on a per-detail-table basis:
  4.     var customers = _db.Customers.FindAllByRegion("South")  
                    .With(_db.Customers.Orders, WithUsing.Join)
                    .With(_db.Customers.Orders.Items, WithUsing.Join)
                    .With(_db.Customers.Invoices, WithUsing.ExtraQuery);
    
    If only one detail table is specified, or one detail with a detail below it (e.g. _db.Customers.Orders.Items), the default behaviour when no WithUsing parameter is supplied will be to use JOIN. Multiple detail tables without a WithUsing will cause the ADO adapter to throw an exception (other adapters, such as the one for MongoDB, can use the With clause as they see fit).

I think I’ve rubber ducked here a bit, as Option 3 is looking like a no-brainer, but I’ve typed it all now, so if you do have any comments or suggestions, please do leave them below. I’m planning on starting code for this on Sunday morning, so there’s plenty of time to change my mind.