«« Unit Testing a Database Abstraction Mac OS X 10.4 Tiger Announced »»
blog header image
Database Perf and Object Creation

Premature optimization is bad, sure. But no optimization is bad too, if what you end up with is unusable. I've been thinking a lot about the performance implications of using a database with AudioMan, especially with object creation and destruction.

You see, with the old repository objects were kept in collections. "Indexes" -- which allowed me to get certain tracks based on artist or album -- were just lists of pointers to objects. The track objects already existed.

The database abstraction consists of methods that take parameters and returns lists of objects. For the implementation it would use a SQL query, execute it and then take the resulting recordset and make a list of track objects out of it. All of these objects would be new, and creating them would be a serious performance issue.

The worse case for this performance is when the "Whole Collection" is visible in the track list. Collections can be 500, 1000 or even 5000 songs or more. Creating this many new objects in a reasonable amount of time just doesn't seem possible. I don't even think I need to try it to prove that to myself. In fact, the performance for loading existing objects into the track list is already borderline on about 1000 songs.

So if I can't create these objects on the fly I better keep them around in RAM, already created. However, the great thing about the database is that I can do all sorts of SQL queries on the tracks' values and such, which I can't do easily with collections. So I want to keep the database and SQL queries around too. But then I had an idea ...

I'll SQL query the database, except instead of returning all of the track data I'll just return the tracks' unique key numbers. I'll also keep a map of all of the track objects in the collection. Then I can take the key numbers returned from the SQL query and get the objects I need from the map without having to recreate them.

The only problem with that solution of course is that the objects in the map have to be kept in sync with the data in the database. There are two ways to do this:

  1. When you make a change to a track, write the change to the object in the map as well as the database, or
  2. When you make a change to a track, write the change to the database and then reread the object in the map from the database.

To me, the second option is preferable because there's less chance of a bug causing synchronization problems -- the object will always have the same data as the database.

This is starting to look more and more like a persistence layer, isn't it? Are there any ideas I should borrow from known implementations/APIs of persistence layers? I don't want to reinvent the wheel here. What do you think?

Posted at June 28, 2004 at 01:02 AM EST
Last updated June 28, 2004 at 01:02 AM EST
Comments

Wierd... I was talking my last post about your last point along this same line.

Probably in september I will implement caching in the following manner (#1,2,3 are already in place, #4 is the missing element)...

1) Create db interface
2) implement sql (no caching what so ever)
3) implement inmemory
4) implement cache (see not below)

The cache will be a cross breed that uses sql and inmemory to do the persisting and this implementation is only worried about caching. The cache will probably have a bit of sql glue to balance the trade off between being in memory and being 100% accurate (aka the update issue).

» Posted by: aforward at July 2, 2004 03:34 PM

I recommend re-inventing the wheel. Think of Joel's 'what you do is what you should do yourself'

Persistence frameworks (at a coding level) are brittle, very brittle. I see them as being object-oriented databases that sit on top of relational databases.

Sorry, can no longer concentrate. The short of it... capuring the requirements of what to do when data changes is too difficult to configure (based on my knowledge of DataSets.Net which is similar to some 3rd party java persistence mechanisms)

» Posted by: aforward at July 2, 2004 03:41 PM

First, My experience says that the speed problem does not come from creating the objects. Object creation with a semi-space garbage collector is extremely fast. The problem is retrieving that many rows from the database.

You need to use a database cursor to retrieve pages of objects and create them as you need them. Take a look at http:www.glorp.org if you don't want to write all this yourself.

Second, you are making a grave error by assuming that you know where the speed problems lie; Until you profile the code, you simply have no idea. I don't either, since I can't profile your code, so I've just given you my best guess based on my experience. You really need to profile your code and then fix the actual problem, not what you imagine or suppose to be the problem.

» Posted by: anthony at July 6, 2004 09:37 AM

You're right Anth! :)

» Posted by: Ryan at July 6, 2004 11:39 AM
Google
 
Search scope: Web ryanlowe.ca