# Looking Forward to Metadata

I've been concerned with file metadata for some time -- not just with the Durham and AudioMan projects but also when it comes to my own personal file management.

As far as my music collection goes, it's at the size right now where I believe that accurate and complete metadata is the only practical way to manage it. Since I have no tool to do that with right now -- outside of manual tag editing in iTunes -- and no massive storage device to hold my entire collection on, I am currently not managing my collection properly.

This worries me a bit. I invest a lot of time and money in my music collection. But I digress...

Metadata is starting to get more attention lately. In the present day we have Apple's release of Mac OS X 10.4 Tiger and its Spotlight search technology. After reading Ars Technica's review of Tiger I'm convinced that Mac OS X and Spotlight are still far from solving my metadata problems but they are one step closer.

Versions of Microsoft Windows that may natively support file metadata, on the other hand, are so far off they are still vapourware. This doesn't help Windows users much in the present. Metadata support in Windows probably won't be mature or exploited effectively by applications until years after it is released.

Third party tools like Google Desktop Search are interesting in that they put pressure on the operating system vendors to get their asses moving on metadata and search. These third party tools are showing users that metadata is useful and practical. At the same time, users don't even realise they are dealing with metadata; they just think that searching is better.

But these desktop tools are in the unfortunate position of having the rug suddenly pulled out from under them. The operating system vendors will put most of this functionality right in the operating system (Apple, to a large extent, already has). In terms of building applications on top of these desktop tools, that makes them poor candidates. In a few years these third party projects will be redundant. So developers will wait for the operating system vendors like Microsoft and Apple to make decent APIs to access file metadata.

The open source project I'm working on, the Durham Metadata Framework for Eclipse, aims to give an operating system agnostic abstraction to file metadata. Right now plugins are written for certain file/content types but later metadata could be read straight from the operating system-level APIs instead (or as well).

This kind of cross-platform metadata support will be needed for Eclipse plugins and Eclipse Rich Client Platform (RCP) applications. It's not needed now and not even in two years. It will probably be needed in five years and definitely in ten years.

It doesn't hurt to work on it in advance to get the kinks out. I expect the pace to be slow (unless I get development help or funding to work on it full time) but I'll probably have metadata on my mind for my whole career. It seems like a good project to work on in my spare time.

posted at April 30, 2005 at 10:49 AM EST
last updated December 5-, 2005 at 02: 2 PM EST


# Bye Bye Hotmail

Over the last few months I've slowly changed my main email address from Hotmail to Gmail. It's taken a while but I think most of my contacts are sending email to the latter now.

By the way, if you would like an invitation for a Gmail account -- currently the only way to get a Gmail account -- then let me know. I have 50 invitations.

Where did Hotmail go wrong from my perspective?

1. I never liked Hotmail's newest interface. I found it too gimmicky and SLOW. Hotmail's old interface was much more useful and fast. Gmail's interface is simple and above all, functional.

2. Hotmail's junk mail protection doesn't work well so I resorted to only accepting emails from contacts. This means I have to keep checking the junk for emails from people not on my contacts. I also had to keep my contact list updated. Gmail's junk mail protection has been very good and I don't have to mess with contacts.

3. It took a long time for my Hotmail account to be upgraded from 2MB to 250MB. Longer than most people I talked to, actually. The order may have been random but I've had my Hotmail account for over 5 years. It would have been nice to have some sort of priority. My Gmail limit stands at 2137 MB and rising and I'm using 169MB of it.

4. Sorting sorting sorting. Hotmail's filtering pales in comparison to Gmail's labels.

5. Gmail's conversation thread handling is a killer feature as far as I'm concerned. Hotmail has nothing like it.

I still have to use my Hotmail email address for MSN Messenger but other than that, I'm outta there.

posted at April 27, 2005 at 03:31 PM EST
last updated December 5-, 2005 at 02: 2 PM EST


# Week 09 Status Report

This is the week 09 status report for Durham Metadata Framework for Eclipse, AudioMan and their subprojects.

What was done last week

I was bound to have a slow week eventually and I'd consider this one the first. You'll have to forgive me, I was a little distracted. No, I'm not going to tell you why.

I moved Durham and AudioMan to SourceForge.net and renamed all of the projects and Java packages to start with net.sf.durham. Even with the package refactoring support in Eclipse this was a lot of keystrokes for 20+ Eclipse projects. Took me a few hours. It did give me an opportunity to organize things better though, with Durham at the center like it should be.

A nice thing about SourceForge.net is that you can browse the CVS repository on the web with ViewCVS. Another good thing: it looks like SourceForge is going to support Subversion soon. SourceForge also has a build machine (several, actually) and I can schedule nightly builds on it. My build machine at home, a PII-266 named ferris, is off now. For a while there I thought I was living in a wind tunnel.

New things to do

  • Get the nightly builds going
  • Get CVSSpam or a similar tool working on the SourceForge CVS repository
  • Figure out how to mock test Durham's API

AudioMan is on hold indefinitely. Its function now is to be a test client application for Durham's API, just like Quick Editor. I won't be touching it again until Durham is caught up.

posted at April 25, 2005 at 09:00 AM EST
last updated December 5-, 2005 at 02: 2 PM EST


# Durham and AudioMan Moving to SourceForge

As I worked on AudioMan I realised that the metadata framework it used might be useful to other people so I separated it into a product called Durham. Now I'm going to call it the Durham Metadata Framework for Eclipse and make it the primary focus of my open source efforts.

Yes, AudioMan will still be a released product; it just won't be the main product. AudioMan will be an example metadata-using application that shows off what Durham can do. It will be open source so people can see how Durham can be used. AudioMan will be a client of Durham's API, helping me sympathize -- for lack of a better word -- with other developers using Durham.

I registered Durham with SourceForge.net and it was accepted. Over the next week I'll be renaming Java packages and reorganizing Durham, AudioMan and QuickEditor to fit into Durham's home on SourceForge. All three will be released from that one place at the same time.

I'm hoping that registering Durham with SourceForge will help me build more of an open source community around Durham and get it noticed.

This also affects when I choose to make a release. Before I made this switch I was releasing when AudioMan was done. Now the release date will all depend on Durham's stability. The stability and usability of AudioMan and Quick Editor are secondary.

When I make a release, AudioMan and Quick Editor will be less stable than Durham. This only makes sense from a testing standpoint. Durham will have three clients bashing on it -- the test suite, AudioMan and Quick Editor -- and from a testing perspective has an advantage over the two end-user products.

Indeed, I'm positioning Durham to be Yet Another Eclipse Framework. At least it doesn't have an ordinary TLA. I'm not going to make some official-looking Eclipse proposal document and then not follow through on it. I want to deliver real software. I'd like to get Durham out of the vapourware stage before I make any kind of proposal like that. When you don't have a big company behind you, shipping working and stable software is a good way to be taken seriously. Maybe the only way. That is my goal ... and it's no secret.

I haven't decided what I'm going to do with the audioman.org domain name yet. I may just redirect it to Durham's SourceForge page.

Thanks to everyone for their support for AudioMan (and Durham) over the years! Yep, it's been years already.

posted at April 20, 2005 at 07:01 AM EST
last updated December 5-, 2005 at 05: 1 PM EST


# Week 08 Status Report

This is the week 08 status report for Durham Metadata Framework for Eclipse, AudioMan and their subprojects.

What was done last week

Whenever a file is added or removed from Durham it can notify classes I've called caches. Some of these caches, like the Hibernate/hsqldb cache I wrote last week, can be persistent and reloaded at startup time. Other caches, like the models of the MVC pattern used for the user interface, can be temporary and do not need to persist their data.
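The notification scheme described above could be sketched roughly like this. Every name here (`CollectionCache`, `InMemoryModelCache`, `FileRegistry`) is hypothetical and is not Durham's actual API; a persistent cache would implement the same interface but write to a database instead of an in-memory set.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical listener interface: anything that wants to hear about
// files entering or leaving the framework implements this.
interface CollectionCache {
    void fileAdded(Path file);
    void fileRemoved(Path file);
}

// A temporary cache, like a UI model: it keeps its data in memory only.
class InMemoryModelCache implements CollectionCache {
    private final Set<Path> files = new LinkedHashSet<>();

    @Override public void fileAdded(Path file) { files.add(file); }
    @Override public void fileRemoved(Path file) { files.remove(file); }

    Set<Path> files() { return files; }
}

// Hypothetical stand-in for the framework: it notifies every registered cache.
class FileRegistry {
    private final List<CollectionCache> caches = new ArrayList<>();

    void register(CollectionCache cache) { caches.add(cache); }

    void add(Path file) {
        for (CollectionCache cache : caches) cache.fileAdded(file);
    }

    void remove(Path file) {
        for (CollectionCache cache : caches) cache.fileRemoved(file);
    }
}
```

A UI model and a database-backed cache can both be registered with the same `FileRegistry`, and neither needs to know the other exists.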

The whole discussion about the Operation class last week is kind of embarrassing. I realised later that I unknowingly made it look very similar to Eclipse's Job class. D'oh! I changed all of the Operation subclasses to Job subclasses and now they show progress in the UI. It was a pretty easy refactor. <g>

The AudioMan UI is starting to settle down. I purposely did not tweak it early -- I consider early UI tweaking just as bad as early performance tweaking. Not only is it bad, it's often a waste of time. When you're refactoring and moving things around sometimes the UI feels it the most. I'll get close to the target during development to get feedback but the UI will be tweaked as late as possible. You don't put icing on a cake before it has baked and cooled, right?

I also worked on some Ogg Vorbis support this week. It's in the early stages. On the development mailing list I invited other people to write metadata plugins, especially for audio formats (WMA, RA, FLAC, SHN, etc) for AudioMan. If you are interested in doing that let me know and I'll set you up.

Another interesting metadata plugin for Durham that one of you could make: Java source files (*.java). You could pick out metadata like number of classes, methods, member variables, lines of code, etc. If you use some of the Eclipse JDT plugins this may not be too difficult. You may also want to make a source code content category.

New things to do

  • Use the new build machine for CVS and nightly builds -- pending: there could be news here
  • Mac: package products in a dmg'd bundle
  • Support for adding backup discs to the collection
  • Start work on the Backup Management Perspective

posted at April 18, 2005 at 07:51 AM EST
last updated December 5-, 2005 at 02: 2 PM EST


# Persisting Just the Filename

I had an idea this morning: I can begin development of AudioMan collection persistence by only saving filenames to the database. When the application starts the filenames are loaded from the database one at a time and their metadata is reread from the file into an object (AllMetadata). That object is then used in the model (see MVC) for collection browsing and editing in the UI. When the user adds a file to the collection that file would also be added to the database -- and removed from the database when removed from the collection.

At this point the database does not keep track of metadata property values, so they have to be reread at application startup/load/initialization time. Only saving filenames allows me to get Hibernate and hsqldb running with a minimum amount of data, which will be easier to integrate into the existing application.
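Here is a rough sketch of that filename-only first step. It is only an illustration under some assumptions: `FilenameStore` is a made-up in-memory stand-in for the Hibernate/hsqldb table, `MusicCollection` is a hypothetical name, and a plain `Map<String, String>` plays the part of the AllMetadata object.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for the database table of filenames.
// A real version would sit on Hibernate and hsqldb instead of a set.
class FilenameStore {
    private final Set<String> filenames = new LinkedHashSet<>();

    void save(String filename) { filenames.add(filename); }
    void delete(String filename) { filenames.remove(filename); }
    List<String> loadAll() { return new ArrayList<>(filenames); }
}

class MusicCollection {
    private final FilenameStore store;
    // Stand-in for "reread the metadata from the file".
    private final Function<String, Map<String, String>> metadataReader;
    // The model used for browsing and editing: filename -> metadata.
    private final Map<String, Map<String, String>> model = new LinkedHashMap<>();

    MusicCollection(FilenameStore store, Function<String, Map<String, String>> metadataReader) {
        this.store = store;
        this.metadataReader = metadataReader;
    }

    // Startup: only filenames come from the store; metadata is reread per file.
    void load() {
        for (String filename : store.loadAll()) {
            model.put(filename, metadataReader.apply(filename));
        }
    }

    // Adding to the collection also adds to the store.
    void add(String filename) {
        store.save(filename);
        model.put(filename, metadataReader.apply(filename));
    }

    // Removing from the collection also removes from the store.
    void remove(String filename) {
        store.delete(filename);
        model.remove(filename);
    }

    Map<String, Map<String, String>> model() { return model; }
}
```

The store only ever sees filenames, so swapping the in-memory set for a real table later should not touch the model code.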

The next step would be saving all of the metadata property values in the database. There are a few reasons for persisting that much data:

  • At startup time only those files that have "stale" metadata would need to be reread from their files, so startup would be faster.
  • With a database I can do neat metadata property searching features like find-as-you-type
  • I need to save metadata property values for backup media because after the media is scanned into AudioMan I probably won't be able to reread it.
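The "stale" check in the first point could look something like the sketch below. Comparing last-modified timestamps is my assumption about how staleness might be detected, and `CachedMetadata` and `MetadataCache` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cached row: the property values plus the file's
// last-modified time at the moment they were read.
class CachedMetadata {
    final long lastModifiedMillis;
    final Map<String, String> properties;

    CachedMetadata(long lastModifiedMillis, Map<String, String> properties) {
        this.lastModifiedMillis = lastModifiedMillis;
        this.properties = properties;
    }
}

class MetadataCache {
    private final Map<String, CachedMetadata> rows = new HashMap<>();

    void put(String filename, CachedMetadata row) { rows.put(filename, row); }

    // A file is stale if it was never cached or has changed on disk since.
    boolean isStale(String filename, long currentLastModifiedMillis) {
        CachedMetadata row = rows.get(filename);
        return row == null || row.lastModifiedMillis != currentLastModifiedMillis;
    }
}
```

At startup the caller would pass in something like `new java.io.File(filename).lastModified()` and only reread the files for which `isStale` returns true.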

Database persistence would be eager in that it would save new metadata to the database as soon as it was read from the file by the Durham Metadata Framework for Eclipse. I won't have to do any saving to the database when the application is closed -- I'll just have to close the connection to the database. If the application crashes the user won't lose changes they made to their collection in that session.

Side note: why do I always get good ideas when I'm not sitting in front of a computer? Things pop into my head in the car, in the shower, on walks, exercising ... maybe I need to be AFK more often.

I guess that just goes to show me though -- sometimes people can do their best thinking far far away from work, when they are relaxed and actually have time to THINK! My memory is terrible so I've gotten into the habit of carrying note paper and a pen around with me ... I forget ideas almost as quickly as I think them up. Ha.

posted at April 16, 2005 at 06:56 AM EST
last updated December 5-, 2005 at 02: 2 PM EST


# Backup Management Perspective

After I get collection persistence into AudioMan I'll be working on the Backup Management perspective. This post will examine my own user story concerning backups. Feel free to add your own requirements in the comments.

I have a lot of albums that I "ripped" from CD, encoded to MP3 and backed up on CD-R. I can't fit all of these MP3s on my hard drives and even if I could I'd want them backed up anyway in case of a hard drive failure.

I have the discs numbered but when I want to find something I have to scan this page for it. Worse, that page is a year or two out of date and so are my backups. The larger my collection gets, the harder it is to keep track of.

Instead I'd like to be able to search and browse my backup discs in AudioMan's collection browsing perspective. Then I'd like another perspective to tell me what I have and haven't backed up yet -- which is the aim of the Backup Management perspective.

Backup metadata is often read-only because of the media it is stored on. Sometimes this read-only data is incorrect so I'd like to be able to change the metadata after I add the file to my collection. However, I still need to keep the original read-only metadata around so I know what it is. The Backup Management perspective should show me what a backup file's original metadata is and if I've made any changes to it locally.
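One way to keep both versions around is to store the original values untouched and layer local changes on top. This is only a sketch; `BackupFileMetadata` and its methods are hypothetical names, not anything that exists in AudioMan yet.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class BackupFileMetadata {
    // Values exactly as read from the read-only backup disc.
    private final Map<String, String> original;
    // Corrections made locally after the file was added to the collection.
    private final Map<String, String> localChanges = new HashMap<>();

    BackupFileMetadata(Map<String, String> original) {
        this.original = Collections.unmodifiableMap(new HashMap<>(original));
    }

    void set(String property, String value) { localChanges.put(property, value); }

    // The effective value: a local change wins over the original.
    String get(String property) {
        return localChanges.containsKey(property)
                ? localChanges.get(property)
                : original.get(property);
    }

    String getOriginal(String property) { return original.get(property); }

    // True only if a local value exists and differs from the original.
    boolean isChanged(String property) {
        return localChanges.containsKey(property)
                && !Objects.equals(localChanges.get(property), original.get(property));
    }
}
```

A perspective could then show `getOriginal` next to `get` and flag the rows where `isChanged` is true.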

I'm going to analyze each file's audio and generate a unique key. One way to do this is with technology like MusicBrainz's TRM. I'll be able to match files that have the same audio and show which files on the local hard drive have corresponding backups.

I'll also be able to make a list of files that don't have backup copies yet. From that list I'd like to take a subset of files and import the paths (i.e. drag and drop) into a CD burning program. Then I'll burn the CD and scan it back into AudioMan as a backup disc. Now those songs will show as backed up and I'll burn the next disc until the whole collection on the local hard drive is backed up.
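Once every file has an audio key, building the "not backed up yet" list is a simple lookup. The sketch below assumes the keys have already been computed somewhere else (it does not implement TRM or any real fingerprinting), and `BackupMatcher` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BackupMatcher {
    // localKeys maps a local file path to its audio key;
    // backedUpKeys holds the keys of every file found on a backup disc.
    static List<String> filesWithoutBackup(Map<String, String> localKeys,
                                           Set<String> backedUpKeys) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, String> entry : localKeys.entrySet()) {
            if (!backedUpKeys.contains(entry.getValue())) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }
}
```

After a new disc is burned and scanned back in, its keys join `backedUpKeys` and the list shrinks on the next pass.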

posted at April 15, 2005 at 07:06 AM EST
last updated December 5-, 2005 at 02: 2 PM EST


# Starting Persistence

Now is the time I start working on AudioMan collection persistence. That involves remembering the collection the user has put together and reloading it the next time they open AudioMan.

In the first version of AudioMan (version zero, I guess) this was done by serializing the collection as XML and saving it to a file when the AudioMan window was closed. Then the file would be loaded on startup. It was simple but it failed to cover the case where the collection was modified and then AudioMan crashed -- you'd lose all of your modifications. AudioMan didn't crash very often but this error case was brutal.

In AudioMan1 I used hsqldb and a unit-tested database API. This worked alright and covered the error cases where AudioMan crashed before exit. The real pain was maintaining all of that SQL ... oh, and it was slow. Very slow for large collections ... and I have a large music collection. But the real bonus of using a database is that I could query it. With that querying power I implemented a find-as-you-type feature like the one in iTunes.

For AudioMan2 I'm leaning towards hsqldb again but mostly for 1) ease of (non-)installation by the user and 2) because it is written in Java and so is automagically cross-platform with one deployed JAR. I'll reserve opinions on its speed until it's actually running in my Real World situation.

To complement hsqldb this time I'll be trying out Hibernate. Hibernate does something called object-relational mapping (ORM) which is a fancy way of saying it maps Java classes/objects to database schemas/records.

I'm hoping using Hibernate will have a few benefits: 1) I can use other (faster) databases (like MySQL) easily and without porting SQL, 2) less manual coding to synchronize classes and database schema, which speeds up development, 3) less testing effort because there is less code on my end to test, and 4) forced separation of business logic and database-related Java/XML.

Durham will manage the database and use it as a cache so it's not hitting the files to get their metadata every single time, especially when the user is just browsing around the collection.
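The caching idea boils down to a read-through lookup: check the cache first and only touch the file on a miss. In this sketch a `HashMap` stands in for the database and `ReadThroughMetadataCache` is a hypothetical name, so treat it as an illustration of the idea rather than how Durham will actually do it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ReadThroughMetadataCache {
    // Stand-in for the database: filename -> metadata property values.
    private final Map<String, Map<String, String>> cache = new HashMap<>();
    // Stand-in for the slow path that parses the file itself.
    private final Function<String, Map<String, String>> fileReader;
    // Counts how often the slow path was taken.
    int fileReads;

    ReadThroughMetadataCache(Function<String, Map<String, String>> fileReader) {
        this.fileReader = fileReader;
    }

    Map<String, String> metadataFor(String filename) {
        return cache.computeIfAbsent(filename, name -> {
            fileReads++;
            return fileReader.apply(name);
        });
    }
}
```

Browsing the same file twice should only hit the file once; the second request is answered from the cache.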

AudioMan/Durham will also use the database to store information about backup files. Backup files are stored on discs like CD-Rs, not on the "local" hard drive. Backup metadata must be read from the backup disc and stored in the database so it can be accessed at any time.

I'll explain more about how I see AudioMan, Durham and Hibernate interacting in a later post.

posted at April 13, 2005 at 09:51 PM EST
last updated December 5-, 2005 at 02: 2 PM EST


# Transfer Explosion

I was surprised to learn that in the month of March 2005 my ryanlowe.ca website alone chewed through 2 gigabytes of bandwidth (bytes sent to clients). It only has my blog and my resume on it! Since I only pay for up to 1 gigabyte this comes right out of my own pocket. That kind of math motivates me to find ways to trim the fat, so to speak. After looking at the transfer logs it seems there wasn't one single problem but a few...

The biggest problem seems to be bots, not people. There are search engine bots, yes, but there are also blog comment spamming bots that search through blogs. These bots use up my transfer quota mindlessly downloading and inspecting every page on my site. I really really really don't want to have to disable comments and trackbacks, even on my really old posts. But I just might have to do that in order to get rid of these comment spammers.

Another problem is images, including the "headshot" image at the top right corner of the main page of this blog. I'm debating whether or not I should just get rid of it but the headshot comes from a longstanding journalism tradition indicating that this is my opinion, not necessarily fact, and I like that. Screenshots on the main page were another offender. Overall, I should keep the size of images in mind when I post them.

A third problem is the RSS feeds, which are downloaded often by RSS clients (i.e. hourly even if nothing has changed). I put whole posts in my feed so naturally they can get quite long (the fact that I'm a windbag aside). I know many of my readers like full posts in the RSS feed so I will probably just reduce the number of posts in the feed instead.

I wonder if the web server I'm hosted on supports compression? I know that some browsers support it so they can receive compressed data if the server supports it. Plain text (like HTML, RSS feeds) compresses very well.
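For what it's worth, a browser that supports compression says so with an `Accept-Encoding: gzip` request header, and a server that compresses replies with `Content-Encoding: gzip`. To get a feel for how much repetitive markup shrinks, here is a small experiment using the gzip support in the Java standard library. The sample HTML is made up, so the exact ratio on real pages will differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipDemo {
    // Compresses the input with gzip and returns the compressed bytes.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Build some repetitive, made-up markup similar to a blog page.
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            html.append("<p>Post number ").append(i)
                .append(" with some repeated markup.</p>\n");
        }
        byte[] plain = html.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(plain);
        System.out.println(plain.length + " bytes plain, "
                + compressed.length + " bytes gzipped");
    }
}
```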

Any other suggestions for reducing my transfer numbers? I'd like to get back under 1 gigabyte again.

posted at April 11, 2005 at 01:26 PM EST
last updated December 5-, 2005 at 02: 2 PM EST


# Week 07 Status Report

This is the week 07 status report for AudioMan and its subprojects.

What was done last week

When you're dealing with metadata you're dealing with files, so you need files for unit tests. For the last two years AudioMan has had code that gets a file from a test directory, copies it to the temp directory and then returns the temp file for testing so the original test file isn't modified. It's very handy but it wasn't made for plug-ins (and the boundaries they enforce). This week I rewrote the code so that any plug-in's tests could grab a temporary test file.
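The copy-to-temp helper is simple enough to sketch with the standard library. This is not the actual AudioMan code; `TestFiles` and `temporaryCopyOf` are hypothetical names, and the sketch leaves out the plug-in boundary handling mentioned above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TestFiles {
    // Copies the original into the temp directory and returns the copy,
    // so tests can modify it without touching the original test file.
    static Path temporaryCopyOf(Path original) throws IOException {
        String name = original.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Keep the extension so code that looks at it still works.
        String suffix = dot >= 0 ? name.substring(dot) : "";
        Path copy = Files.createTempFile("test-", suffix);
        Files.copy(original, copy, StandardCopyOption.REPLACE_EXISTING);
        // Ask the JVM to clean up the copy when the test run ends.
        copy.toFile().deleteOnExit();
        return copy;
    }
}
```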

I made up an abstract class called Operation for application-level logic code that is squeezed between UI and Durham. Like JFace's Action class, subclasses of Operation implement the run() method to specify what the operation does. Then like a Thread object the UI code starts the operation with the start() method, which calls the operation's (protected) run() method in a new thread. The threading code is in the Operation abstract superclass and can be changed without affecting the many concrete operation subclasses. The next step will be to integrate Operation into RCP's Job support and show progress in the UI for all of the currently running operations.
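In rough outline the class looks something like the sketch below. This is a simplified reconstruction from the description above, not the real source; the `join()` helper and the `AddFilesOperation` subclass are made up for illustration.

```java
// Simplified sketch of the Operation superclass described above.
abstract class Operation {
    private Thread thread;

    // Subclasses specify what the operation does.
    protected abstract void run();

    // UI code calls this; the work happens on a new thread.
    public synchronized void start() {
        if (thread != null) {
            throw new IllegalStateException("operation already started");
        }
        thread = new Thread(this::run);
        thread.start();
    }

    // Hypothetical helper: lets callers (and tests) wait for completion.
    public void join() throws InterruptedException {
        Thread started;
        synchronized (this) {
            started = thread;
        }
        if (started != null) {
            started.join();
        }
    }
}

// Hypothetical concrete operation.
class AddFilesOperation extends Operation {
    volatile int filesAdded;

    @Override
    protected void run() {
        filesAdded = 3; // pretend three files were added to the collection
    }
}
```

Because only `start()` knows about threads, the threading strategy can change without touching any of the subclasses.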

A lot of progress was made on the Collection Browsing perspective. Users can include files or whole directories (recursively) into their collection, browse the collection and edit file metadata. AudioMan2 is quickly catching up to where AudioMan1 left off. I took a vacation from work last week so I had far more time to work on AudioMan than usual and progress was substantial.

I'm debating with myself whether to release AudioMan soon. True, it's functional and people could edit tags and browse with it but it's not doing anything revolutionary. People can do that in iTunes. I think I'll wait until AudioMan does something special before I take the time to release it.

I know this goes against the "release often" mantra but on a one person project releasing software is time expensive. It's not time expensive to generate a release package -- that takes 90 seconds. Tweaking the quality level up to my release standards takes a lot of time, a problem I had with AudioMan1. I spent a lot of my time on AudioMan1 preparing releases that few people used, just to satisfy "release often". Lesson learned; I think I'll wait a bit longer.

If you really want to see what AudioMan2 is doing you can check out the source code. I try my best to keep the repository stable but I wouldn't call it release quality.


I implemented the "audio content category" I talked about last week which is a collection of audio content type definitions. Right now I've only defined the MP3 content type but when I add support for other audio content types (OGG, WMA, FLAC, SHN, etc) I'll just add them to the audio content category. Then they will just work in AudioMan (browsing, editing, etc) with no other changes necessary since AudioMan deals with the audio content category and not specific content types. I'm glad it worked out -- it made the AudioMan code much simpler.

It's also not that difficult for third parties to create their own content types and content categories. Durham is shaping up to offer good metadata support for Eclipse and Rich Client Platform (RCP) applications and AudioMan is taking full advantage of it.

AudioMan is also taking advantage of the RCP itself. Construction of AudioMan's GUI was far simpler this time around, thanks to the pre-constructed workbench and views the RCP offered. The amount of UI code I had to write decreased, which reduces the manual testing burden. Music to any PM's ears.

New things to do

  • Use the new build machine for CVS and nightly builds
  • Mac: package products in a dmg'd bundle
  • Support for adding backup discs to the collection
  • Start work on the Backup Management Perspective
  • Investigate collection persistence between sessions: Hibernate.

posted at April 11, 2005 at 06:29 AM EST
last updated December 5-, 2005 at 02: 2 PM EST


# Week 06 Status Report

This is the week 06 status report for AudioMan and its subprojects.

What was done last week

I continued to polish the AudioMan and Quick Editor product packages: The Windows EXEs now have custom icons and names. The Durham Quick Editor has a logo, splash screen and help screen branded. Eventually I'll get a graphic designer to do a decent job on that stuff but the placeholders will work for now.

The AudioMan build scripts make consolidated reports for JUnit and EMMA across all AudioMan projects to give me overall metric totals. Some of those stats are on the left margin of the main page of this blog.

The Metadata core classes are shaping up nicely, just in time for them to be changed again to accommodate what I'm calling "content categories". For example for AudioMan I'll have the "audio" content category and I'll use this to grab audio attributes (which map to metadata properties) out of content types in that category.

All of the AudioMan UI elements will know they are working with content types in the audio content category, and can request an Artist Name property value from the content type whether it's an MP3, OGG or WMA file. Of course I also want to allow people to define their own content categories for Durham just like they can create support for their own content types and metadata. Some ideas for other content categories: images, video, source code, office documents, etc...
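To make the idea a bit more concrete, here is a rough sketch of a category that maps one attribute name onto whatever field each content type uses. The class names and the map-based design are hypothetical and are not Durham's actual object model.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: each content type knows which of its own tag fields
// corresponds to a category-level attribute such as "artist".
class ContentType {
    final String extension;
    private final Map<String, String> attributeToField = new HashMap<>();

    ContentType(String extension) { this.extension = extension; }

    ContentType map(String attribute, String field) {
        attributeToField.put(attribute, field);
        return this;
    }

    String fieldFor(String attribute) { return attributeToField.get(attribute); }
}

class ContentCategory {
    final String name;
    private final Map<String, ContentType> typesByExtension = new HashMap<>();

    ContentCategory(String name) { this.name = name; }

    void add(ContentType type) { typesByExtension.put(type.extension, type); }

    // Returns the value of a category attribute, whatever the file's type,
    // or null if the type or the attribute is unknown.
    String attribute(String extension, String attribute, Map<String, String> rawTags) {
        ContentType type = typesByExtension.get(extension);
        if (type == null) return null;
        String field = type.fieldFor(attribute);
        return field == null ? null : rawTags.get(field);
    }
}
```

For example, an "audio" category could map "artist" to the `TPE1` frame for MP3 (ID3v2) and to the `ARTIST` comment for Ogg Vorbis. UI code would then ask the category for "artist" and never care which format the file is in.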

If you aren't sufficiently confused about Durham's contentType/metadata/property object model yet, I'll write a blog post about it this week. If you're feeling really proactive you could just read the source code. :)


Speaking of content types, there's some interesting work going on with content types in Eclipse 3.1 to determine file types by the content of the file rather than file extension. From a quick browse of the API it seems like they are looking for specific binary patterns in the files to determine content type. This is very close to file metadata -- a step or two away maybe.

Unfortunately I don't think I will be able to use any of the classes because most of the interesting ones are in internal packages. Maybe I can still hook into the content type support to match content types to metadata types. Right now I'm using only file extensions and for some file types that's less than optimal. Audio doesn't really have that problem but video and images do.
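Detecting a type from content instead of the extension usually means checking the first few bytes of the file against known signatures. The sketch below is my own illustration and not the Eclipse 3.1 content type API; `ContentSniffer` is a hypothetical name and it only knows four well-known signatures.

```java
import java.util.Arrays;

class ContentSniffer {
    // Well-known leading byte signatures ("magic numbers").
    private static final byte[] PNG =
            {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8};
    private static final byte[] GIF = {'G', 'I', 'F', '8'};
    private static final byte[] OGG = {'O', 'g', 'g', 'S'};

    // Guesses a content type from the first bytes of a file.
    static String sniff(byte[] header) {
        if (startsWith(header, PNG)) return "png";
        if (startsWith(header, JPEG)) return "jpeg";
        if (startsWith(header, GIF)) return "gif";
        if (startsWith(header, OGG)) return "ogg";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A caller would read the first handful of bytes of a file and fall back to the file extension whenever `sniff` returns "unknown".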

What got bumped?

  • Email results of the build to the releng mailing list -- figure out why Ant's <mail> task isn't working. I'll wait for the new build machine to be set up first. Update: Still waiting for my host to get set up.

Unsolved Problems

  • If I bundle my RCP product into a Mac OS X .app directory can it still be updated through the Update Manager? I'm thinking no. I may not be updating Quick Editor or AudioMan via update manager anyway.

New things to do

  • Use the new build machine for CVS and nightly builds
  • Mac: package products in a dmg'd bundle
  • Continue work on AudioMan's Collection Browsing perspective.
  • Centralize the code that gets temporary copies of test files for unit tests.

posted at April 04, 2005 at 12:15 PM EST
last updated December 5-, 2005 at 02: 2 PM EST


# Smaller Companies, More Hats

Joel Spolsky wrote a series of articles about the release of the next version of FogBugz. In Part IV he talks about the "ridiculously small portion" of developer work that is actually writing code.

Eric Sink expands on that point by advising small companies to hire developers, not programmers. Developers by his definition are more well rounded and can do the many tasks necessary in a small company. Very good programmers are just very good programmers.

It seems like the smaller the company, the more hats you have to wear. In bigger companies you might have a few very specific tasks on your plate and the rest is taken care of by other people -- people who are probably specialized at completing those tasks.

posted at April 01, 2005 at 01:59 PM EST
last updated December 5-, 2005 at 02: 2 PM EST

