# Searching for Context

What is a search engine, really? Search engines are relatively dumb, aren't they? You search for a word and the search engine tells you which pages that word is on. The secret (for now) seems to be the ordering: which pages are most relevant for the word(s)?

It's hard to do this if the search engine doesn't understand the context of the Internet's pages and of the user's search. What makes the problem harder is that users have been conditioned to type dumb queries into dumb search engines, stripping important context out of their searches.

Google innovated by using the links pointing to a page (PageRank) to improve the ordering of search results. But does this really improve results? It sounds more like a popularity contest than a measure of relevance, but it was still an improvement. The next step in search engines will involve context: understanding the web, not just indexing keywords and calculating PageRank.
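To make the "popularity contest" concrete, here is a simplified sketch of the idea behind PageRank: a page's score depends on the scores of the pages linking to it, not just on how many links it has. This is the textbook power-iteration formulation with the commonly cited damping factor of 0.85, not Google's production system, and the four-page link graph is made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Scores always sum to 1 across all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a base share from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "c" is linked from three pages, "d" from none.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(links)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Note that nothing in this calculation looks at what the pages say, which is exactly the missing context the post is talking about.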


Google has already started to do this with its music search. When it parses pages from Amazon or other online stores, it sorts the page's data out and gives it context. It's easy to find an album title on Amazon because it's always in the same place, with the same HTML tags on it.

The Google Music Search can be a little more intelligent when parsing a web page from a music store because it knows the page's structure and can deduce context from that structure.
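A minimal sketch of what "knowing the page's structure" buys a crawler: if a store always wraps the album title and artist in the same tags, a parser can pull those fields out with their meaning attached instead of treating the page as a bag of keywords. The CSS class names (`album-title`, `album-artist`) and the sample page are hypothetical; this is not how Google Music Search actually works, only an illustration using Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class AlbumPageParser(HTMLParser):
    """Extracts labelled fields from a store page with a known, fixed layout.

    FIELD_CLASSES is a hypothetical mapping from CSS class to field name.
    Assumes each field's text is not wrapped in further nested tags.
    """

    FIELD_CLASSES = {"album-title": "title", "album-artist": "artist"}

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.FIELD_CLASSES:
            self._current = self.FIELD_CLASSES[css_class]

    def handle_endtag(self, tag):
        self._current = None

    def handle_data(self, data):
        if self._current and data.strip():
            self.fields[self._current] = data.strip()


page = """
<html><body>
  <h1><span class="album-title">Example Album</span></h1>
  <div>by <span class="album-artist">Example Band</span></div>
  <p>Customers also bought...</p>
</body></html>
"""
parser = AlbumPageParser()
parser.feed(page)
print(parser.fields)  # {'title': 'Example Album', 'artist': 'Example Band'}
```

The flip side is brittleness: if the store changes its markup, the mapping has to change with it.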

The strange thing about this kind of search engine crawling is that it seems to violate many websites' "Terms of Service" agreements. In a world where duplication is free, original content is key, so websites try to protect their content by legally restricting what people (or machines) can do with it.

For example, Google's Terms of Service say you can't re-use their search results for commercial purposes. But that's exactly what Google is doing with other websites' data: Google crawls a website, stores the entire site on its own servers, indexes the keywords and calculates a PageRank for every page.

The difference is that Google's "service" is the ordering of the results, not data. Google's search engine doesn't have any original content.

Why are Amazon and the other music stores allowing Google to repurpose their music data? Because it drives consumers to their sites, even though the data re-use may technically violate their Terms of Service. It's an interesting implicit agreement (or explicit, who knows): if you drive relevant traffic to us, you can re-use our public data.

Search engines are going to try to do this more: take a website's data and understand it. In the process they will be *using* this data, data that isn't theirs -- it's just floating around in the open, bound by loose laws and implicit agreements to play nice.

For example, an online store could simply reject access to the GoogleBot user agent if they don't like what Google is doing with their data. The downside is that the site wouldn't appear on Google any more. The upside is that their data might seem to be "protected".
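The post doesn't say how a store would shut Google out; one conventional route is a `robots.txt` rule aimed at Google's crawler, which well-behaved crawlers honor voluntarily. The sketch below assumes that approach and uses Python's standard `urllib.robotparser` to show what a compliant crawler would conclude. The rules and the store URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Google's crawler entirely, let others
# in everywhere except the checkout pages.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /checkout/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

url = "https://store.example.com/albums/123"
for agent in ("Googlebot", "SomeOtherBot"):
    print(agent, rules.can_fetch(agent, url))
# Googlebot False
# SomeOtherBot True
```

This only works because the crawler chooses to obey it; blocking the user agent at the web server is the stricter, enforced version of the same decision.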

Google can throw its weight around like this because it has influence ... but what if a smaller website tried to do the same thing? Would Amazon take kindly to it? We'll soon see: the search engine that beats Google is going to do it with context.

posted at March 31, 2006 at 10:06 AM EST
last updated December 4-, 2006 at 10: 1 AM EST


# New Direction

No, I'm not abandoning FanConcert -- far from it. I've been stuck in a rut lately, so I'm going to try something new for a bit.

I've decided to start using my iBook for Rails. That's not that shocking -- a lot of Rails developers use Macs. I've decided not to use TextMate just yet, to speed up the learning curve. Instead I'm using the same software I used on Windows: Eclipse with the RDT and Subclipse plugins. It's working very well -- Eclipse is a lot faster than it used to be on my Mac.

Eclipse has made my life a lot easier while I learn how to deal with other new things, like MySQL configuration on the Mac and using the command line more, especially with rake. Everything is going very well.

I've started to use Migrations to change the database schema and wow, what a difference. I'm loving it already. I want to start using Capistrano (formerly SwitchTower) for server deployment, put Rails in my project's vendor directory where it's supposed to be and dig more into the Rails productivity tools that can help me streamline my development process. I also want to get more familiar with the new features arriving with Rails 1.1.

This changing of gears has really helped me get out of my little rut. Using a new project to try out these new things reduces the complexity and eliminates the risk of breaking FanConcert for long periods of time. When I get used to using the new tools on the new project, I'll switch FanConcert over as well.

So what's the new project? I'm taking the lessons I've learned from FanConcert's moderation system and making a general, moderation-based website that can be configured for any type of object. The short answer: I'm trying to make a wiki killer.

The beta test? FanConcert's objects. If everything goes well this new tool will be the base for editing FanConcert's objects, while prettier read-only pages ride on top of it for people to browse through. Instead of continuing to work from the top down, I'm going to try from the bottom up for a little while.

I should have something posted soon on a new domain.

posted at March 22, 2006 at 01:31 PM EST
last updated December 3-, 2006 at 19: 1 PM EST


# Local Graphic Designers?

I'd like to meet some graphic designers in Ottawa -- or at least southern Ontario -- and I have no idea where to begin.

I was thinking of signing up for a course in basic graphic design at Algonquin College, just to get a basic appreciation of the field. I'd also be able to find out where the designers are in this town.

Why do I want to meet graphic designers? Because graphic designers and programmers complement each other well, especially when they can appreciate each other's skills. Get a business/marketing person in there and you have a small business.

Do they have to be local? Not necessarily, but I find it easier (and faster) to talk to people face to face.

posted at March 22, 2006 at 01:14 PM EST
last updated December 3-, 2006 at 19: 1 PM EST


# REST Here for a While

Last night I read DHH's post on The Accept Header and it got me reading about REST and the Atom Publishing Protocol (atompub).
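The Accept header is how a client tells the server which representations (HTML, XML and so on) it prefers for the same URL, optionally weighted with `q` values. Here is a minimal sketch of server-side content negotiation under that HTTP convention; it is not Rails' implementation, and it deliberately skips finer points such as ranking a specific media type above a wildcard with the same `q`.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    choices = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type, q = pieces[0], 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        if media_type:
            choices.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(choices, key=lambda c: -c[1])


def negotiate(header, available):
    """Pick the first available representation the client will accept."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        for offered in available:
            major = offered.split("/")[0]
            if media_type in (offered, major + "/*", "*/*"):
                return offered
    return None


offered = ["text/html", "application/xml"]
print(negotiate("application/xml;q=0.9, text/html;q=0.5", offered))  # application/xml
print(negotiate("image/png", offered))  # None
```

A `None` result is where a real server would answer 406 Not Acceptable.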

REST sounds complicated and/or intimidating until you read a lucid explanation of it, of which there are few. It turns out it's not that hard.

It's great that Ruby on Rails is making it easier to be RESTful in the next version (1.1), because that will make it simpler to make FanConcert RESTful.

How does being RESTful help FanConcert? Besides making it easier for people and search engines to find resources on the site, REST turns out to be a good way to make an API.

FanConcert can use a RESTful API to receive information from applications that aren't web browsers. At the same time, it can use the API to operate as a web service -- an information source for other web sites or end-user applications.
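In a RESTful API, the URL names a resource and the HTTP verb says what to do with it, so the same URLs serve both the "read" side (GET) and the "write" side (POST, PUT, DELETE). The sketch below shows that dispatch in miniature; the URL layout and action names are hypothetical, not FanConcert's actual API.

```python
import re

# Hypothetical URL layout: (HTTP verb, path pattern, action).
ROUTES = [
    ("GET",    r"^/(\w+)$",        "index"),    # list a collection
    ("POST",   r"^/(\w+)$",        "create"),   # add to a collection
    ("GET",    r"^/(\w+)/(\d+)$",  "show"),     # read one record
    ("PUT",    r"^/(\w+)/(\d+)$",  "update"),   # replace one record
    ("DELETE", r"^/(\w+)/(\d+)$",  "destroy"),  # remove one record
]


def route(method, path):
    """Return (action, resource, record_id) for a request, or None if unrouted."""
    for verb, pattern, action in ROUTES:
        if verb != method:
            continue
        match = re.match(pattern, path)
        if match:
            resource = match.group(1)
            record_id = int(match.group(2)) if match.lastindex == 2 else None
            return action, resource, record_id
    return None


print(route("GET", "/concerts/42"))   # ('show', 'concerts', 42)
print(route("POST", "/venues"))       # ('create', 'venues', None)
print(route("PATCH", "/venues/7"))    # None
```

Because reads are all GETs, limiting or metering the read end of the API can be done by verb without touching the write side.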

If FanConcert can accept information from an API, people can write their own software to automatically stuff FanConcert with information. Sometimes these programs are called bots because they can work autonomously, crawling around an information source (like a website or document), processing the information and then sending the information someplace else.

I'd like to encourage this type of development with FanConcert because people like hackable software. But I can't leave the read end of the API wide open or it could end up costing me a lot of money. There are other implications of using bots on social sites like FanConcert: Should they be explicitly pointed out? How will their 'reputation' be calculated if they can submit much more information? Will they be easy to correct if they make a mistake, like a parsing error?

There are a lot of interesting implications, but it's been done before. Wikipedia uses bots to maintain some of its geographic articles (like city data) and probably other pages. It would be a mistake to ignore that precedent.

I'll have to figure out a reasonable solution for people that need to use the read end of the FanConcert API a lot, maybe a CDDB-like business model to pay for the bandwidth. Regular users should still be able to use the read part of the API as long as they don't go above a certain limit.
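One simple way to let regular users read freely up to a limit is a per-key counter that resets every window. The sketch below is a fixed-window quota; the limit, the one-day window and the idea of keying by API key are all assumptions, and a CDDB-style paid tier would amount to giving paying keys a higher limit.

```python
import time


class ReadQuota:
    """Fixed-window counter: each API key gets `limit` reads per window.

    The default limit and window are placeholders, not real FanConcert values.
    """

    def __init__(self, limit=1000, window_seconds=24 * 60 * 60, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the window can be tested
        self.counts = {}    # api_key -> (window_start, reads_used)

    def allow(self, api_key):
        now = self.clock()
        start, used = self.counts.get(api_key, (now, 0))
        if now - start >= self.window:
            # The previous window has expired; start counting again.
            start, used = now, 0
        if used >= self.limit:
            return False
        self.counts[api_key] = (start, used + 1)
        return True


quota = ReadQuota(limit=3)
print([quota.allow("casual-user") for _ in range(5)])
# [True, True, True, False, False]
```

A fixed window is the crudest option (a client can burst at the window boundary), but it is cheap to store and easy to explain to API users.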

posted at March 18, 2006 at 08:02 AM EST
last updated December 3-, 2006 at 19: 1 PM EST


# Streamlining the Environment

Reality check time: FanConcert can't continue to be a full time job. It's just not paying off right away. I still want to keep working on it part time because I think the idea is good; it just needs more work.

If I'm going to continue to work on it part time, the hardest thing will be jumping into it with days off in between. It takes time to get back up to speed, remind yourself where you were and add something new or fix a problem.

Andrew does some great work like that though. He seems to be able to split the project into small enough chunks that when he has a spare half hour (like on the bus) he can jump into that little task. That really helps because then you can truly attack the project in little increments, whenever you have time.

An organized development environment really helps to pull that kind of thing off. You don't want to have to worry about little things every time you jump into the project -- you just want to add something, make sure it doesn't break anything else and maybe deploy it.

A test suite helps tremendously with that, as do development tools that are easy to use and get out of your way. If you can't remember vi commands because you don't use vi much, use something simpler. Run your project's entire test suite with one step (done) and deploy your project with one step (not done yet, I'd like to use Capistrano).

So I'm going to take a break from new development for a few days and look into the Ruby on Rails environment and some tools that can help me streamline my process. Lucky for me, the Rails community is full of pragmatists that have these same goals in mind. A lot of these types of tools already exist.

It'll take a little time to learn them now but they could save lots of hours in the long run.

posted at March 18, 2006 at 02:35 AM EST
last updated December 3-, 2006 at 19: 1 PM EST


# Venue Status

It's true that FanConcert's main purpose is to keep music fans updated with new things. But new things become old things eventually and FanConcert will end up with a history of music events.

So FanConcert's objects have to make sense over the long term. One of those sensitive areas is venue names. Venues change names all of the time but a concert that took place at a venue 10 years ago should be connected to a venue with the name from 10 years ago. That way the history will make sense.

To handle these kinds of situations I've added a status attribute to venues. Venues can be Open, Closed or Renamed. If you choose Renamed, you can point to another venue in FanConcert and give a date when the rename occurred. An example: FanConcert's page for the formerly named Corel Centre, which points to its new name Scotiabank Place.
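Here is a minimal sketch of the status model described above: each venue record keeps its own name, and a Renamed venue points at the record for its new name along with the rename date. Old concerts stay attached to the old record, while the site can still follow the link to the present-day name. The class and field names are hypothetical, and the sample venues and date are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class VenueStatus(Enum):
    OPEN = "Open"
    CLOSED = "Closed"
    RENAMED = "Renamed"


@dataclass
class Venue:
    name: str
    status: VenueStatus = VenueStatus.OPEN
    renamed_to: Optional["Venue"] = None  # only set when status is RENAMED
    renamed_on: Optional[date] = None     # date the rename took effect

    def current(self):
        """Follow rename links to the venue's present-day record."""
        venue = self
        seen = set()
        while venue.status is VenueStatus.RENAMED and venue.renamed_to is not None:
            if id(venue) in seen:  # guard against an accidental rename loop
                break
            seen.add(id(venue))
            venue = venue.renamed_to
        return venue


# Hypothetical venues and rename date.
new_hall = Venue("New Hall")
old_hall = Venue("Old Hall", VenueStatus.RENAMED, new_hall, date(2006, 1, 1))

# A concert from years ago stays linked to "Old Hall", but readers can
# still be told what the building is called now.
print(old_hall.current().name)  # New Hall
```

Following the chain rather than a single link handles venues that have been renamed more than once.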

I also want to show these renames in the venue 'chooser' that's used on the new concert page. When you're searching for and selecting a venue for a concert you should be able to see if that venue is still open! On the other hand you may be well aware that the venue closed four years ago and you're entering a concert history you found online. It should still be possible to add past dates to a closed venue.

A missing attribute is the venue open date. If I had a spot for that, I could ensure that all of the concerts entered for that venue take place between the open and close dates, to prevent accidental errors. At the very least I could provide a warning when people try to add concerts to that venue outside of that range.
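If venues had an open date alongside the close date, the range check could look like the sketch below. It returns a warning instead of rejecting the concert, in line with the "at the very least" option above, and treats an unknown bound as unchecked. The function name and the sample dates are hypothetical.

```python
from datetime import date
from typing import Optional


def check_concert_date(concert_date: date,
                       opened: Optional[date],
                       closed: Optional[date]) -> Optional[str]:
    """Return a warning if a concert falls outside a venue's open/close range.

    Either bound may be unknown (None), in which case that side isn't checked.
    Returns None when the date looks plausible.
    """
    if opened is not None and concert_date < opened:
        return f"Concert on {concert_date} is before the venue opened ({opened})."
    if closed is not None and concert_date > closed:
        return f"Concert on {concert_date} is after the venue closed ({closed})."
    return None


# Hypothetical venue that was open from 1990 through 2002.
opened, closed = date(1990, 5, 1), date(2002, 8, 31)
print(check_concert_date(date(1998, 7, 4), opened, closed))  # None
print(check_concert_date(date(2005, 3, 1), opened, closed))
# Concert on 2005-03-01 is after the venue closed (2002-08-31).
```

Concerts entered from an old history inside the valid range pass silently, so adding past dates to a closed venue still works.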


Artists also have a status: Active, Broken Up, etc. But the chronology can be more complicated. People can move from band to band, forming a sort of family tree. I may add a simple Artist status soon, but in the long term it will probably be much more complicated.


posted at March 03, 2006 at 12:09 AM EST
last updated December 3-, 2006 at 06: 0 PM EST

