January 24, 2007

A couple of weeks back Jeffrey Zeldman posted about a gas leak in New York. This sentence jumped out at me:

We knew that the smell was not natural gas but mercaptan, a chemical that is injected into natural gas to let people know when there’s a leak.

I’d never heard of mercaptan (or methanethiol) before, but of course it makes sense, given that natural gas has no smell of its own.

What do we “inject” into software to let us know when we have a problem?

Logging errors is pretty common. At Trade Me we record the details of every unexpected exception in the application and database, including the page details and query string, stack trace, error details, member details, user agent, etc. We have SMS alerts set up so that when clusters of errors occur we are automatically notified. We monitor the error logs pretty closely whenever we deploy changes to the site, to make sure that we haven’t introduced any new errors. This works pretty well.
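The post doesn't say how Trade Me decides that errors form a "cluster", so the following is only a minimal Python sketch of one plausible rule: alert when the same error signature (say, exception type plus page) occurs at least `threshold` times within `window_seconds`. The class name, the signature format and both defaults are assumptions for illustration.

```python
from collections import defaultdict, deque


class ErrorClusterDetector:
    """Flags a cluster when one error signature recurs too often in a short window.

    Hypothetical sketch: the threshold and window are illustrative defaults,
    not Trade Me's actual alerting rules.
    """

    def __init__(self, threshold=5, window_seconds=60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        # error signature -> timestamps of its recent occurrences
        self._recent = defaultdict(deque)

    def record(self, signature, timestamp):
        """Log one error occurrence; return True if it completes a cluster."""
        times = self._recent[signature]
        times.append(timestamp)
        # Forget occurrences that have slid out of the time window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.threshold:
            # Reset so one burst triggers one alert (e.g. one SMS), not one per error.
            times.clear()
            return True
        return False
```

Keying on a signature rather than on the total error count means a burst from one broken page stands out from a steady background of unrelated one-off errors. Timestamps are assumed to arrive in non-decreasing order.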

But a system like this only tells us when hard errors occur.

What about less obvious things, like design-related problems which don’t result in hard errors?

How about validation errors? Most developers would argue that validation errors are the user’s problem: required fields left blank, invalid data entered, and so on. However, if the same validation error occurs repeatedly, that is a pretty good indication of a problem with the design of the form. Getting visibility of that would provide useful feedback to the people responsible for designing the form.
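As a sketch of what soft logging of validation errors might look like, assume the form handler reports which (field, rule) checks failed on each submission. Counting failures per form, field and rule, and dividing by the number of submissions, surfaces the checks users trip over most often. The `ValidationErrorLog` class and any form, field and rule names are hypothetical, and a real site would persist the counts rather than keep them in memory.

```python
from collections import Counter


class ValidationErrorLog:
    """Counts validation failures per (form, field, rule).

    Hypothetical sketch: in-memory only, for illustration.
    """

    def __init__(self):
        self._failures = Counter()     # (form, field, rule) -> failure count
        self._submissions = Counter()  # form -> number of submissions

    def record_submission(self, form, failures):
        """Record one submission of `form` and the (field, rule) pairs that failed."""
        self._submissions[form] += 1
        for field, rule in failures:
            self._failures[(form, field, rule)] += 1

    def worst_offenders(self, n=3):
        """Return the n most common failures with their rate per submission."""
        return [
            (form, field, rule, count / self._submissions[form])
            for (form, field, rule), count in self._failures.most_common(n)
        ]
```

For example, if three out of four submissions of a hypothetical `sell` form fail the same `start_price` check, `worst_offenders(1)` reports that check with a rate of 0.75, which suggests looking at how that field is presented rather than blaming careless users.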

How about the amount of time spent viewing each page? Suppose we could measure the time between page requests as a user works through a given process (e.g. the sell process on Trade Me). If users are spending a long time on one particular page, that could be an indication that the design is not clear. Likewise if users consistently abandon the process at the same point.
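The page-timing idea can be sketched from ordinary request logs, assuming each request carries a session identifier and a timestamp. The hypothetical `analyse_funnel` function below treats the gap until a session's next request as time spent on a page, and counts a session as abandoned at the furthest step it reached if it never reached the last one. Both definitions are simplifying assumptions; for instance, the final page of a session gets no timing at all because there is no next request.

```python
from collections import defaultdict
from statistics import median


def analyse_funnel(requests, steps):
    """Summarise a multi-page process from page-request logs.

    requests: iterable of (session_id, timestamp_seconds, page) tuples.
    steps:    the pages of the process in order (hypothetical names).

    Returns (median_seconds, abandoned_at):
      median_seconds[page] -> median time until the session's next request
      abandoned_at[page]   -> sessions whose furthest step was that page and
                              that never reached the final step
    """
    by_session = defaultdict(list)
    for session_id, ts, page in requests:
        by_session[session_id].append((ts, page))

    gaps = defaultdict(list)
    abandoned_at = defaultdict(int)
    for hits in by_session.values():
        hits.sort()  # chronological order within the session
        # Time on a step page = gap until the next request, whatever page it was.
        for (ts, page), (next_ts, _) in zip(hits, hits[1:]):
            if page in steps:
                gaps[page].append(next_ts - ts)
        reached = [steps.index(page) for _, page in hits if page in steps]
        if reached and max(reached) < len(steps) - 1:
            abandoned_at[steps[max(reached)]] += 1

    median_seconds = {page: median(g) for page, g in gaps.items()}
    return median_seconds, dict(abandoned_at)
```

Using the median rather than the mean keeps one user who left a tab open over lunch from dominating a step's figure.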

How about commonly used help links or help search terms? On Trade Me the #1 viewed help topic (by quite a margin) is the page which explains auto-bids. What are users stuck on when they click the help link? Can we use this information to better explain the parts of the application that users are actually struggling with?

How about client-side HTML validation errors and JavaScript errors?

I’m sure there are lots of others.

Has anybody tried to implement this sort of soft error logging? I’d be interested to hear if it worked.