Mercaptan

A couple of weeks back Jeffrey Zeldman posted about a gas leak in New York. This sentence jumped out at me:

We knew that the smell was not natural gas but mercaptan, a chemical that is injected into natural gas to let people know when there’s a leak.

I’d never heard of mercaptan (or Methanethiol) before, but of course it makes sense given that natural gas has no smell of its own.

What do we “inject” into software to let us know when we have a problem?

Logging errors is pretty common. At Trade Me we record the details of every unexpected exception in the application and database, including the page details and query string, stack trace, error details, member details, user agent, etc. We have SMS alerts set up so that when clusters of errors occur we are automatically notified. We monitor the error logs pretty closely whenever we deploy changes to the site to make sure that we haven’t introduced any new errors. This works pretty well.
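For anyone curious what the plumbing for that looks like, here is a minimal sketch of a catch-all handler in ASP.NET. It is illustrative only: the table, column and connection string names are made up, not our actual schema.

```vb
' Global.asax.vb: a minimal sketch of catch-all error logging in ASP.NET.
' The table, column and connection string names are illustrative only.
Imports System
Imports System.Configuration
Imports System.Data.SqlClient
Imports System.Web

Public Class Global_asax
    Inherits HttpApplication

    Sub Application_Error(ByVal sender As Object, ByVal e As EventArgs)
        Dim ex As Exception = Server.GetLastError()
        If ex Is Nothing Then Return

        ' Capture enough context to reproduce the problem later.
        Dim url As String = Request.Url.ToString()
        Dim userAgent As String = Request.UserAgent
        If userAgent Is Nothing Then userAgent = ""

        Using conn As New SqlConnection( _
                ConfigurationManager.ConnectionStrings("ErrorLog").ConnectionString)
            Using cmd As New SqlCommand( _
                    "INSERT INTO ErrorLog (OccurredAt, Url, UserAgent, Details) " & _
                    "VALUES (@at, @url, @agent, @details)", conn)
                cmd.Parameters.AddWithValue("@at", DateTime.UtcNow)
                cmd.Parameters.AddWithValue("@url", url)
                cmd.Parameters.AddWithValue("@agent", userAgent)
                cmd.Parameters.AddWithValue("@details", ex.ToString())
                conn.Open()
                cmd.ExecuteNonQuery()
            End Using
        End Using
    End Sub
End Class
```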

But, a system like this only tells us when hard errors occur.

What about less obvious things, like design related problems which don’t result in hard errors?

How about validation errors? Most developers would argue that validation errors are the user’s problem – where required fields are not completed, or invalid data is entered, etc. However, if the same validation error is occurring repeatedly then it is a pretty good indication of problems with the design of the form. And, getting visibility of that would provide useful feedback to the people responsible for designing the form.
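As a sketch of what that might look like in an ASP.NET page, something like the following could record each failed validator, so that over time the counts per field show which parts of a form people are actually struggling with. The page name and the logging helper are hypothetical, not our actual code.

```vb
' A sketch of recording validation failures as "soft errors" in an ASP.NET page.
' SellStep1 and LogSoftError are hypothetical names, not our actual code.
Imports System
Imports System.Web.UI
Imports System.Web.UI.WebControls

Public Class SellStep1
    Inherits Page

    Protected WithEvents SubmitButton As Button

    Protected Sub SubmitButton_Click(ByVal sender As Object, ByVal e As EventArgs) _
            Handles SubmitButton.Click
        Page.Validate()
        If Not Page.IsValid Then
            ' Record each failed validator: over time, counts per field show
            ' which parts of the form people are actually struggling with.
            For Each v As IValidator In Page.Validators
                If Not v.IsValid Then
                    LogSoftError("validation", Request.Path, v.ErrorMessage)
                End If
            Next
            Return
        End If

        ' ... normal processing continues here ...
    End Sub

    Private Sub LogSoftError(ByVal kind As String, ByVal pagePath As String, _
                             ByVal detail As String)
        ' Illustrative stub: write to the same store as the hard error log,
        ' tagged so the two can be reported on separately.
    End Sub
End Class
```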

How about the amount of time spent viewing each page? Suppose we could measure the time between page requests as a user works through a given process (e.g. the sell process on Trade Me). If users are spending a long time on one particular page that could be an indication that the design is not clear. Likewise if users consistently abandon the process at the same point.
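One way to approximate this, sketched below, is to stamp the session with the time each step was served and log the gap when the next step is requested. The session keys and logging helper are made up for illustration.

```vb
' A sketch of timing how long users spend between steps of a process,
' using Session to remember when the previous step was served.
' The session keys and logging helper are illustrative only.
Imports System
Imports System.Web
Imports System.Web.SessionState

Public Class StepTiming

    Public Shared Sub RecordStep(ByVal context As HttpContext, ByVal stepName As String)
        Dim session As HttpSessionState = context.Session
        Dim lastStep As String = TryCast(session("LastStepName"), String)
        Dim lastSeen As Object = session("LastStepTime")

        If lastStep IsNot Nothing AndAlso lastSeen IsNot Nothing Then
            Dim elapsed As TimeSpan = DateTime.UtcNow - CType(lastSeen, DateTime)
            ' A consistently long gap after one step suggests that page is unclear;
            ' a step that never gets a follow-up recorded suggests abandonment.
            LogStepTiming(lastStep, stepName, elapsed.TotalSeconds)
        End If

        session("LastStepName") = stepName
        session("LastStepTime") = DateTime.UtcNow
    End Sub

    Private Shared Sub LogStepTiming(ByVal fromStep As String, ByVal toStep As String, _
                                     ByVal seconds As Double)
        ' Illustrative stub: persist (fromStep, toStep, seconds) for later analysis.
    End Sub
End Class
```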

How about commonly used help links or help search terms? On Trade Me the #1 viewed help topic (by quite a margin) is the page which explains auto-bids. What are users stuck on when they click the help link? Can we use this information to better explain the parts of the application that users are actually struggling with?

How about client-side HTML validation errors and JavaScript errors?

I’m sure there are lots of others.

Has anybody tried to implement this sort of soft error logging? I’d be interested to hear if it worked.

Doug Bowman & first posts

A few years back Doug Bowman’s StopDesign.com was one of the first blogs that I subscribed to (once I figured out what RSS was!). Having a chance to meet and spend some time with Doug when he was in Wellington for Webstock was definitely one of the highlights of last year for me.

So, it’s great to see him vowing to write more frequently again. He points at his very first post from 2002. A lot of what he wrote then resonates with me, having just started out on this myself.

This log of thoughts is mostly for my own record, but if you’re along for the ride, welcome.

Inspired by this, I decided to try and track down the first posts from some other favourite long-time bloggers. Here are some links, in chronological order:

It’s interesting how many of these posts are from December/January – obviously a popular time of year to be making a start.

Where the hell is Matt?

Mostly because I want an excuse to try embedding a YouTube video in a post, but also because I think this is a great story.

http://www.youtube.com/watch?v=bNF_P281Uu4

With apologies to anybody who came along to my presentation at TechEd and already heard me talk about this …

This is a great example of the power of word-of-mouth marketing and something that could have only happened online.

The video above is actually the sequel. The first video (Windows, 8MB or QuickTime, 13MB) was cobbled together on a trip he paid for himself and wrote about on a travel blog for family and friends and anybody else who stumbled upon it. It turned out that quite a few people did, and they told their friends. Somebody told the kind people at Stride Gum, who asked him if he wanted to do it all again, only better. This time he would go to 39 different countries and in the process visit all 7 continents. That’s not the kind of offer you turn down, I suppose!

The full story is on his site, www.wherethehellismatt.com, including the blog he wrote as he travelled to all of those amazing places and danced.

<trivia why="to scare those of us who are nervous of heights">

The Kjeragbolten boulder in Norway that he dances on, albeit a little less enthusiastically than in some other places, is even worse than it looks. It is a straight drop of 1000m with nothing to block your fall. More info from Wikipedia and from Matt’s journal entry.

</trivia>

If you have a big-screen TV hooked up to a computer then download the hi-res version (Windows, 48MB or QuickTime, 48MB), turn the volume up, and try to stop yourself from dancing along with him.

Enjoy!

UPDATE (22-Jan-07): To try and fix embedded YouTube video.

UPDATE (30-Jun-08): Where the hell is Matt? (2008)

ASP.NET 2.0

Last week we deployed Trade Me as an ASP.NET 2.0 application. We switched over early on Tuesday morning without even taking the site offline. With luck, nobody noticed. Nonetheless, this is an exciting milestone.

Eighteen months ago all four sites (Trade Me, FindSomeone, Old Friends & SafeTrader) were built using classic ASP, which was starting to show its age. We’ve been working off-and-on since then to bring this code base up-to-date. Most of the heavy lifting was actually done this time last year, when we took the opportunity over the quiet Christmas/New Year period to make a start on the main Trade Me site – taking it from ASP to ASP.NET 1.1.

The opportunity to work on this project was a big part of my motivation for heading home from the UK in 2004. It’s great to reach a point where we can reflect on the huge benefits it has realised, not the least being that we’ve been able to complete this work on our own terms. It’s an awesome credit to the team of people who have made it happen.

Our motivation

I’m pretty proud of the approach we’ve taken. To understand this you really need to understand the motivation for the change in the first place.

In 2004 there were a number of unanswered questions:

How much further could we push ASP before performance was impacted?

Back then, we were actually pretty happy with the performance of our ASP code. It had been tweaked and tuned a lot over the years. We’d ended up building our own ASP versions of a number of the technologies included in ASP.NET, such as caching.

The interaction between ASP and the database, which was (and is!) at the heart of the performance of the site, was pretty carefully managed. For example, we were careful not to keep connections open any longer than absolutely required.
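For context, the same discipline in ADO.NET terms looks roughly like the sketch below: open the connection as late as possible and let it go back to the pool as soon as the query is done. The connection string name, table and query are illustrative.

```vb
' A sketch of the "open late, close early" discipline in ADO.NET terms.
' The connection string name, table and query are illustrative only.
Imports System.Configuration
Imports System.Data.SqlClient

Public Class ListingData

    Public Shared Function GetListingTitle(ByVal listingId As Integer) As String
        Using conn As New SqlConnection( _
                ConfigurationManager.ConnectionStrings("Main").ConnectionString)
            Using cmd As New SqlCommand( _
                    "SELECT Title FROM Listing WHERE ListingId = @id", conn)
                cmd.Parameters.AddWithValue("@id", listingId)
                conn.Open()
                ' The connection is held only for the duration of the query;
                ' the Using blocks close it (returning it to the pool) straight after.
                Return CStr(cmd.ExecuteScalar())
            End Using
        End Using
    End Function
End Class
```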

At the application layer we had managed growth by adding more web servers. But, this was something we could only push so far before it would start to create problems for us in other places, most importantly in the database.

While we had confidence that we could continue to work with ASP, that wasn’t necessarily shared by everybody else.

Which led us to the next problem …

How could we continue to attract the best developers to work with us?

It’s hard to believe now that we managed as well as we did without many of the tools and language features that we now take for granted: compiled code, a debugger, a solution which groups together all of the various bits of code, source control to hold this all together, an automated build/deploy process, … the list goes on.

For obvious reasons, we were finding it increasingly difficult to get top developers excited about working with us on an old ASP application.

And there was lots of work to do. As always seems to be the case, there was a seemingly infinite list of functional changes we wanted to make to the site.

So, that left us with the question that had been the major stumbling block to addressing these problems earlier …

How could we make this change without disrupting the vital on-going development of the site?

Looking at the code we had, it was hard to get excited about improving it, and hard to even know where to start. There was a massive temptation to throw it all out and start again.

But, inspired by Joel Spolsky and the ideas he wrote about in Things you should never do, Part I we decided to take the exact opposite approach.

Rebuild the ship at sea

Rather than re-write code we chose to migrate it, one page at a time, one line at a time.

This meant that all of the special cases which had been hacked and patched into the code over the years (which Joel calls “hairy” code) were also migrated, saving us a lot of hassle in re-learning those lessons.

The downside was that we weren’t able to fix all of the places where the design of the existing code was a bit “clunky” (to use a well-understood technical term!). We had to satisfy ourselves in those cases with “better rather than perfect”. As it turned out, none of these really hurt us, and in fact we’ve been able to address many of them already. Once the code was migrated we found ourselves in a much stronger position to fix them with confidence.

Because we had so much existing VBScript code we migrated to VB.NET rather than C# or Java or Ruby. This minimised the amount of code change required (we enforce Option Explicit and Option Strict in the compiler, so there was a fair amount of work to do to get some of the code up to those standards, but that would have been required in any case).
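To give a flavour of what getting code up to that standard meant, here is a made-up example of the kind of change Option Strict forces during a VBScript-to-VB.NET migration.

```vb
' A made-up example of the sort of change Option Strict forces during migration.
' The original VBScript might have read:
'   total = Request.QueryString("qty") * price
' Under Option Strict the conversions have to be spelt out explicitly.
Option Explicit On
Option Strict On

Imports System.Web

Public Class OrderHelper

    Public Shared Function LineTotal(ByVal request As HttpRequest, _
                                     ByVal price As Decimal) As Decimal
        ' Parse the query string value deliberately, rather than relying on
        ' VBScript's implicit (and occasionally surprising) type coercion.
        Dim qty As Integer = Integer.Parse(request.QueryString("qty"))
        Return qty * price
    End Function
End Class
```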

We kept the migration work separate from the on-going site work. When migrating we didn’t add new features and we didn’t make database changes. When we were working on site changes we made them to the existing code, leaving it as ASP if necessary, rather than trying to migrate the code at the same time.

We focussed on specific things that we could clean up in the code as part of the migration process. For example, we added an XHTML DOCTYPE to all pages and fixed any validation errors this highlighted. We moved all database code into separate classes. And, we created controls for common UI elements (in most cases replacing existing ASP include files). We also removed any code which was no longer being used, including entire “deadwood” pages which were no longer referenced.

To build confidence in this approach we started with our smaller sites: first SafeTrader and Old Friends, followed by FindSomeone, and finally Trade Me.

After each site was migrated we updated our plans based on what we’d learnt. The idea was to try and “learn early” where possible. For example, after the Old Friends migration we realised we would need a better solution for managing session data between ASP and ASP.NET, so we used the FindSomeone migration as a test of the solution we eventually used with Trade Me. The performance testing we did as part of the early migrations gave us confidence when it came time to migrate the larger sites.

We re-estimated as we went. By keeping track of how long it was taking to migrate each page we got accurate metrics which we fed into future estimates.

Finally, we created a bunch of new tools to support our changing dev process. For example, we created a simple tool we call “Release Manager” which hooks into our source control and is used to combine various code changes into packages which can then be deployed independently to our test and production environments. We created an automated process, using NAnt, which manages our build and deploy, including associated database changes. More recently we implemented automated code quality tests and reports using FxCop, NUnit and NCover. All of these mean that, for the first time, we can work on keeping the application itself in good shape as we implement new features.
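As a deliberately tiny illustration of those automated tests, here is what an NUnit test looks like in VB.NET. The class under test is invented for the example; it just shows the style of check that can run as part of a build.

```vb
' A deliberately tiny NUnit test in VB.NET, to show the style of check that
' can run as part of an automated build. TitleFormatter is invented for the example.
Imports NUnit.Framework

Public Class TitleFormatter
    Public Shared Function Truncate(ByVal title As String, ByVal maxLength As Integer) As String
        If title.Length <= maxLength Then Return title
        Return title.Substring(0, maxLength)
    End Function
End Class

<TestFixture()> _
Public Class TitleFormatterTests

    <Test()> _
    Public Sub TruncatesLongTitles()
        Assert.AreEqual("A very lon", TitleFormatter.Truncate("A very long listing title", 10))
    End Sub

    <Test()> _
    Public Sub LeavesShortTitlesAlone()
        Assert.AreEqual("Short", TitleFormatter.Truncate("Short", 10))
    End Sub
End Class
```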

The results

This has been an exciting transformation. The migration was completed in good time, without major impact on the on-going development of the site – we made it look easy! We added four new developers to the team, all with prior .NET experience, and we got all of our existing dev team members involved in the project, giving them an opportunity to learn in the process. Having more people, along with process improvements and better tools, has enabled us to complete a lot more site improvements. We’re in good shape to tackle even more in the year ahead. We’ve even been pleasantly surprised by the positive impact on our platform, which has allowed us to reduce the number of web servers we use (there are more details in the Microsoft Case Study from mid last year, if you’re interested in this stuff).

As is the nature of this sort of change, we’ll never really finish. With the migration completed we’ve started to think about the next logical set of improvements. It will be exciting to see how much better we can make it.

If you’re interested in being part of this story, we’re always keen to hear from enthusiastic .NET developers. Send your CV to careers@trademe.co.nz.

Collective code ownership

Speaking of rotation … what happens when the same principle is applied to other types of teams? For example, software development teams.

You could argue that we use a rotation policy of sorts within the dev team at Trade Me in that we tend to mix up the projects a bit so that over time everybody works on different parts of the site.

This is a form of collective code ownership, which is not a new idea.

The main benefits are that anybody is able to make changes to any part of the application, without fear of stepping on others’ toes; no individual becomes a bottleneck when changes are required; and the team is resilient to changes in roles and personnel (this was also the justification used by Graham Henry for his rotation policy, pointing at the impact injuries to key players had in previous World Cup campaigns).

But what are the associated costs?

As Stefan Reitshamer points out, there is a pretty fine line between everybody owning the code and nobody owning the code. Instead of maximising flexibility and code quality as intended it becomes a tragedy of the commons.

I’m not sure there is a simple answer to this problem. However, unlike Messrs Henry and Bracewell, it’s nice to be able to work through these trade-offs without the media scrutinising every decision.

Rotation

Kiwi cricket coach John Bracewell is taking a bit of stick for his new rotation policy.

This is Richard Boock from the Herald last week (well before the embarrassing result in Auckland over the weekend):

“No one in their right mind will take seriously a man who employs a rotation policy despite not having 11 full-strength players, and at a time when New Zealand cricket has roughly the same depth as a toddler’s swimming pool.”

Nice analogy!

Throughout the last All Black season Graham Henry was similarly criticised for his selections. The results all went his way though and he now has the luxury of selecting the team for the upcoming World Cup from a large pool of proven experienced players. So, with hindsight it’s hard to argue the critics were justified.

Time will tell if history treats Braces as kindly.

I imagine that the players themselves have mixed feelings about being rotated, although they don’t express it in public. For the stars of the team, who will be automatic selections when the crunch time comes, it’s probably nice to get a chance to relax. But for the players battling for a place in the team it must suck to sit and watch while somebody else gets a chance to impress in their position.

And, these are all competitive people who don’t like to lose, individually or as a team.

Must be frustrating …

Unavailable

As Juha, and no doubt many others, has noticed, Trade Me is currently unavailable. The guys are working as I type to get things fixed. Hopefully it will be back up soon.

Not good.

:-(

UPDATE (04-Jan): It’s back up. The site was down between 8:50pm last night and 6:10am this morning. All auctions that were due to end during that time have been extended by 24 hours. There is more information in the site announcement. As the announcement says we sincerely apologise to everybody affected.

Trade Me browser stats for December

In December our three sites (Trade Me, FindSomeone, Old Friends) combined received just over 66% of all domestic page views recorded by Nielsen//NetRatings.

So, our server stats are probably the closest thing there is to a census of the technology Kiwis are using to access websites.

Browsers

Browser      | Market share (Dec-06) | +/- since July-06
IE6          | 70.3%                 | -12.3%
IE7          | 12.2%                 | +10.8%
Firefox 1.5  | 7.1%                  | -0.3%
Firefox 2.0  | 3.8%                  | +3.8%
Firefox 1.0  | 1.9%                  | -1.2%
Safari       | 1.6%                  | +0.5%
IE5.x        | 1.4%                  | -0.7%
All others   | 1.7%                  | -0.6%

There has been a lot of change in the last couple of months following the release of IE7 and Firefox 2.0, which together now account for over 15% of our visitors.

Despite all of the good press that Firefox gets within the web development community IE is still dominant with around 84% market share. IE7 has quickly jumped to 12%, no doubt thanks to Windows Update. It will be interesting to see how this tracks over the next few months once all of the users who will receive the update automatically are accounted for.

Screen resolutions

Screen res.  | Market share (Dec-06) | +/- since July-06
1024×768     | 54.1%                 | -2.0%
800×600      | 15.2%                 | -4.7%
1280×1024    | 13.2%                 | +1.2%
1280×800     | 6.6%                  | +2.2%
1152×864     | 2.9%                  | +0.1%
All others   | 8.0%                  | +3.2%

There is a slow but clear shift towards larger monitors. At the top end things fragment quite a bit, with various different sizes to consider. Around 20% of our visitors are now using a resolution 1280px wide or bigger. But that still leaves a majority using smaller resolutions. It’s depressing to think that many of these people probably have a monitor capable of a higher resolution, but are unable (?) to change the setting.

How does your setup compare?

Personally I prefer Firefox. I switched when Firefox 1.0 was released and haven’t been tempted back. I use a 19″ monitor which runs at 1280×1024.

Which raises some interesting questions for web designers, developers and testers:

If you develop using Firefox, do you really have your users in mind? At Trade Me our test team all use IE6 as their primary browser, for reasons that should be obvious looking at the table above.

Have you already abandoned IE5.x users? This is still a big audience. 1.4% of Trade Me users represents around 40,000 unique visitors each month.

How does your site look at 800×600? These stats are a reflection of our audience, but they’re also a reflection of our site. The new Trade Me design, launched in November, targets users with a resolution of 1024×768 or larger, but we’re also careful to ensure that the site still works for users at 800×600. On the homepage, for example, the size of the category links is reduced and the navigation tabs are repositioned below the logo and banner advert. If your stats show a lower percentage of users with small resolutions, why is that? If your site works poorly for these users they are unlikely to come back.

How do your stats compare to these?

Source: all of the numbers above come from Nielsen//NetRatings.

Getting out more

I met Natalie and Tim from Decisive Flow just before Christmas. I’ve been following their blog and keeping an eye out for their public beta since they had some media coverage about PlanHQ a few months back, so it was good to meet them in person and talk a little about what they’re doing. I was a bit surprised when Natalie told me I was the first Trade Me person she had actually met.

Have we been that elusive? Stink!

I had two excellent opportunities to talk about Trade Me in 2006, firstly at Webstock in Wellington in May and later at TechEd in Auckland. The audiences at these conferences were quite different, but in both cases full of interesting people working with technology. Both were like a direct injection of enthusiasm. I really enjoyed sharing some of what we’ve learned so far at Trade Me and it seemed to be pretty well received (*).

I’m keen to get out of the office even more this year.

(*) If you’re interested, there are videos available online for both of my presentations: Webstock presentation, TechEd keynote.