An idea, by Rowan Simpson

Here is an idea that has been bubbling in the back of my mind for a while.

I don’t have the time at the moment to make it a reality, but if there is a smart developer or two out there looking for a project, give me a shout.

The problem

It is very difficult to ensure that all of the pages in a large and dynamic website contain valid HTML, especially as changes are made to those pages over time.

Why?

There are a few reasons:

It is very time-consuming to manually validate a large number of HTML pages, and it’s hard to make the case for spending extra testing time on this.
Dynamic pages change (duh!) so it is difficult to validate all of the different permutations.
Pages that require users to post information in a form or login cannot be easily accessed by an automated test script, so are often not validated at all.

The solution

While developing and testing a site, or even just while browsing, we typically visit a large number of different pages.

The proposed tool will run in the background while we use the website and capture the details of each HTTP request and corresponding response. The HTML that is returned can then be validated asynchronously by the tool.
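
To make the "validated asynchronously" part concrete, here is a rough sketch of how captured responses could be queued up and checked by a background worker, so that browsing is never held up. Everything named here (`Capture`, `BackgroundValidator`, `placeholder_validate`) is hypothetical, and the validator is a deliberately trivial stand-in that only looks for a `<title>`; the real capture layer (browser extension or otherwise) is assumed to call `record()` for each response.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class Capture:
    """One recorded HTTP exchange (hypothetical record shape)."""
    method: str
    url: str
    status: int
    html: str
    issues: list = field(default_factory=list)


def placeholder_validate(html):
    """Stand-in for a real validator: flags only a missing <title>."""
    return [] if "<title>" in html.lower() else ["error: missing <title>"]


class BackgroundValidator:
    def __init__(self, validate=placeholder_validate):
        self._validate = validate
        self._pending = queue.Queue()
        self.results = []
        # A single daemon worker keeps results in the order pages were visited.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, capture):
        """Called by the capturing layer; returns immediately."""
        self._pending.put(capture)

    def _run(self):
        while True:
            capture = self._pending.get()
            capture.issues = self._validate(capture.html)
            self.results.append(capture)
            self._pending.task_done()

    def wait(self):
        """Block until every queued capture has been validated."""
        self._pending.join()
```

At the end of a session the tool would call `wait()` and then read `results`; swapping in a proper validator only means passing a different `validate` function.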

At the end of the browsing session (or even during, if required) the tool will provide a list of the pages that have been visited and a count of the number of warnings or errors contained on each. The user will be able to drill into the details of any page to investigate the cause of the warnings or errors.
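
The per-page counts could be produced with something as small as the following. This is only a sketch: it assumes each visited page is stored as a URL plus a list of `(severity, message)` pairs, which is my own guess at a data shape rather than anything decided.

```python
from collections import Counter


def summarise(session):
    """Build one summary row per visited page.

    session: iterable of (url, issues) pairs, where issues is a list of
    (severity, message) tuples and severity is "error" or "warning".
    """
    rows = []
    for url, issues in session:
        counts = Counter(severity for severity, _ in issues)
        rows.append({
            "url": url,
            "errors": counts["error"],
            "warnings": counts["warning"],
        })
    return rows
```

The full issue list stays attached to each page, so drilling into the details is just a lookup by URL.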

This would allow developers and testers to quickly and easily validate a large number of pages, including those that require users to post form information and login, without any extra work over and above what they are already doing.

Required Components

A program that runs in the background collecting details of each HTTP request and HTML response

Perhaps this would be a Firefox extension? Or, a Windows application that hosts IE in a sub-window?

The user interface should be very simple, e.g. a start/stop button, plus perhaps an indicator of the page count and/or total warnings/errors, or a summary of the previous request à la the Firefox HTML Validator add-on. (Btw, does anybody know of an equivalent add-on which works on a Mac?)

An XHTML Validator

This would ideally give equivalent results to the W3C validation tool, which will require an SGML parser (but if this is too painful, it could just implement HTML Tidy-like validation for starters).
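
As a taste of what the simpler, Tidy-like starting point might involve, here is a minimal sketch that uses Python's standard `html.parser` to spot unclosed and unexpected tags. It is nowhere near equivalent to the W3C validator and it is not how HTML Tidy itself works; it also treats every omitted end tag as a problem, which is stricter than HTML allows and closer to XHTML rules. The names `NestingChecker` and `check_nesting` are made up for illustration.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, line) for each element still open
        self.issues = []  # (severity, line, message)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag not in [open_tag for open_tag, _ in self.stack]:
            self.issues.append(("error", self.getpos()[0], f"unexpected </{tag}>"))
            return
        # Anything opened after the matching start tag was left unclosed.
        while self.stack:
            open_tag, line = self.stack.pop()
            if open_tag == tag:
                break
            self.issues.append(("warning", line, f"<{open_tag}> not closed"))


def check_nesting(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    # Whatever is still open at the end of the document was never closed.
    for open_tag, line in checker.stack:
        checker.issues.append(("warning", line, f"<{open_tag}> not closed"))
    return checker.issues
```

For example, `check_nesting("<p><b>bold</p>")` reports a single warning that `<b>` was not closed.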

This could also be extended in the future to validate CSS, JavaScript, and also run standard accessibility tests.

Result viewer

A simple web interface with two views would do the trick:

List view
- Identify each page by URL and/or HTML title
- Identify pages which contain errors (perhaps using simple green, orange and red light icons)
Details view
- Lists all warnings/errors for the selected page (with HTML snippets)
- View full HTTP request and response details
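
The traffic-light idea for the list view could be as simple as the sketch below. The rule used here (red for any errors, orange for warnings only, green for a clean page) is an assumption on my part, as are the function names, and the output is plain text rather than a real web page.

```python
def status_light(errors, warnings):
    """Pick a light colour for a page (assumed rule, see above)."""
    if errors:
        return "red"
    if warnings:
        return "orange"
    return "green"


def render_list_view(pages):
    """pages: iterable of (url, title, errors, warnings) tuples."""
    lines = []
    for url, title, errors, warnings in pages:
        light = status_light(errors, warnings)
        # Fall back to the URL when the page has no HTML title.
        label = title or url
        lines.append(f"[{light:<6}] {label} ({errors} errors, {warnings} warnings)")
    return lines
```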

Extra for experts

It would be nice to allow the user to compare results for a given page to the results from previous sessions, so that trends can be identified.
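
A first cut at the comparison might look like this. It assumes each session has been boiled down to a mapping from URL to error count, and it only labels the direction of change; `compare_sessions` is a hypothetical name.

```python
def compare_sessions(previous, current):
    """Label how each page in the current session changed.

    Both arguments map URL -> error count for one browsing session.
    """
    trends = {}
    for url, now in current.items():
        if url not in previous:
            trends[url] = "new"
        elif now > previous[url]:
            trends[url] = "worse"
        elif now < previous[url]:
            trends[url] = "better"
        else:
            trends[url] = "unchanged"
    return trends
```

Pages that appear only in the previous session are ignored here, since they were not visited this time around.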