In the physical world, when someone walks into your organisation and looks around, asks questions and gets involved with whatever you are promoting, you are well aware of it. Online, all of this takes place with you knowing very little about it — just a ghostly visitor crosses your path.
You can get acquainted with these apparitions by using a web stats package to analyse your log files.
How is it possible?
Whenever information is requested from your website, the server needs to know where to send the reply. HTTP (Hypertext Transfer Protocol), the communication protocol that the web is built upon, lets whoever requests information send some extra information along with the request, in the form of headers. Two headers that are very useful for monitoring are User-Agent, which names the program making the request, and Referer, which gives the page that linked to the one being requested.
By logging what was requested, when, by whom, and how the visitor arrived at your website, you can build up an impressive overview of activity on your site.
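As a rough illustration of what a stats package reads from each log entry, here is a minimal Python sketch. It assumes your server writes the Apache "combined" log format; other servers and custom formats will need a different pattern. The sample line, the `COMBINED` pattern name and the `parse_line` helper are made up for this example.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# Made-up sample line, for illustration only.
sample = (
    '192.0.2.10 - - [10/Oct/2006:13:55:36 +0000] '
    '"GET /about.html HTTP/1.1" 200 2326 '
    '"http://www.example.com/links.html" "Mozilla/5.0 (X11; Linux)"'
)

entry = parse_line(sample)
print(entry["host"], entry["time"])   # who and when
print(entry["request"])               # what
print(entry["referer"])               # how they arrived
print(entry["agent"])                 # which program asked
```

This simple pattern does not handle quotation marks escaped inside a field, so treat it as a starting point rather than a complete parser.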
Why use web stats?
- Find problem areas creating errors
- Track spam bots
- Track hacking attempts
- Track search engine referrals
- Find search terms used to discover your site
- Help make decisions on redesigns and new content
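To make the search-term item above concrete, here is a small Python sketch that pulls a search phrase out of a Referer value. It assumes the referring search engine includes the query in its URL under a parameter named `q`; that parameter name, the `search_terms` helper and the sample referrers are assumptions for illustration, and many referrers will carry no query at all.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def search_terms(referer, param="q"):
    """Return the search phrase carried in a referrer URL, or None.

    `param` is the query-string parameter assumed to hold the phrase.
    """
    query = parse_qs(urlparse(referer).query)
    values = query.get(param)
    return values[0].lower() if values else None

# Made-up referrer values, as they might appear in a log file.
referers = [
    "http://www.google.com/search?q=web+stats",
    "http://search.example.org/?q=Web+Stats&page=2",
    "http://www.example.com/links.html",  # an ordinary link, no search
    "-",                                   # no referrer recorded
]

# Count how often each phrase led someone to the site.
counts = Counter(t for t in map(search_terms, referers) if t)
print(counts.most_common())
```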
Early planning
Plan early if you are not using a standard server (Apache, IIS etc.) or if your site uses plug-ins such as Flash.
If you are hosting multiple websites, make sure your server logs each virtual host to separate log files — having to split up log files later is an annoyance and can introduce errors.
Try to archive an adequate amount of data; I find a year’s worth comes in very useful.
Before you analyse your log files, decide what you want to know. Most programs generate a mass of charts and statistics, which can quickly become overwhelming. Do you want overall figures for a given month? The number of computers that accessed a specific page last week? The search terms that led people to your site?
A cautionary example
Great — so you’ve just launched your new project and the web stats report shows 500,000 hits in the first week.
On closer inspection you find there are only three distinct IP addresses, and that for each page request there are 30 other requests for images and linked files (JavaScript and CSS). Looking deeper, you find that one of the scripts generating your page links is self-referencing, and two search engines have been drilling down into your site, requesting one generated page after another.
500,000 hits equals one visitor (that was you) and 10 real page views (the ones you checked on Monday).
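The gap between hits, page views and visitors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list of asset extensions and the crude "bot" test on the User-Agent are guesses that you would need to tune for your own site, and the `summarise` helper and sample entries are invented for the example.

```python
import posixpath
from urllib.parse import urlparse

# Assumed: file types that are linked assets rather than pages.
ASSET_EXTENSIONS = {".css", ".js", ".png", ".gif", ".jpg", ".ico"}
# Assumed: a crude way to spot crawlers from the User-Agent string.
BOT_MARKERS = ("bot", "crawler", "spider")

def summarise(entries):
    """entries: iterable of (ip, path, user_agent) tuples from a log."""
    hits = 0
    page_views = 0
    visitors = set()
    for ip, path, agent in entries:
        hits += 1  # every request counts as a hit
        if any(marker in agent.lower() for marker in BOT_MARKERS):
            continue  # ignore search engine crawlers
        extension = posixpath.splitext(urlparse(path).path)[1].lower()
        if extension in ASSET_EXTENSIONS:
            continue  # images, CSS and JavaScript are not page views
        page_views += 1
        visitors.add(ip)
    return {"hits": hits, "page_views": page_views, "visitors": len(visitors)}

# Made-up entries: one person viewing one page, plus a crawler.
entries = [
    ("192.0.2.10", "/index.html", "Mozilla/5.0"),
    ("192.0.2.10", "/style.css", "Mozilla/5.0"),
    ("192.0.2.10", "/logo.png", "Mozilla/5.0"),
    ("198.51.100.7", "/index.html?page=1", "ExampleBot/1.0"),
    ("198.51.100.7", "/index.html?page=2", "ExampleBot/1.0"),
]
print(summarise(entries))
```

Counting distinct IP addresses is itself only an approximation of visitors, since several people can share one address and one person can appear under several.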
Useful tools
- Analog — great free analysis program (Mac/Win/Linux)
- AWStats — free analysis Perl scripts
- Google Analytics — free web application
- Webalizer — free, lacks some in-depth reporting
- WebTrends — fully featured commercial suite