Log Files

The web analytics era started with the analysis of log files. Every time a web browser requests a page from a web server, the server logs a request for every single file that belongs to that page, e.g. the images referenced on that page or a CSS file. The web server records the following information (a sketch of such a log line follows the list):

  • the IP address
  • the requested resource
  • the date and time of the request
  • the request method (GET or POST)
  • the size of the requested file in bytes
  • the HTTP version
  • the HTTP status code
  • the referrer
  • the user agent

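To make the list more concrete, here is a minimal sketch of how such a log line could be parsed. It assumes the widely used "combined" log format of Apache and Nginx; the sample line, the field names, and the regular expression are illustrative, not taken from any particular server configuration:

```python
import re

# Hypothetical log line in the Apache/Nginx "combined" format, covering the
# fields listed above (IP address, timestamp, method, resource, HTTP version,
# status code, size, referrer, user agent).
sample_line = (
    '203.0.113.42 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"https://www.example.com/start.html" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"'
)

# Regex for the combined log format; group names mirror the list above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<resource>\S+) (?P<http_version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

match = LOG_PATTERN.match(sample_line)
if match:
    for field, value in match.groupdict().items():
        print(f"{field}: {value}")
```

Real-world configurations may log more or fewer fields, but the structure above is what most log-file analytics tools build on.
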
While in the very early days of the WWW most users had their own dedicated IP addresses, this soon changed as the web became more popular. Users who dialed in via CompuServe or AOL were assigned dynamic IP addresses, so a unique IP address no longer represented a single user. Moreover, in some cases several people or computers may sit behind a single IP address.

Another disadvantage of log files is that they record every single request, whether it comes from a human or a bot. For some websites, the majority of requests are generated by bots, and not only by search engine crawlers.
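One common, if crude, way to reduce this noise is to filter requests by user agent. The following sketch assumes a small list of bot tokens and hypothetical request records; it only catches bots that identify themselves, which many do not:

```python
# Illustrative list of user-agent substrings that indicate self-identifying bots.
BOT_TOKENS = ("bot", "crawler", "spider", "slurp")

def looks_like_bot(user_agent: str) -> bool:
    """Return True if the user agent contains a known bot token."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)

# Hypothetical parsed log records.
requests = [
    {"ip": "203.0.113.42", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"ip": "198.51.100.7", "user_agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
]

human_requests = [r for r in requests if not looks_like_bot(r["user_agent"])]
print(human_requests)
```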

Having said that, log file data is available in near real time (with a delay of only a few seconds), which allows for immediate analysis when something goes wrong that cannot be detected by JavaScript-based systems (we will cover those in the Cookies and Pixels section). Tracking via log files is also called server-based tracking.
