Google Analytics and Piwik

Google Analytics and Piwik are both web analytics systems: the former is a product from Google provided as Software as a Service, the latter an open-source system that can be deployed on your own server. Google Analytics comes in two flavors: a free version that can be used for up to 10 million hits per month, and a premium version that starts at $150,000 per year (as of 2016), depending on the number of hits sent to the database. The two versions differ not only in the traffic they allow but also in their features: the free version only offers aggregated data, whereas the premium version lets users download raw data; also, the most sophisticated features, such as data-driven attribution, are only available in the premium version. Piwik does not offer all the features that Google Analytics has, but it has the huge advantage that you don’t have to pay for the raw data.

Google did not invent Google Analytics; the product is the result of the acquisition of Urchin in 2005. Urchin is still present in the so-called UTM tags, where UTM stands for Urchin Tracking Module.
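To illustrate what such a tag looks like, here is a minimal Python sketch that appends the three most common UTM parameters (`utm_source`, `utm_medium`, `utm_campaign`) to a landing-page URL using only the standard library. The function name `add_utm_tags`, the example URL, and the campaign values are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm_tags(url, source, medium, campaign):
    """Append utm_source, utm_medium and utm_campaign to a landing-page URL.

    Hypothetical helper for illustration; existing query parameters are kept.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query)
    query += [("utm_source", source), ("utm_medium", medium), ("utm_campaign", campaign)]
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_utm_tags("https://www.example.com/shop?lang=en", "newsletter", "email", "spring_sale"))
# https://www.example.com/shop?lang=en&utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale
```

When a visitor arrives via such a link, the analytics script reads these parameters to attribute the visit to a source, medium, and campaign.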

The basic concept of (the free version of) Analytics is the session. By default, a session times out after 30 minutes of inactivity (this can be changed), and every event that a user triggers or every page that he visits restarts the timer. In other words, if a user stays 31 minutes on one page and then clicks on a link to another page on the same website, this counts as two sessions (or two visits, as some people would say), although logically it is one visit. A new session is also started when the user re-enters the site via another channel. Advanced users with access to the premium version of Analytics often do not visit the Analytics site at all but perform their own analysis based on the raw data.
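The session rules above can be sketched in a few lines of Python. This is a simplified model, not Google’s actual implementation: it assumes each hit is a `(timestamp, channel)` pair, where `channel` is `None` for clicks within the site and a channel label (hypothetical values such as `"organic"` or `"email"`) when the user arrives from outside. Any other session rules Analytics applies are ignored.

```python
from datetime import datetime, timedelta

# Default inactivity timeout described in the text (configurable in Analytics).
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Split one user's hits into sessions (simplified model, not Google's code).

    hits: list of (timestamp, channel) pairs; channel is None for internal
    navigation and a channel label when the user enters from outside.
    Returns a list of sessions, each a list of timestamps.
    """
    sessions = []
    current_channel = None
    last_time = None
    for timestamp, channel in sorted(hits, key=lambda hit: hit[0]):
        is_first = last_time is None
        # Every hit restarts the timer, so only the gap to the previous hit matters.
        timed_out = not is_first and timestamp - last_time > timeout
        # Re-entering via a different channel also starts a new session.
        new_channel = channel is not None and channel != current_channel
        if is_first or timed_out or new_channel:
            sessions.append([])
            current_channel = channel
        sessions[-1].append(timestamp)
        last_time = timestamp
    return sessions


# The example from the text: 31 minutes on one page, then an internal click.
hits = [
    (datetime(2016, 5, 1, 8, 0), "organic"),  # enters via search
    (datetime(2016, 5, 1, 8, 31), None),      # clicks an internal link 31 minutes later
    (datetime(2016, 5, 1, 8, 40), "email"),   # comes back via a newsletter link
]
print(len(sessionize(hits)))  # 3
```

Had the internal click happened at 8:29 instead, the first two hits would fall into the same session.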

By understanding how exactly things are measured, you will also identify a few constraints that most people are not aware of (although they are mentioned in the Google Analytics help), and these constraints apply to all web analytics systems that are based on JavaScript tags being fired. Since the script fires when a page loads, the time a user spends on a page is measured as the difference between the timestamps of two consecutive pageviews. You visit the first page at 8 a.m. and then click on a link to another page on the same website at 8:03 a.m.; by now, you have spent 3 minutes on the site. If you spend 2 minutes on the second page and close the browser window after reading it, you have spent 5 minutes in total, but since you have not requested another page, only the first 3 minutes have been measured. As a consequence, the reported time on site is basically the average time spent on the website excluding the last page of each session, because the duration of that page is not measured (it could be measured, but most website owners don’t do that for good reasons).
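The measurement gap can be made concrete with a short Python sketch. It assumes we only have the pageview timestamps of one session, which is all a tag-based tool receives; the function names are illustrative and not part of any Analytics API.

```python
from datetime import datetime


def time_on_pages(pageview_times):
    """Minutes per page, measured as the gap to the next pageview.

    The last page gets None: no further hit arrives, so its duration is unknown.
    """
    times = sorted(pageview_times)
    gaps = [(later - earlier).total_seconds() / 60 for earlier, later in zip(times, times[1:])]
    return gaps + [None]


def measured_session_duration(pageview_times):
    """Session duration as a tag-based tool sees it (the unknown last page counts as 0)."""
    return sum(d for d in time_on_pages(pageview_times) if d is not None)


# The example from the text: pageviews at 8:00 and 8:03; the reader actually leaves at 8:05.
visit = [datetime(2016, 5, 1, 8, 0), datetime(2016, 5, 1, 8, 3)]
print(time_on_pages(visit))              # [3.0, None]
print(measured_session_duration(visit))  # 3.0
```

A session with a single pageview is therefore measured as 0 minutes, however long the user actually stayed.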

Similarly, the bounce rate is not the share of users who “immediately” leave the site after entering it but the share of sessions in which users come to your site and see only one page, no matter whether they stay 5 seconds or 30 minutes. This can be changed (“Adjusted Bounce Rate”), but it is rarely done, although it provides valuable information.
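As a minimal sketch of this definition, the following Python function counts a session as a bounce whenever it has exactly one pageview. The session records are hypothetical; `actual_seconds` is the real reading time, which the tool does not see and which therefore plays no role.

```python
def bounce_rate(sessions):
    """Share of sessions with exactly one pageview, however long the visit lasted."""
    bounces = sum(1 for session in sessions if session["pageviews"] == 1)
    return bounces / len(sessions)


sessions = [
    {"pageviews": 1, "actual_seconds": 5},     # left almost immediately
    {"pageviews": 1, "actual_seconds": 1800},  # read one article for 30 minutes
    {"pageviews": 4, "actual_seconds": 240},
    {"pageviews": 2, "actual_seconds": 60},
]
print(bounce_rate(sessions))  # 0.5
```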

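Another important concept of Analytics is events: hits that are sent when something specific happens on the page, e.g. the DOM being completely loaded or a timer firing after x seconds. This allows us, for example, to implement an Adjusted Bounce Rate, since such a timer event basically checks whether the user is still on the page after x seconds; a session that contains this event no longer counts as a bounce (unless the event is sent as a non-interaction hit).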
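The effect of such a timer event on the bounce rate can be simulated in Python. In the browser this would be tracking code sending an event hit; this sketch only models which hits a one-page visit produces, assuming the timer fires one interaction event once the user has stayed `timer_seconds` on the page. The 30-second threshold and the dwell times are hypothetical.

```python
def hits_for_one_page_visit(dwell_seconds, timer_seconds=None):
    """Hits sent by a visit that views a single page for dwell_seconds."""
    hits = ["pageview"]
    # The timer only fires if the user is still on the page when it elapses.
    if timer_seconds is not None and dwell_seconds >= timer_seconds:
        hits.append("event")
    return hits


def bounce_rate(dwell_times, timer_seconds=None):
    """A session bounces when it sends exactly one interaction hit."""
    sessions = [hits_for_one_page_visit(d, timer_seconds) for d in dwell_times]
    return sum(len(hits) == 1 for hits in sessions) / len(sessions)


dwell_times = [5, 20, 45, 1800]  # seconds spent on the only page viewed
print(bounce_rate(dwell_times))                    # 1.0 (standard bounce rate)
print(bounce_rate(dwell_times, timer_seconds=30))  # 0.5 (adjusted bounce rate)
```

Without the timer, all four one-page visits are bounces; with it, only the visits that ended before 30 seconds remain bounces.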

Google offers access to the Analytics account of the Google Merchandise Store: go to this help page and click on the access link (a Google account is required; from then on, you can access the store account directly via the Google Analytics interface).

The Google Analytics interface provides 5 sections:

  • Realtime: While users love to see what happens on their website right now, there is no actionable insight to be derived here unless webmasters need to debug events or other implementation details.
  • Audience: Information about the users, their interests, and the technology they use; there is also a new feature that lets analysts explore the behavior of individual users. This data cannot be connected to other reports out of the box, although it is possible to hack this.
  • Acquisition: Details about where users came from, including the conversions; this, however, is a last-interaction view (strictly speaking, last non-direct click).
  • Behavior: Interaction with the website’s content, website speed, site search, and events.
  • Conversions: Conversions from defined conversion goals or e-commerce; this section also offers an attribution module that lets analysts compare alternative attribution models to the last-interaction view.
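To show what such an attribution module compares, here is a Python sketch of three simple rule-based models (last interaction, first interaction, and linear) applied to one conversion path. The channel names are hypothetical, and data-driven attribution, which only the premium version offers, is not modeled here.

```python
def attribute(touchpoints, model="last"):
    """Distribute the credit for one conversion across the channels of its path."""
    credit = {}
    if model == "last":
        credit[touchpoints[-1]] = 1.0
    elif model == "first":
        credit[touchpoints[0]] = 1.0
    elif model == "linear":
        # Every touchpoint gets an equal share; repeated channels accumulate credit.
        share = 1.0 / len(touchpoints)
        for channel in touchpoints:
            credit[channel] = credit.get(channel, 0.0) + share
    else:
        raise ValueError(f"unknown model: {model}")
    return credit


path = ["display", "organic", "email", "organic"]
print(attribute(path, "last"))    # {'organic': 1.0}
print(attribute(path, "first"))   # {'display': 1.0}
print(attribute(path, "linear"))  # {'display': 0.25, 'organic': 0.5, 'email': 0.25}
```

Under a last-interaction view, the display ad that started this path gets no credit at all, which is exactly what the alternative models make visible.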

Reports display metrics, e.g. sessions, broken down by a dimension, e.g. source/medium; in most of the reports, it is possible to add a secondary dimension.
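The relationship between a metric and its dimensions can be sketched as a group-by in Python: the metric (here, the number of sessions) is counted per value of a primary dimension, optionally split further by a secondary dimension. The session records and field names are hypothetical.

```python
from collections import Counter


def sessions_by(sessions, primary, secondary=None):
    """Count sessions (the metric) per primary dimension value, optionally per secondary."""
    dimensions = [primary] if secondary is None else [primary, secondary]
    return Counter(tuple(session[d] for d in dimensions) for session in sessions)


sessions = [
    {"source_medium": "google / organic", "device": "mobile"},
    {"source_medium": "google / organic", "device": "desktop"},
    {"source_medium": "newsletter / email", "device": "mobile"},
    {"source_medium": "google / organic", "device": "mobile"},
]
print(sessions_by(sessions, "source_medium"))
print(sessions_by(sessions, "source_medium", secondary="device"))
```

Adding the secondary dimension splits the three organic sessions into two mobile and one desktop session.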
