Materials for Web Analytics Wednesday, April 8, 2020


It’s great that you were part of the first virtual Web Analytics Wednesday. Here are the promised links:

All links marked with + are affiliate links

Are my texts being read? Implementing analytics in detail


On the occasion of the anniversary issue of Website Boosting (issue 60!), here is a deep dive on how to create a custom report on texts that are read to the end. It is a supplement to my four-part series “Web Analytics: How Data Leads to Action”; issue 60 contains the third part. I had already written about the topic here, comparing it to scroll depth. It is an example of how custom and calculated metrics can be used.

The screenshot shows the following for each page:

  • How many words a text has
  • How many times a page has been viewed
  • The proportion of views that led to an exit
  • How many times the YARPP element became visible (YARPP stands for Yet Another Related Posts Plugin, which displays similar articles at the end of a post; if this element is visible on the user’s screen, it is assumed that the article above it has been read to the end)
  • The share of page views in which the YARPP element became visible
  • The number of clicks on a YARPP link
  • The share of YARPP element sightings that led to a click on a YARPP link

What problem does this report solve?

  • If a text is read to the end less often than other texts, then that text does not seem to be written in a way that keeps readers interested.
  • The length of the text could be a predictor of whether a text is read to the end; but if a shorter text is not read to the end, this could be an even stronger signal that the text is in need of optimization.
  • If the links to similar articles are not clicked on even though they are visible, they do not seem to be relevant.

Create the custom dimension and metrics

  • In Analytics, go to Administration (bottom left) and then click Custom Definitions in the Property column.
  • Click on Custom Metrics and then click on the red New Custom Metrics button
  • Choose an understandable name (e.g. “YARPP Seen”)
  • The Scope is Hit
  • The formatting type is integer
  • The remaining values can be left blank
  • Click Save.
  • Repeat the process once again, this time for the “YARPP Clicks”. The settings are the same.

The first entry should now have an index value of 1 and the second entry an index value of 2, unless custom metrics had already been defined earlier.

If the number of words in a text is also to be recorded, a custom dimension is required. The process is similar: again, choose a suitable name and the scope Hit. The index value of this custom dimension also needs to be remembered or noted down, as it will be used later in Google Tag Manager.

Implementation in Google Tag Manager

Once the custom dimensions and metrics have been created, values can be written to them. This is done with the Tag Manager. First of all, the element on the page whose visibility should fire the trigger must be identified. The necessary steps for this are already described in this article. Then the following trigger is configured:

The trigger fires a tag, which now also has to be configured:

It is important in this step that the settings are overwritten, as this is the only way to pass a value as a custom metric (Custom Metrics in the screenshot). Here you have to enter the index value that Analytics assigned in the step above. The metric value is 1, because each sighting increments the counter by 1.

The Scroll Depth Threshold variable is not necessary here; if it is to be used, it may first have to be enabled. This step must then be repeated for the clicks on a YARPP link and, if applicable, for the custom dimension with the number of words per text. The latter, however, can be passed directly in the Google Analytics settings that are defined as a variable. In my case, the configuration looks like this:

As you can see, there are a few other special things in my configuration, but the important part is that the WordCount is passed into the custom dimension with index value 7.

Creating the calculated metric

In order to display a ratio or conversion rate, a calculated metric is created. These are the columns “YARPP Seen CVR” and “YARPP Click CVR” in the example report in the first screenshot. Note: It may take some time for the custom metrics to be visible here! This means that this step may only be feasible after a few hours or even after a day.

In the Administration screen, in the far right column, you will find the entry Calculated Metrics. Click on the red New Calculated Metric button and apply the following settings in the next screen. All you have to do is type the first few letters of a variable name, and Analytics will complete it. This is the setting for the Click CVR:

For the Seen CVR, the formula {{YARPP seen}} / {{pageviews}} is used.

Create the custom report

Last but not least, a report is created, as shown in the first screenshot above. Under Customization (top left) and Custom Reports, a new report can be created. Here, all relevant metrics, whether custom or available out of the box, are selected, along with the appropriate dimension. Unfortunately, no secondary dimension can be defined here; it has to be added manually whenever the custom report is opened.
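If the missing secondary dimension (or sampling) becomes a nuisance, the same data can also be pulled from the Reporting API, for example with R. This is only a minimal sketch that assumes the googleAnalyticsR package, a placeholder view ID and the index values from above (dimension 7 for the word count, metrics 1 and 2 for YARPP Seen and YARPP Clicks); adapt the names to your own configuration.

    library(googleAnalyticsR)
    library(dplyr)

    ga_auth()                       # authenticate against the Google Analytics Reporting API
    view_id <- 123456789            # placeholder view ID, replace with your own

    report <- google_analytics(view_id,
                               date_range  = c("2020-03-01", "2020-03-31"),
                               dimensions  = c("pagePath", "dimension7"),   # dimension7 = WordCount
                               metrics     = c("pageviews", "exits",
                                               "metric1", "metric2"),       # metric1 = YARPP Seen, metric2 = YARPP Clicks
                               anti_sample = TRUE)

    # rebuild the two calculated metrics locally
    report <- report %>%
      mutate(yarpp_seen_cvr  = metric1 / pageviews,
             yarpp_click_cvr = ifelse(metric1 > 0, metric2 / metric1, NA))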

That’s it! Further valuable knowledge about web analysis can be found in my book “Introduction to Web Analysis”!

Hacking Google Optimize: On Bayes, p-values, A/A tests and forgotten metrics


Google Optimize is one of my favorite tools because it allows anyone to quickly build A/B tests; in my courses, participants are often amazed at how quickly such a test can be online. Of course, the preparatory work, the clean creation of a hypothesis, is not done so quickly, but it is also no fun to wait months for a test to go live. I don’t want to go into more detail about the advantages of Google Optimize, but instead point out three subtleties that are not so obvious.

Use Google Optimize data in Google Analytics raw data

The Google Analytics API also provides access to the Google Optimize data that flows into Analytics, so the raw Analytics data can be analyzed with respect to a Google Optimize test. This is especially interesting if something cannot be used as a KPI in Optimize, if you forgot to set a KPI in Google Optimize, or if you want to analyze side effects. Some of this can also be done afterwards with segments, but hey, this is about hacking (in the sense of tinkering, not crime); you also do things because you can, not because they are always necessary.

The two important Optimize dimensions are called ga:experimentId and ga:experimentVariant, and there is now also a combination of both called ga:experimentCombination. If you only run one test, it is sufficient to query the dimension ga:experimentVariant alone: 0 is the original variant (control group), and the other variants are numbered from there. If you have several tests running, simply look up the ID in the Google Optimize interface; it can be found in the right-hand column under Google Analytics. It is usually rather cryptic, as you can see in the picture.
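For illustration, such a query could look like this in R with the googleAnalyticsR package; this is only a sketch, and the view ID and date range are placeholders:

    library(googleAnalyticsR)

    ga_auth()                       # authenticate against the Google Analytics Reporting API
    view_id <- 123456789            # placeholder view ID

    optimize_data <- google_analytics(view_id,
                                      date_range  = c("2018-04-01", "2018-04-28"),
                                      dimensions  = c("experimentCombination", "deviceCategory"),
                                      metrics     = c("users", "sessions", "avgSessionDuration"),
                                      anti_sample = TRUE)

    head(optimize_data)             # 0 = original, 1, 2, ... = variants, per experiment ID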

In my example, I have two experiments running, so I can output the combination alongside three custom dimensions (Client ID, Hit Type and UNIX Timestamp) and the page title (I cut off the Client ID a bit in the image, even though it is only a pseudonymized data point). In the second picture, we see the two experiments and the respective variants in one field. In the test whose ID starts with c-M, a student hypothesized that visitors to my site would view more pages and spend more time if the search box were placed higher up. I did not believe it, but believing is not knowing, so we ran the test with session duration as the KPI. I had forgotten to set the number of searches as a second KPI. Well, it is good that I have the raw data, even if I could of course also build a segment for it.

As we can also see in the screenshot, users are in two tests at the same time, as the other test should not affect the first one. Now, during the test period of 4 weeks, there were only 3 users on my site who searched for anything; one of them searched for the same query several times, another searched for two different terms. With such a small number of cases, we do not even need to think about significance. For a while it looked as if the variant with the search box further up would actually win, but more on that in the last section. The question now is: why can the variant be better at all if hardly anyone searched? Or did the mere presence of the search box lead to a longer session duration? Very unlikely!

Let’s take a closer look…

It should be noted in the raw data that there are now two entries for each hit of a user, one per test. Also, not every user will be in a test, even if 100% of the traffic is targeted, which can already be seen in Google Analytics. We can also check whether the random selection of test and control group participants has resulted in a reasonably even distribution of users (e.g. mobile versus desktop, etc.). Of course, this is also possible with the interface.
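Continuing the sketch above, such a check of the device distribution per variant could look like this (column names as returned by googleAnalyticsR; purely illustrative):

    # share of sessions per device category within each experiment combination
    round(prop.table(xtabs(sessions ~ experimentCombination + deviceCategory,
                           data = optimize_data), margin = 1), 3)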

The first thing I notice when I pull the data from the API is that the values do not match those from the GUI. At first, this is quite worrying. If I only look at users and sessions, the values match exactly. If I add the experimentCombination dimension, the numbers no longer match, and it is not because of the differences between API v3 and v4. It is not uncommon for data to mismatch, most often because of sampling, but that cannot be the case here. Interestingly, the numbers within the GUI also do not match when I look at the data under Experiments and compare it to the audience dashboard. However, the figures from the API do agree with the data from the Experiments report. So be careful when building segments!

If I pull the data including my Client ID dimension, I get slightly fewer users. This is explained by the fact that not every user writes such an ID into the custom dimension; the user probably has a Client ID (or rather certainly, because otherwise GA could not identify them as an individual user), but in some cases I do not manage to write the ID into the dimension, so the dimension contains, for example, “False”.

Now let’s take a look at some data. For example, I’m interested in whether Optimize manages to get the same distribution across devices as I have on the site:

The majority of my traffic still takes place on the desktop. What does it look like in Optimize?

The distribution is definitely different. This is not surprising, because no Optimize experiment should be served on AMP pages; what is rather surprising is that experiments took place on mobile devices here at all. And these cases have different values with respect to the target KPI, as you can also see in Analytics:

So we cannot draw conclusions about the whole site from the test results, but we also do not know how big the effect of the unexpected mobile users is on the test result. To do this, we would have to redetermine the winner. But how is the winner determined in the first place? For example, we could use a chi-square test based on the observed average session duration:

    chisq.test(x)

            Pearson's Chi-squared test with Yates' continuity correction

    data:  x
    X-squared = 1.5037, df = 1, p-value = 0.2201

In this case, p is above 0.05 (more on p in the next section). If the chi-square test is the right test here at all, it would show that the difference is not statistically significant. However, this is not the test that Google Optimize uses.
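For reference, the matrix behind such a call could be built like this in R; the counts are purely hypothetical (sessions above versus below the average session duration per variant) and do not reproduce the output above:

    # hypothetical 2x2 table: variant x (session duration above / below average)
    x <- matrix(c(120,  80,     # original:  above, below
                  135,  65),    # variant 1: above, below
                nrow = 2, byrow = TRUE,
                dimnames = list(c("original", "variant 1"),
                                c("above average", "below average")))

    chisq.test(x)   # applies Yates' continuity correction by default for 2x2 tables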

Bayesian Inference versus NHST

What exactly is happening under the hood? Let’s take a look at how Google Optimize calculates whether a variant has won or not. Unlike Adobe Test & Target, for example, or most significance calculators such as Conversion’s (although they do not even say what kind of test they are using), Google Optimize is not based on a t-test, Mann-Whitney U test or chi-square test, but on a Bayesian inference method. What does that mean?

Two different schools of thought collide here: that of the so-called frequentists (NHST stands for Null Hypothesis Significance Testing) and that of the supporters of Bayesian inference. These approaches have been and still are discussed intensively in statistics, and I am not the right person to pass judgement here. But I will try to shed light on the two approaches for non-statisticians.

Most A/B testing tools perform hypothesis tests. You have two groups of roughly the same size, one group is subjected to a “treatment”, and then it is observed whether the defined KPI changes “significantly” in the test group. For significance, the p-value is usually consulted; if it is below 0.05, or whatever the significance level has been defined as, the null hypothesis is rejected. Although you do not see anything about null hypotheses etc. in the tool interfaces, probably so as not to confuse users, the conceptual framework behind them assumes exactly that. For example, if you test whether a red button is clicked more often than a blue one, the null hypothesis would be that both are clicked equally often. The background is that a hypothesis cannot simply be proven; but if the opposite of the hypothesis is sufficiently unlikely, the hypothesis itself can be considered rather likely. That is all the p-value is about.

Now the p-value is not a simple story; not even scientists manage to explain the p-value in a way that is understandable, and there is a debate about whether it makes sense at all. The p-value says nothing about how “true” a test result is. It simply says something about how likely such a result is if the null hypothesis is true. A p-value of 0.03 means that the probability of observing a result at least this extreme, given a true null hypothesis, is 3%. Conversely, this says nothing about how “true” the alternative hypothesis is. The inverse p-value (97%) is not the probability that one variant beats the other.

Another common problem with A/B testing is that the sample size is not defined beforehand. The p-value can change over the course of an experiment, and so results that were statistically significant can stop being significant after a few days because the number of cases has changed. In addition, it is not only significance that is of interest but also the statistical power of a test, which is displayed by very few testing tools.
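Base R already gives a rough answer to the sample-size question; the conversion rates in this sketch are purely assumed values:

    # How many users per variant are needed to detect an uplift
    # from a 2.0% to a 2.5% conversion rate with 80% power at alpha = 0.05?
    power.prop.test(p1 = 0.02, p2 = 0.025, power = 0.8, sig.level = 0.05)
    # prints n, the required sample size per group (several thousand users per variant)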

But these are mainly problems with the tools, not with the frequentist approach used by most tools. The “problem” with the frequentist approach is that a model does not change when new data comes in. With returning visitors, for example, a change on the page may be learned at some point, so that an initial A/B test predicts a big impact while the actual effect is much smaller, because the frequentist approach simply counts the total number of conversions, not their development over time. In Bayesian inference, on the other hand, newly incoming data is taken into account to refine the model; decreasing conversion rates would influence the model. Data that exists “beforehand”, so to speak, and influences the assumptions about the effect in an experiment is called the prior probability, or “priors” (I write priors because it is faster). The example in the Google Help Center (which is also often used elsewhere) is that if you misplace your cell phone in the house, Bayesian inference lets you use the knowledge that you tend to leave your phone in the bedroom and, in addition, follow its ringing. With the frequentists, you are not allowed to do that.

And this is exactly where the problem arises: How do we know that the “priors” are relevant to our current question? Or, as it is said in the Optimizely blog:

The prior information you have today may not be equally applicable in the future.

The exciting question now is how Google obtains the priors in Optimize. Google makes the following statement about this:

Despite the nomenclature, however, priors don’t necessarily come from previous data; they’re simply used as logical inputs into our modeling.

Many of the priors we use are uninformative – in other words, they don’t affect the results much. We use uninformative priors for conversion rates, for example, because we don’t assume that we know how a new variant is going to perform before we’ve seen any data for it.

These two blog excerpts already make clear how differently the usefulness of Bayesian inference is understood. At the same time, it is obvious that, as with any other tool, we lack transparency about how exactly the calculations are done. One more reason why, if you want to be on the safe side, you need the raw data to run your own tests.
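What such a do-it-yourself calculation could look like, in a heavily simplified form that is certainly not identical to Google's model: a Beta-Binomial comparison of two conversion rates with uninformative priors. The counts are made up:

    # hypothetical conversions / users per variant
    conv_a <- 40;  n_a <- 2000      # original
    conv_b <- 55;  n_b <- 2000      # variant

    # uninformative Beta(1, 1) priors + binomial likelihood -> Beta posteriors
    set.seed(42)
    post_a <- rbeta(100000, 1 + conv_a, 1 + n_a - conv_a)
    post_b <- rbeta(100000, 1 + conv_b, 1 + n_b - conv_b)

    mean(post_b > post_a)                         # probability that the variant beats the original
    quantile(post_b - post_a, c(0.025, 0.975))    # 95% credible interval for the difference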

The Bayesian approach requires more computing time, which is probably why most tools do not use it. There is also criticism of Bayesian inference. The main problem, however, is that most users know far too little about what exactly the A/B testing tools do and how reliable the results are.

Why an A/A test can also be salutary

Now the question arises as to why there was any difference in session duration at all when hardly anyone searched. This is where an A/A test can help. A/A test? That’s right, there is such a thing. Such a test helps to identify the variance of your own site. I once had a wonderful test in which I tested the AdSense click-through rate after a design change. The change was very successful. To be on the safe side, I tested again; this time the change had worse values. Now, of course, it may be that worse ads were simply served and the click-through rate therefore deteriorated. But it could also simply be that the page itself has a variance. And this variance can be identified by running an A/A test (or by using past raw data for such a test). In such a test, nothing is changed in the test variant, and you then watch whether one of the main KPIs changes anyway. In theory, nothing should change. But if it does? Then we have identified a variance that lies in the site and its traffic itself, and that we should take into account in future tests.
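Such an A/A test can also be simulated retrospectively from raw data: repeatedly split past sessions at random into two halves and look at how far the two “identical” groups drift apart. A minimal sketch with simulated session durations standing in for real raw data:

    # session_duration: one value per session; here simulated as a stand-in for real raw data
    set.seed(1)
    session_duration <- rexp(4000, rate = 1 / 120)

    # repeatedly split the sessions into two random halves and compare the means
    diffs <- replicate(1000, {
      grp <- sample(c(TRUE, FALSE), length(session_duration), replace = TRUE)
      mean(session_duration[grp]) - mean(session_duration[!grp])
    })

    quantile(diffs, c(0.025, 0.975))   # how far two identical groups can differ purely by chance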

Conclusion

  • Differences in test results can be caused by a pre-existing variance of a page. This is where an A/A test helps to get to know the variance.
  • The results may differ if different tools are used, as the tools have different approaches to how they determine the “winners”.
  • Raw data makes it possible to apply your own test statistics or to verify test results, as the tools offer little transparency about how they arrive at their results. As in my example, it may turn out that the test was not served evenly at all and the results are therefore not clearly usable.
  • The raw data sometimes differs considerably from the values in the GUI, which cannot always be explained.
  • The p-value is only part of the truth and is often misunderstood.
  • With frequentist approaches, you should think in advance about how large the sample for an A/B test needs to be.
  • A fool with a tool is still a fool.

Why web analysis (as we know it today) will become extinct


The era of meaningful web analytics has only just begun. More and more companies understand that page views are not a suitable KPI for checking the success of content investments. And yet the end of what we are only just growing fond of is approaching before it can become really good.

This is not another click-bait article about how machines will take our jobs. Of course, machine learning will eliminate the simple analyst jobs. We can already see today that Google Analytics detects anomalies on its own in the free version. Questions about the data can be asked in natural language. And since the challenge is usually to ask the right questions of the data, this too will be covered by autonomous analysis. In the 360 version, Analytics offers the new Analyze mode. It will become possible to create analyses more and more automatically. And that is a good thing. Because even if we have a lot of experience and know which segments are worth investigating, a machine can simply calculate all the combinations and come up with segments that we as humans would never have found. The poking around in the haystack has thus come to an end, and it is more efficient when machines undertake this search. It is no longer the case that only information is generated from data; actions are already being derived. What is hardest for most users, working out from the data what should actually be done, will also be interpreted and articulated by the machine.

It’s not that simple, the coachmen of web analysis will say: you can only draw something meaningful from the data if it has been collected sensibly. And in most cases, meaningful data is not being collected. As long as a standard installation of a web tracking tool is used in many cases, there is still a lot to do. But if we take a closer look at today’s Google Tag Manager, it becomes clear that many user interactions can already be tracked automatically. Clicks on links. Scroll depth. Element visibility. What still has to be set up today could be done automatically tomorrow. And it would be the logical next step. So let’s assume that at some point in the near future, setting up tag management will no longer be necessary. Depending on how much you pay, events are measured more or less granularly. And only those from which the machine has learned that they are important for a defined goal.

The complexity of data acquisition will disappear, and so will the analysis. What will be left then? Working out a digital strategy that is designed on the basis of data? I do not see that in the brand essence of web analysts. Manual web analysis is a bridging qualification, the gas station attendant of analytics solutions. Because in 5 years at the latest, in 2023, the big customers will already be working with a web analytics AI and no longer with an expensive, vain web analyst.

What should we do, we “web analytics heroes”? Either we qualify further, from gas station attendant to module manufacturer. Or we sell the sweets around it, the McJobs of web analytics. And those can just as well be done in India. The only option left for us is to use data science to develop solutions that are too niche for Google Analytics and co. to offer them first. We will be able to earn money with implementations and training for a few more years, but by 2023 at the latest, this will be over.

Comments (since February 2020 the comment function has been removed from my blog):

Maik says

1 May 2018 at 13:22
Hello Tom,

thank you for your contribution, which offers a new perspective for many. From our conversation during our podcast recording, I already know that you have a very data-driven approach and also subordinate the future of web analytics to machine learning.

Like so many disciplines that currently exist in (online) marketing, web analytics is one of the, in my opinion, still rising trends. Think of SEA, SEO, affiliate marketing, email marketing, … All of this is booming. Still. Although, for example, SEO was constantly talked to death for a long time, and Google was increasingly credited with being able to do things “on its own”. Certainly, Google HAS gotten better. But the people and companies with their websites have not. And that is how I see it with web analytics.

As long as a discipline has not yet reached the masses (especially medium-sized and small companies), five years is, in my opinion, too short a period to achieve changes at scale. At the top (e.g. corporations) I see it somewhat differently, but the masses have to notice it – otherwise the change is more of a local maximum. But maybe that also means that we will all work for Amazon.

I agree with you when you say that machines can detect anomalies or clusters in the data much better than any human ever could. But the question is: what do we, the people, do with it in the end? And there is still a need for people, probably far beyond 2023, to translate this into “doing” after first comparing it with their strategy.

The catch is that machine learning needs data – as much as possible. And IMHO we took a decent step backwards in this regard on May 25th. Perhaps we – or those who passed the GDPR – have even put Europe on the sidelines. And the regulations of the regulation and people’s fears of misuse of the data ensure that it remains that way for the time being. This is quite a brake for many companies.

In addition, even if this can and certainly will change in the next few years, AI or ML is not yet a “mass phenomenon”. Everyone talks about it and maybe guesses the possibilities, but in fact “Hurray” is not heard everywhere out there. There is a lack of systems that collect more data, (currently) people who are familiar with it and, ultimately, often also a lack of knowledge of what you can do with it.

You said quite rightly: you need the right questions. Without them, AI will be helpless far beyond 2023. But in order to ask the right questions, you need people with an understanding of numbers, business, strategies and implementation. OK, maybe at some point they will not be called web analysts anymore (just as there may no longer be pure SEOs or SEA people whose jobs have all been taken over by Google's AI), but web analytics itself will be part of a perhaps new job description that ALSO requires an understanding of web analytics, and for that you still need people who do nothing else all day. Training may then be more focused on strategies.

I see web analysts and data scientists on more or less separate tracks. For me, the web analyst is often closer to the business and what follows from numbers. The pure data preparation and the like – I’m sure – will certainly simply fall away at some point.

In this context, perhaps a reference to a great article that I devoured the other day, according to which we will not get to chill for another 50 years anyway. Here is the link: https://waitbutwhy.com/2015/01/artificial-intelligence-revolution-1.html

Well, and above everything you and I say, there’s one more thing: Maybe in 5-10 years we won’t have internet anymore, but something even cooler.

Thank you for bringing the topic here to your blog. And double thanks for the analogy with the gas station attendant. Love it. Fill up the tank, please. Maik

Tom Alby says

1 May 2018 at 23:33
Dear Maik,

every exchange with you helps me to sort and question my thoughts, and as always, I am very grateful to you for that.

The reason why I predict the end of the Web Analyst within a few years is not pure sensationalism. I believe in it. Here’s why:

  1. If we assumed a linear development, then I would not formulate such a prophecy. But we don’t have a linear development. We have a non-linear development. An exponential development. And it is a characteristic of humans to underestimate exponential developments. An old legend shows this in the person of Sissa ibn Dahir (https://de.wikipedia.org/wiki/Sissa_ibn_Dahir). In 1996, my father was still scolding me for buying something as crazy as a cell phone, 10 years later we had Internet on our cell phones, another 10 years later we have voice assistants at home, and an Internet of less than 10 MBit is unthinkable, at least in the cities. Your article describes exactly this exponential development, doesn’t it?
  2. We don’t just create technology, technology changes us. Technology is changing the way we think, and that always leads to technology being criticized. Writing, Ong argues, was criticized by Plato for externalizing thought. When calculators came along, the same criticism was voiced: that people were unlearning how to calculate. And now we hear that artificial intelligence does the thinking for us and that this is not good. In fact, however, every technology has brought us further. Whether that was always a good thing is another matter. But even writing as a technology has changed our thinking. And so machine learning will also change our thinking. In a few years, we will be thinking in terms of machine learning. And if you say that this is still very far away at the moment, I don’t think so because of the exponential development.
  3. You say that web analysts are closer to the business problem than data scientists. I couldn’t disagree more. Of course, I believe you that you focus primarily on business goals. As a data scientist, on the other hand, the first question for me is always what problem I am actually solving. That’s what I teach my students (http://wordpress-95500-642800.cloudwaysapps.com/lehrveranstaltungen/data-science-analytics/understanding-the-business-problem/).
  4. SEO is dead. For me, most SEOs are the alternative practitioners of online marketing. We believe them because it is difficult for us to assume that we are not in control. But I think I’ve already published enough data to prove that we are dealing with snake oil in particular. I have never seen an SEO publish his numbers like I do.

Yes, maybe 2023 is ambitious. But if Google does anything, it’s something scalable. And even though AdWords Express didn’t work great in the beginning, more data will make it work.

Would you like a Snickers with your fill-up?

Tom

Maik says

1 May 2018 at 10:12
Hey Tom,

the legend of Sissa ibn Dahir is a great example. On the other hand, with exponential curves the slope is not yet so steep at the beginning that it provokes such upheavals. But you’re right: if scalable, then Google.

And indeed, the post I linked describes, among other things, exactly this growth – in sometimes frightening, because incredibly huge, dimensions. And of course, anyone who tries to understand this has trouble. What will it be like if there are machines that are 1 million times more intelligent than us humans before the end of this century? I cannot imagine …

The good thing is that the two of us don’t really talk about IF it (the replacement of certain jobs) will happen, but actually only WHEN. In this respect, I am with you on many points. Ultimately, it doesn’t matter whether web analyst or data scientist jobs are more likely to disappear than others, but rather what our future task will be. Whether in the future there will still be a need for people who think “analytically” and approach solutions with “creativity”.

And yes, I also see that SEO has become something “worth rethinking” (to put it kindly) in many places. At least in the way it is practiced by many SEOs. (Your term “alternative practitioner” fits quite well, I think.)

In this special discipline, too, it is increasingly true for me: People are needed who deal with a sensible information architecture (technically and content-wise) with regard to its users and make improvements. This requires (for the time being) an understanding of people, values – and solid data to measure improvements. But maybe in 5 years websites will be built completely by machines. Maybe in 10 years there will be no more websites, but only VR. Who knows?

One more thing: I hope that developments in the next few years/decades will only progress so quickly that we humans have a chance to find our role in the “new world” – and not be completely overwhelmed. Because if machines can one day do everything that we humans can do, and do it much better, then we will certainly be able to solve a lot of problems here in the world – but we will also have many new ones as far as our tasks are concerned. And especially the component “exponential” is somehow – as funny as it may sound – unpredictable.

Snickers? Yes, of course. Takeaway, please. Maik

Christian Hansch says

1 June 2018 at 16:09
Hello Tom,

what exactly do you mean by the qualification as a module manufacturer at the end of your article? I also believe that we web analysts will be replaced in 5-10 years, but actually I still have to work for almost 20-25 years. Hypothesis: In the future, it will be even more important to ask the right questions with the right constraints if necessary, so that the machine can “spit out” the right answer.

I also think the GA app with its highlighting of anomalies is great. But I wonder how the machine is supposed to weigh the various goals against each other, so that it does not always maximize only the monetary output but sometimes also communication, which is difficult to measure.

Best regards Christian

P.S.: It’s a pity that I didn’t see your presentation at the Campixx, then I could better understand or discuss why SEO should be dead.

Christian Hansch says

1 June 2018 at 10:54
… Here is an answer from the Google team to your question: https://www.youtube.com/watch?v=BBZh_O0MeeE&list=PLI5YfMzCfRtaaQpilSJf9jqrP7BVfjBWI&index=2&t=0s at approx. 13:30 …

Is my content being read? Measure the visibility of elements!


In September 2017, I wrote that scroll depth is a better indicator of whether a piece of content has been read than pure session duration, which is nonsense anyway. A month later, Google released a new feature in Google Tag Manager, a trigger for the visibility of elements (the note was missing in the German version of the release notes). This compensates for some disadvantages of the scroll depth approach, especially the restriction that not every page is the same length and “75% read” does not always mean that the content was read to the end (75% was chosen because many pages have an immense footer and users therefore do not scroll down to 100%). One page on my site has so many comments that they make up more than half of the content.

What does element visibility mean?

In simple terms, this feature means that a trigger is triggered when an element of the page becomes visible on the user’s screen. The element only needs to be uniquely named, so that only this one element with this name can trigger the trigger. On my site, for example, I would like to know how many users have scrolled down so far that they have finished reading the respective text with a high degree of probability. This is probably the case when users see the reference to the similar articles that are created in my blog by the YARPP plugin. In most browsers, it is possible to select an element with the mouse and then examine the element with a right-click/CTRL click on it, so that we can then see exactly what that element is called.

This trigger can now be set up in the Tag Manager, which looks like this, for example:

In addition, an event is set up, and we have tracking based on the visibility of an element.

Does that really make a difference?

Yes. In my article about a year of experience with Scalable Capital, just under 30% read at least 75% of the content, but just under 70% saw the YARPP element. The page consists of almost 80% comments (it is frightening enough that only 70% saw the element, given how short the actual article is). For other articles, the new measurement of element visibility in Google Tag Manager makes less of a difference; the article about my bad experiences with the Vorwerk Thermomix, for example, is apparently too long for those interested in the Thermomix: 26.1% see the YARPP recommendations, 22.3% scroll down to 75%.

Can I turn off the scroll depth now?

No. Of course, you can do whatever you want, but since the session duration becomes more accurate by triggering events, we want to measure not only the time of those who made it to the defined element, but also the time of those who bounced before, for example at 25%. So even if at first glance it looks like we could save 4 events, we should leave these events in to improve the data quality.

Is my content being read? Scroll depth per article as conversion


In September 2017, I wrote that scroll depth is a better indicator of whether a piece of content has been read than pure session duration, which is nonsense anyway. A month later, Google released a new feature in Google Tag Manager, a trigger for the visibility of elements (the note was missing in the German version of the release notes). This compensates for some disadvantages of the scroll depth approach, especially the restriction that not every page is the same length and “75% read” does not always mean that the content was read to the end (75% was chosen because many pages have an immense footer and users therefore do not scroll down to 100%). One page on my site has so many comments that they make up more than half of the content.

The optimal tracking concept or The sailing trip without a destination


How often have I heard the sentence “Let’s just track everything, we can think about what we actually need later. But of course the tracking concept can already be written!”

Let’s imagine we want to go on a trip in a sailboat and we say: “I don’t know where we want to go, so let’s just take everything we could need for all eventualities.” Our boat would sink before the trip has even begun. We would not know whether we need to take water and canned food for a day or for several weeks, whether we need winter clothes or summer clothes, and so on. But to be on the safe side, we simply buy out the entire sailing supply store; we will surely need some of it. And now we have more load than the boat can bear.

Likewise, you cannot track everything that might ever be needed. Or maybe you can, but that would not only be very expensive, it would also make the website virtually unusable for users. More on that later. The bad news for all those who are looking for a simple solution to a difficult question: a tracking concept requires a lot of brainpower. If you do not invest it, you will in most cases collect useless data and burn time and money. Just as we have to think about what we want to take with us on the sailing trip, depending on the destination.

No tracking concept without clear goals

First of all, there is no way around defining goals, SMART goals, i.e. what by when, etc. For example, 100,000 new customers in a quarter or €500,000 in sales in a quarter. That is our destination. KPIs tell us where we are on the way to this goal, similar to a nautical chart on which we determine our position with navigation instruments and adjust the route if we have strayed from course.

If I realize that I will probably not reach my goal of 100,000 new customers, then I want to know which levers I need to pull so that I can take corrective action. Or at least I would like to understand why this is so. Maybe I have to look for another goal because my actual goal does not make sense at the moment. If I see that there is a storm in front of my destination port, there may be another port, and from there we may still be able to reach our actual destination later. If I do not reach the sales target because the return rate is higher than expected, I want to understand the cause. I will not identify it with a standard implementation of Google Analytics.

All data and the information derived from it serve only one purpose: we want to understand what action we can derive from the data. If a piece of information is merely interesting but has no relevance for action, then the data has very likely been collected unnecessarily. At sea, I am not interested in the weather forecast from two days ago. Nevertheless, such data ends up in reports; after all, you have it, it will be good for something, you will figure that out later. In the same way, we sail across the sea with our overloaded boat more badly than well and tell ourselves that we will need the stuff at some point, we just have to get into the right situation first.

On the impossibility of being prepared for everything

Space is limited on a boat, and all material has to find its place. This also applies to a tracking tool. For a shop, a connection to a CRM would certainly be interesting, so that the customer lifetime value etc. can be determined. Most likely, you will also want to work with custom dimensions in Google Analytics, so that data from the CRM can be used in Analytics for segmentation.

But how am I supposed to know which custom dimensions need to be defined if I don’t even know if and which ones I will need later? Especially if the number of custom dimensions is also limited? Custom dimensions are a fundamental decision, similar to a change to the boat that cannot be undone. Because a custom dimension can no longer be deleted.

Every event is a small program that creates load

Each piece of material has weight and changes the sailing characteristics of a boat, to the point of overloading it. And of course, you can also use a tracking tool to trigger an event in the browser every second to see how long a user has been doing what on a page. But firing events means running small programs in the browser, and a lot of load is not good, neither for the browser nor for the user. One of them will give up; the only question is which one first.

So a tracking concept can really only be written once the goals and KPIs are clear. Unfortunately, defining them is an exhausting task. The good thing is that once this task has been completed, an actionable reporting dashboard can also be built. Numbers are no longer reported just because they can be reported, but because they provide added value. Most dashboards, however, are far from that. And so most sailboats are steered by gut feeling and on sight. Except that in online marketing we do not put our lives at risk.

Of course, you can make a stopover in a harbor later on the route and adjust the provisions, equipment and boat because you realize that it does not work that way. But then I have lost not only time but also a lot of money. The same applies to the tracking concept. If I do not think about it upfront, then I have invested a lot of time and money in an enormously complex implementation without being able to use any of it the way I actually need it.

What is the standard for tracking?

“And what if we just do what you do? Surely there are some standards.” The comparison with the sailing trip fits here as well: what does the average sailing trip look like? I have hardly ever seen two tracking concepts that are the same, even within the same industry. And no two sailing trips are the same either, because every boat is a little different, the crew is different, and so on.

Anyone who wants to avoid defining the destination just wants to set off to signal movement, but will notice at sea at the latest that they will not make the passage. Or they hope that nobody notices. At some point, however, someone will notice that nobody is really interested in the numbers because they are completely irrelevant.

If you don’t know the port you want to sail to, no wind is the right one. (Seneca)

10 Google Analytics Basics (also for other web analysis tools)


Google Analytics had its 10th birthday last year, and in these more than 10 years I have gained a lot of experience about what you have to consider when using web analytics systems. Here are my 10 basic tips, starting with the absolute basics, followed by the basics for those who really want to do something with their data.

  1. Use a tag management system, especially for more complex configurations (e.g. cross-domain tracking) this is indispensable. But even if only the basics described here are to be implemented, a tag management system is important. Most systems offer a preview, so there is no need for open-heart surgery. And if you don’t want to give your web analytics person access to change the analytics code, then a tag management system is mandatory anyway.
  2. While we’re at it, the Google Tag Assistant is a good addition if you use Tag Manager and/or Google Analytics.
  3. Test everything you do with the real-time reports, unless it can be tested via Tag Manager and Tag Assistant.
  4. Use the Adjusted Bounce Rate. There is no way around it. The bounce rate is usually defined such that a bounce is counted when a user comes to the page and leaves it “immediately”. “Immediately” is then something between 5 and 10 seconds, depending on the definition and system. With Google Analytics, a bounce is counted when a user comes to a page and does not view another page, no matter how long they have been on it. So maybe they did not really bounce, but read through the whole page and left again once their need for information was satisfied. For some content sites, this is normal behavior. But it is not really a bounce. For me, a bounce means that a user considered the landing page irrelevant and therefore left immediately. And that is a problem area you only notice once you have configured a proper bounce rate.
  5. Be clear about what the point of your page is. You would have come up with that on your own? I have experienced too often that there are very different views in a company about why a website exists. Sometimes the participants of a workshop could not agree in 2 hours. Why does the site exist? What role does it play in your company’s overall business strategy? Is it a sale? Is it branding? Is it monetization via advertising? Did you just want to have a www on your letterhead? Does your page have multiple goals? Also ok. Write them all down.
  6. So how can you measure whether the business goals are being met? To do this, you define the KPIs. Example: you want to sell something, so your goal is the number of conversions. Right? If you take a closer look, you probably have a sales goal (e.g. €1,000 a day), and the number of conversions will not help you much if you do not earn the same amount with every conversion. There are several levers for the sales target: traffic, conversion rate, shopping cart value, returns. This results in sub-goals, such as 2,000 daily users, a conversion rate of at least 1% (which is a good standard value, by the way), an average shopping cart of €50 and a return rate of 0% (which is very unrealistic unless you sell a digital product); see the short calculation after this list. If you do not reach the €1,000, you have to analyze why this is the case based on the KPIs mentioned. For branding pages, on the other hand, we have different metrics. You want users who do not leave immediately (see Adjusted Bounce Rate above). You want users to engage with your page, so time on site or pages per visit could be good metrics. If you want to reach users who do not know you yet, the number of new users is an interesting metric. But here, too, set goals. If you do not have goals, then no number matters. Are 300,000 visitors good or not? Is 2% growth good or not so good? It does not matter if you do not have any goals.
  7. The standard Google Analytics dashboard is relatively pointless. What does the ratio of new to returning users say? What do you do with this information? Honestly, you can’t actually do anything with any of the information listed in the standard dashboard. The KPIs that are actually important belong on a proper dashboard. Use the gallery (in Google Analytics). Many problems have already been solved by other users.
  8. Web analytics (as well as data analysis in general) starts with a question. The answer is only as good as your question. Examples of good questions: Which acquisition channel brings me the most revenue (and, more importantly, is it worth having more of it)? What’s going on with the channels that bring in less revenue? Which demographic audiences “work” best (depending on the goal), and what content doesn’t fit those audiences? Does my target group read the texts of the website to the end? What elements of my website increase the likelihood that a user will bounce? The questions already show that web analytics is not a one-time matter, but must be continuous.
  9. Segmentation is the killer feature in web analytics. Almost every question can be answered by segmentation. Example: Segmentation by mobile versus desktop, demographics, acquisition channels. Without segmentation, analytics is a toothless tiger.
  10. And finally, the killer basic: you do not want data. What you want is information that helps you decide what you need to do. Analytics provides you with data, you draw information from it, and actions follow from it. Data -> information -> action, that is the absolute analytics mantra. If there is no action, then you do not need the data. My former colleague Avinash uses the “So what?” test for this: if a metric does not lead to an action after asking “So what?” three times, then forget about the KPI. I would go one step further: if you do not have a question (see point 8) whose answer results in an action, then the initial question was wrong.
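The sub-goals from point 6 can be sanity-checked with a quick calculation; the numbers are the example values from above:

    # daily revenue = users * conversion rate * average basket value * (1 - return rate)
    users        <- 2000
    conv_rate    <- 0.01
    basket_value <- 50
    return_rate  <- 0

    users * conv_rate * basket_value * (1 - return_rate)   # 1000 euros per day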

This list is not necessarily complete, but with these 10 points you can get damn far. Feedback is always welcome.