A custom visibility index with R and AWS


The third episode on search engine optimization with R and AWS is about creating your own visibility index to get an aggregated overview of the ranking of many keywords. In the first part, we looked at how to automatically pull data from the Webmaster Console with R and an AWS Free Tier EC2 instance, and the second part was about initial analyses based on click-through rates on positions.

What does a visibility index do?

Let’s first look at what a visibility index is supposed to do. With Sistrix, the visibility index is independent of seasonal effects, which is quite charming: you can already tell in summer whether winterreifen.de (a winter tire site) stands a chance of ranking. One way to achieve this independence is to use the average number of searches from the AdWords Keyword Planner. It’s just a shame that this tool only spits out reasonably useful values if you spend enough budget in Google AdWords. So this approach falls flat for us, because we want to keep our monitoring as cheap as possible, ideally free of charge.

Sistrix has the disadvantage that a) despite its low price, it is still too expensive for the students who attend my SEO courses, and b) my little niche topics are not always in the Sistrix keyword database. Sistrix and co. are especially interesting if you want to compare a site with other sites (ideally from the same industry with a similar portfolio). An abstract number like a visibility index of 2 is otherwise pretty meaningless. It only becomes meaningful if I can relate it to other websites and/or use it to track the evolution of my own site’s rankings over time. Even then, the number itself says little, because what unit is it measured in? If I lose 2 kilos, the unit is clear. But losing 0.002 visibility points? How many visitors is that?

We want to build an index that lets us see whether our rankings change over time across a large number of keywords. How our competitors are developing could only be determined by scraping Google, and that is not allowed.

Visibility index based on the Webmaster Console

Obviously, it is better to rank 3rd for a search term that is searched 100 times a day than for one that is searched only twice a day, so the number of searches should play a role in our visibility index. Since we have ruled out the search volume from the AdWords Keyword Planner for the reasons mentioned above, the only source left is the impressions from the Webmaster Console. Once we’re on the first page and far enough up (I’m not sure whether an impression is counted if you rank 10th and were never actually in the viewable area), we should be able to use the number of impressions from the Webmaster Console, even on a daily basis!

A small sanity check for the keyword “Scalable Capital Experiences” (AdWords estimate / actual impressions from the Webmaster Console):

  • 2400 / 1795 (September, but only half a month)
  • 5400 / 5438 (October)
  • 1000 / 1789 (November)

September and October look good; only November is a bit strange, with almost 80% more impressions than there were supposedly searches. Something must have happened in September/October that suddenly gave Scalable Capital so many searches; in fact, this was also visible in the traffic on my site. We can’t fully clarify the November discrepancy and simply accept that the AdWords numbers aren’t perfect either.

The following figures illustrate how different the visibility indices are depending on the weighting:

  • In the simplest model, each ranking simply scores 11 minus its position, and everything beyond position 10 gets 1 point. These points are summed up across all results for each day. The disadvantage of this model is that the index can climb quickly even if I only rank 1st for terms that are searched once a month.
  • In the second model, the same points are calculated, but multiplied by the impressions.
  • In the third model, the average CTR on the SERP from the second part of this series is multiplied by the impressions.
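To make the three models concrete, here is a minimal base-R sketch. The toy data frame and the CTR values are placeholders I made up for illustration; in practice the positions and impressions come from the Webmaster Console download from part one, and the expected CTR per position from part two.

```r
# Toy sample of Webmaster Console data; real column names may differ.
wmc <- data.frame(
  date        = as.Date(c("2017-01-01", "2017-01-01", "2017-01-02")),
  keyword     = c("seo monitoring", "r aws", "seo monitoring"),
  position    = c(3, 12, 2),
  impressions = c(100, 40, 120)
)

# Hypothetical expected CTR per SERP position (in practice: from part two)
ctr_by_position <- c(0.30, 0.15, 0.10, 0.07, 0.05,
                     0.04, 0.03, 0.02, 0.02, 0.01)

# Model 1: 11 minus position, capped at 1 point beyond position 10
wmc$points <- ifelse(wmc$position > 10, 1, 11 - wmc$position)
# Model 2: the same points, weighted by impressions
wmc$weighted <- wmc$points * wmc$impressions
# Model 3: expected CTR at that position, weighted by impressions
wmc$ctr_idx <- ctr_by_position[pmin(ceiling(wmc$position), 10)] * wmc$impressions

# Sum each model per day to get the daily index values
daily <- aggregate(cbind(points, weighted, ctr_idx) ~ date, data = wmc, FUN = sum)
```

With this toy data, day one scores 8 points for position 3 plus 1 point for position 12 under model 1; model 3 sums 0.10 × 100 and 0.01 × 40.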

If you compare this with the actual traffic, you can see that the third model already tracks it quite closely. The spikes in the real traffic aren’t quite as pronounced as in the index, and in the end I get somewhat less traffic, but that may be because the actual CTR is below the expected CTR.

Alternative approach with the Webmaster Console

Looking at the plots, however, it becomes clear that this approach with daily impressions makes little sense: if the curve goes down, it doesn’t necessarily mean I need to act, because maybe people are simply searching less for this topic and my rankings haven’t changed at all. (Usually you only write about what works, but I find the failures exciting too, because you can learn a lot from them. :-)) This is probably exactly why Sistrix calculates seasonal fluctuations out.

Alternatively, you could simply average all impression data for each keyword/landing-page pair and calculate with that average instead, again using the weighted CTR per position. The nice thing about this approach is that seasonal or temporary fluctuations balance each other out. Plotted, it looks like this:
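The averaging step can be sketched like this in base R — again with a made-up toy data frame and placeholder CTR values, not the real data:

```r
# Toy data: daily impressions per keyword/landing-page pair (placeholders)
wmc <- data.frame(
  keyword     = c("a", "a", "b", "b"),
  page        = c("/p1", "/p1", "/p2", "/p2"),
  date        = as.Date(c("2017-01-01", "2017-01-02",
                          "2017-01-01", "2017-01-02")),
  position    = c(2, 4, 8, 6),
  impressions = c(100, 300, 10, 30)
)

# Hypothetical expected CTR per SERP position
ctr_by_position <- c(0.30, 0.15, 0.10, 0.07, 0.05,
                     0.04, 0.03, 0.02, 0.02, 0.01)

# Average impressions per pair smooth out seasonal and temporary spikes
avg_imp <- aggregate(impressions ~ keyword + page, data = wmc, FUN = mean)
names(avg_imp)[3] <- "avg_impressions"

# Join the averages back and weight the expected CTR per day with them
wmc <- merge(wmc, avg_imp, by = c("keyword", "page"))
wmc$index <- ctr_by_position[pmin(ceiling(wmc$position), 10)] * wmc$avg_impressions

# Daily index: only the positions vary now, not the raw impressions
daily <- aggregate(index ~ date, data = wmc, FUN = sum)
```

Because each pair contributes its long-run average impressions every day, only ranking changes move the curve.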

This plot looks very similar to the first one, but that doesn’t mean it always has to. And when I compare it with my Sistrix values (even if I am at a very low level there), it looks very similar.

From data to action relevance

Now we have a plot that shows the weighted development of our rankings, but what do we do with it? It isn’t really actionable yet. It only gets interesting when we look at which rankings change the most and influence our visibility index. To do this, we first take the minimum ranking for each keyword/landing-page pair (minimum because lower is better, and lower than 1st place is not possible), then the current ranking. Finally, we calculate the delta between them and sort by it:
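A minimal base-R sketch of this delta calculation, using a toy data frame with placeholder keywords and positions:

```r
# Toy data: one row per keyword/landing-page/date with its position
wmc <- data.frame(
  keyword  = c("seo monitoring", "seo monitoring", "r aws", "r aws"),
  page     = c("/seo", "/seo", "/aws", "/aws"),
  date     = as.Date(c("2017-01-01", "2017-02-01",
                       "2017-01-01", "2017-02-01")),
  position = c(4, 9, 3, 2)
)

# Best (minimum) position ever seen per keyword/landing-page pair
best <- aggregate(position ~ keyword + page, data = wmc, FUN = min)
names(best)[3] <- "best_position"

# Current position: the position on the most recent date per pair
current <- wmc[wmc$date == max(wmc$date), c("keyword", "page", "position")]
names(current)[3] <- "current_position"

# Delta: how many places each pair has dropped from its best ranking,
# sorted with the biggest losses first
delta <- merge(best, current, by = c("keyword", "page"))
delta$delta <- delta$current_position - delta$best_position
delta <- delta[order(-delta$delta), ]
```

Here “seo monitoring” dropped from 4th to 9th (delta 5) and floats to the top of the list.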

The higher the delta, the greater the loss of ranking positions, so to speak, and the greater the need for action, provided the keyword is really interesting. In my example, I wouldn’t mind ranking for “seo monitoring”; after all, the articles from this series are relevant for it. You could now weight the deltas by the impressions or by the visibility index we chose earlier:
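One way to do this weighting, continuing the sketch above with made-up numbers — the column `avg_impressions` and the name “actionability” are my placeholders, not fixed terminology:

```r
# Toy deltas plus average impressions per keyword (placeholder values)
delta <- data.frame(
  keyword         = c("seo monitoring", "tiny keyword"),
  delta           = c(5, 8),
  avg_impressions = c(500, 3)
)

# "Actionability": the ranking loss weighted by how often the keyword
# is actually seen, so big drops on rarely searched terms sink down
delta$actionability <- delta$delta * delta$avg_impressions
delta <- delta[order(-delta$actionability), ]
```

Despite the smaller delta, “seo monitoring” now ranks above the rarely seen keyword, which is exactly the prioritization we want.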

This looks more exciting: sorted by this “actionability”, there are in fact some search queries at the top that I find quite interesting. And now you could combine this with the data from the second part and build a dashboard. More on that in the fourth part.
