If you read along here often, you know that Sistrix is one of my absolute favorite tools (I’ll brazenly link to it as the best SEO tool), if only because of the lean API, the thoroughly likeable Johannes with his really clever blog posts, and the calm reliability with which the toolbox convinces again and again. Of course, all the other tools are great too, but Sistrix is something like my first great tool love, one you can’t or don’t want to banish from your SEO memory. And even if the following data might scratch the paint, it didn’t put a real dent in my Sistrix preference.
What problem am I trying to solve?
But enough of the adulation. What is this about? As already described in the post about keywordtools.io, and in my passing remarks on the inaccuracies in the Google AdWords Keyword Planner data, it is a challenge to get reliable data about the search volume of keywords. And if you still believe that Google Trends provides absolute numbers, well… For this purpose, Sistrix offers a traffic index from 0 to 100, which is calculated from various data sources and is supposed to be more accurate as a result. But how accurate are the numbers? Along the way, I also want to show why box plots are a wonderful way to visualize data.
The data set and first plots with data from Sistrix and Google
The data set here consists of 4,491 search queries from a sample for which I have both the Sistrix and the Google AdWords Keyword Planner data. By the way, it’s not the first sample I’ve pulled, and the data looks about the same in every one of them, so the results are not an artifact of this particular sample. Let’s first look at the raw data:

As we can see, you could draw a curve through this plot, but the relationship doesn’t seem to be linear. But maybe we only have a distorted picture here because of the outlier? Let’s take a look at the plot without the giant outlier:

Maybe we still have too many outliers here, so let’s take only the queries with a search volume under 100,000 per month:

In fact, we see a tendency here to go up to the right, though not a clear line (I didn’t do a regression analysis). But we also see that at a traffic value of 5, we have search volumes that go beyond those at the index values of 10, 15, 20, 25 and 30, and even 50. So we see the curve again:

The median ignores the outliers within the smaller values:

So if we look at the median data, we see a correct trend at least for the higher values, with the exception of the Sistrix traffic value of 65 or 70. However, the spread around these medians varies widely, as a plot of the standard deviations for each Sistrix traffic value shows:

We don’t see a pattern in the spread. The dispersion does not increase with a higher index value (which would be expected); in fact, it is already higher at the index value of 5 than at 10, and so on. We see the highest dispersion at the value of 60.
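The per-index medians and standard deviations discussed above can be sketched in a few lines. This is only an illustration in Python using the standard library, not the code behind the plots in this post: the `pairs` list and the helper name `spread_by_index` are made up, and the numbers are invented rather than taken from the 4,491 queries.

```python
from collections import defaultdict
from statistics import median, stdev

# Hypothetical (Sistrix index, monthly search volume) pairs, invented purely
# for illustration; they are NOT the data analysed in this post.
pairs = [
    (5, 90), (5, 390), (5, 40000),
    (10, 480), (10, 720), (10, 1300),
    (15, 1000), (15, 1600), (15, 2400),
]


def spread_by_index(pairs):
    """Group search volumes by index value and return {index: (median, stdev)}."""
    groups = defaultdict(list)
    for index_value, volume in pairs:
        groups[index_value].append(volume)
    return {
        index_value: (median(volumes), stdev(volumes))
        for index_value, volumes in sorted(groups.items())
        if len(volumes) >= 2  # the sample standard deviation needs two points
    }


summary = spread_by_index(pairs)
for index_value, (med, sd) in summary.items():
    print(f"index {index_value:>3}: median={med:>8.0f}  stdev={sd:>10.1f}")
```

In this toy data, a single very large search volume at index 5 leaves the median at 390 but inflates the standard deviation far beyond that of the higher index values, which is the kind of effect described above.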
All-in-one: box plots
Because box plots are simply a wonderful thing, I’ll add one on top:

Here the axes are swapped (with the Sistrix data on the X-axis, the plot was not really easy to read). The box shows where the middle 50% of the data lies: with a search volume of 390, for example, 50% of the data is between the Sistrix values of 5 and 25, and the median, indicated by the line in the box, is 15. The boxes grow at first and then vary in size again, with the smaller boxes indicating a lower dispersion. At some data points, we see small circles, which R has calculated as outliers. So we see outliers especially at the low search volumes. Almost everything we plotted above is visualized here in a single plot. Box plots are simply wonderful.
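What a box plot encodes can be reproduced by hand. The following is a minimal sketch in Python, assuming the common 1.5 × IQR rule for outliers; R’s own calculation is based on hinges and can differ slightly in edge cases. The list `sistrix_values` and the helper name `box_stats` are hypothetical, and the values are invented so that they mimic the box described above (5 to 25, median 15).

```python
from statistics import quantiles


def box_stats(values, whisker_factor=1.5):
    """Return quartiles, IQR and outliers as a box plot would show them."""
    # "inclusive" treats the data as the full population, min and max included.
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    # Points beyond the fences are the ones drawn as small circles.
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": med, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical Sistrix index values for one search volume, invented for
# illustration only.
sistrix_values = [5, 5, 5, 10, 15, 20, 25, 25, 60]

stats = box_stats(sistrix_values)
print(stats)
```

For this invented list, the box spans 5 to 25 with the median at 15, and the single value of 60 lies beyond the upper fence of 55, so it would be drawn as an outlier circle.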
What do I do with this data now?
Does this mean that the traffic data in Sistrix is unusable? No, it doesn’t. As described in the introduction, the Keyword Planner data is not always correct either, so nothing is known for sure. If you treat the Keyword Planner data as the ultimate truth, you won’t be satisfied with the Sistrix data. It would be helpful if there were more transparency about where exactly the data comes from. Obviously, connected Google Search Console (GSC) data would be very helpful, as it shows real impressions. My recommendation is to look at several data sources and to examine the overlaps and the deviations separately. This is unsatisfactory, as it is not automatic. But “a fool with a tool is still a fool”.
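One possible way to look at overlaps and deviations separately is sketched below in Python. Everything in it is an assumption for illustration: the two dictionaries stand in for keyword exports from any two tools, the keywords and volumes are invented, and `compare_sources` is a hypothetical helper, not part of any tool’s API.

```python
# Hypothetical keyword -> monthly search volume estimates from two sources,
# invented for illustration only.
source_a = {"running shoes": 12000, "trail shoes": 900, "barefoot shoes": 400}
source_b = {"running shoes": 9000, "trail shoes": 1000, "hiking boots": 2500}


def compare_sources(a, b):
    """Split keywords into overlap and gaps; measure deviation on the overlap."""
    shared = sorted(a.keys() & b.keys())
    only_a = sorted(a.keys() - b.keys())
    only_b = sorted(b.keys() - a.keys())
    # Relative deviation of source b from source a, for shared keywords only.
    deviations = {kw: (b[kw] - a[kw]) / a[kw] for kw in shared if a[kw]}
    return shared, only_a, only_b, deviations


shared, only_a, only_b, deviations = compare_sources(source_a, source_b)
for kw in shared:
    print(f"{kw}: {deviations[kw]:+.0%} in source B relative to source A")
print("only in A:", only_a)
print("only in B:", only_b)
```

The comparison still has to be interpreted by hand: a large deviation only says that the sources disagree, not which one is right.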
Comments (since February 2020 the comment function has been removed from my blog):
Hanns says
- May 2018 at 21:18 Hello, thank you very much for the interesting analysis. Have you ever tried the new traffic numbers in the SISTRIX Toolbox? This also gives you absolute numbers and not index values. To do this, simply activate the new SERP view in the SISTRIX Labs. Information can be found here (https://www.sistrix.de/news/nur-6-prozent-aller-google-klicks-gehen-auf-adwords-anzeigen/) and here (https://www.sistrix.de/changelog/listen-funktion-jetzt-mit-traffic-und-organischen-klick-daten/)
Tom Alby says
- May 2018 at 10:58 I hadn’t actually seen that before. Thanks for the hint. But these are ranges, not truly absolute numbers. Still very cool, though.
Martin says
- April 2019 at 13:33 Moin, I read your post and tried to understand it, but I can’t figure it out. Sistrix is cool, yes, but unfortunately I don’t know how reliable the data is.
I actually don’t understand how this is supposed to work technically. How is Sistrix supposed to get the search queries that run through Google for each keyword? It’s not as if Google informs Sistrix briefly with every request.
The only thing I can think of is that they pull the data for each keyword from AdsPlanner. But… presenting this as “own search volume” without any indication of where the data comes from would strike me as grossly negligent.
Where could they still get data from?
Tom says
- April 2019 at 20:39 Hello Martin,
the answer is not 1 or 0; that also comes across in the article. You can’t rely on AdPlanner data either. Sistrix also gets data from customers who have linked their Search Console data there, since you can see your page’s impressions for a keyword. But of course, this doesn’t exist for every keyword, and that’s why inaccuracies come about.
Best regards
Tom