Category: Data Science

Data Science meets SEO, Part 5

The final part of this series on search engine optimization (SEO) and data science, based on my talk at SEO Campixx. I knitted the data and the code into an HTML document that makes my notebook, including the data, reproducible. The notebook also contains a few more analyses, but I documented everything in English, since this is of interest not only to Germans. So if you want to read all results in one document (without the TF/IDF, WDF/IDF, or stemming examples), please have a look at the Data Science & SEO Notebook. Speed and other factors are covered in the previous parts.

Filed under: Data Science

Goals, KPIs and Metrics

In order to understand the business problem, it is necessary to understand the goal of a project or a business (the latter is easy, it is always about making money). As a consequence, there is a hierarchy consisting of

  1. Business Goals
  2. Goals
  3. KPIs
  4. Metrics

A goal is where you want to go or what you want to achieve. It is important to define SMART goals, i.e. goals that are specific, measurable, attainable, realistic and time-bound:

  1. Business Goal: Revenue of 1.000.000€ in 2018
  2. Goals:
    1. Reduce the Cost per Order to 5€ by Q3/2018
    2. Increase Website Conversion Rate to 2% by Q3/2018
    3. Increase qualified visits to 10.000 unique daily visitors in Q2/2018
  3. KPIs:
    1. CPO
    2. CVR
    3. qualified unique visitors/day
  4. Metrics:
    1. %
    2. Sum/day
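
The hierarchy above can be sketched as a small data structure in which every KPI is tied to a goal with a target and a deadline. This is only an illustrative sketch: the class name `Goal`, its fields, the end-of-quarter deadlines, the € unit for CPO and the measured values are assumptions, not part of the original example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Goal:
    """A SMART goal: a KPI with a unit (metric), a target and a deadline."""
    description: str
    kpi: str                       # indicator that tracks progress
    unit: str                      # metric in which the KPI is expressed
    target: float
    deadline: date                 # assumption: end of the stated quarter
    lower_is_better: bool = False  # e.g. costs should go down, not up

    def is_met(self, measured: float) -> bool:
        """Compare a measured KPI value against the target."""
        if self.lower_is_better:
            return measured <= self.target
        return measured >= self.target


goals = [
    Goal("Reduce the Cost per Order", "CPO", "€", 5.0,
         date(2018, 9, 30), lower_is_better=True),
    Goal("Increase Website Conversion Rate", "CVR", "%", 2.0,
         date(2018, 9, 30)),
    Goal("Increase qualified visits", "qualified unique visitors/day",
         "sum/day", 10_000, date(2018, 6, 30)),
]

# Hypothetical measurements, for illustration only.
measured = {"CPO": 6.2, "CVR": 2.1, "qualified unique visitors/day": 8_500}

for goal in goals:
    status = "met" if goal.is_met(measured[goal.kpi]) else "not met"
    print(f"{goal.kpi}: {measured[goal.kpi]} {goal.unit} -> {status}")
```

Without the `target` field, a measured CPO of 6.20€ could not be judged as good or bad, which is exactly the problem of KPIs without goals.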

Goals, KPIs and Metrics are often confused (and every time this happens, a sweet, beautiful unicorn dies). A good way to remember the difference is a journey:

  • You want to drive from Hannover to Hamburg and arrive before 3 pm. This is your goal.
  • KPIs are time, speed and maybe GPS coordinates (are you on the right track?).
  • Metrics are km/h and time.

Sometimes only KPIs are defined but no goals, or the KPIs are disconnected from the goals. In this case, all numbers look great because there is no goal to measure them against. Or, as Seneca said: “If one does not know to which port one is sailing, no wind is favorable.”

Next: The Holy Trinity of Data


An (Online) Marketing Example

A typical example of a business problem that can be solved using data is found in (online) marketing. Marketers always want to know how they can reach the right audience and get the biggest bang for the buck, i.e. spend less money on marketing in order to increase their margins.

Several customer journey models exist, one of the simplest being AIDA (Awareness, Interest, Desire, Action) where customers are “driven” through a funnel from a first touch point where consumers learn about the product to the last touch point where they buy it.

Several marketing channels exist, and we will go through a few online channels:

  • Organic Search: These are the search results that are shown below the paid ads, if there are any. In order to rank on Google, search engine optimization techniques are applied, but it can take from a few minutes to several months until a page is listed. Also, there is no guarantee that a site is listed on the first results page, where the majority of users will be.
  • Paid Search: Search ads are sold in a real-time auction where advertisers bid for their ad to be displayed for a specific keyword.
  • Display Advertising: Can be booked on a CPM basis or in an auction, usually in programmatic advertising, where advertisers bid on an impression based on the data they have about the user this impression would be shown to.
  • Affiliate Marketing: Other websites sell a product and receive a commission.
  • Social, such as Facebook: Works similarly to Paid Search, using an auction to determine the price.

Since not every website visitor buys a product, the cost of acquiring a customer is determined by the conversion rate (CVR) of the website and the cost per click (CPC) or cost per mille (CPM). With a CVR of 1%, it takes 100 clicks on average to generate one order; at a CPC of 0.10€, that first purchase therefore costs 10€ (the cost per order, CPO). Sometimes it is acceptable to spend more per order, given that the customer may come back and purchase more.
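
The arithmetic behind this example can be written down in a few lines. This is a minimal sketch; the function names `clicks_per_order` and `cost_per_order` are chosen here for illustration, and the CVR is passed as a fraction (0.01 for 1%).

```python
def clicks_per_order(cvr: float) -> float:
    """Average number of clicks needed for one order at a given conversion rate."""
    if not 0 < cvr <= 1:
        raise ValueError("cvr must be a fraction in the range (0, 1]")
    return 1 / cvr


def cost_per_order(cpc: float, cvr: float) -> float:
    """Average cost of one order: price per click times clicks needed per order."""
    if cpc < 0:
        raise ValueError("cpc must not be negative")
    return cpc * clicks_per_order(cvr)


# The example from the text: CVR of 1% and a CPC of 0.10€.
cpo = cost_per_order(cpc=0.10, cvr=0.01)
print(f"Clicks per order: {clicks_per_order(0.01):.0f}")
print(f"Cost per order:   {cpo:.2f} €")
```

The same relation shows why a small change in the conversion rate matters so much: doubling the CVR to 2% halves the cost per order at an unchanged CPC.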

Online marketing is just one area where data science can be applied, and it is actually questionable why such a huge proportion of intellectual resources is devoted to the question of how more consumer goods can be advertised and sold.

A few important terms:

  • CPC: Cost per Click
  • CPO: Cost per Order
  • CPA: Cost per Acquisition
  • SEO: Search Engine Optimisation
  • SEA: Search Engine Advertising
  • CLV: Customer Lifetime Value
  • CVR: Conversion Rate
  • CPM: Cost per Mille

R: dplyr/sparklyr vs data.table Performance

In their 2017 book “R for Data Science”, Grolemund and Wickham state that data.table is recommended instead of dplyr when working with larger datasets (10 to 100 GB) on a regular basis. Having started with Wickham’s sparklyr (R’s interface to Spark using the dplyr dialect), I was wondering how much faster data.table actually is. This is not the most professional benchmark, given that I just compare the system time before and after the script ran, but it gives an indication of the advantages and disadvantages of each approach.



While the initial reaction to the question “What do our users/customers want?” is often “Let’s ask them”, this is a difficult task, and it is rarely done well. From a technical perspective, it is easy to run a survey, and several free solutions exist. However, asking the right questions to the right people is extremely difficult.


R Is Like Smoking

“Using R is a bit akin to smoking. The beginning is difficult, one may get headaches and even gag the first few times. But in the long run, it becomes pleasurable and even addictive. Yet, deep down, for those willing to be honest, there is something not fully healthy in it.”

Francois Pinard


Data Science meets SEO, Part 3

The first two parts covered what data science actually is and why WDF/IDF values very probably have little to do with what happens under the hood at Google. This part goes one step further: we look at whether there are correlations between ranking signals and position. In the talk, I showed this using a single search query as an example and, given the time available, covered it rather briefly. Here I can go into depth. For now, however, we only look at each individual ranking signal in relation to position, not at any effects the ranking signals may have on one another.


Data Science meets SEO, Part 2

Having explained in the first part what data science is and what already exists in this field on the topic of SEO, here is the second part, in which we take a closer look at how a search engine's linguistic processing of a document affects SEO concepts such as keyword density, TF/IDF and WDF/IDF. Since I showed live code at SEO Campixx, I am offering everything for download here, which makes following the examples an even richer experience 🙂 By the way, this also works without installing R; the complete code with explanations and results can be found here.
