
Slowly but surely, R is making its way into the world of SEO, and while R may be a bit confusing at first (functional instead of procedural programming), you can build cool stuff with just a few lines of code. A free SEO monitoring tool serves as the example here. It obviously can't keep up with Sistrix and Co., but if you only want to track your own rankings, it is a great and, above all, free solution.
Let's start with the infrastructure. We only need three components:
- One (free) EC2 instance
- A Google Webmaster Console account
- A Google API service account
Amazon offers EC2 instances in the free tier; after 12 months a fee is due, but it is in the homeopathic range. The t2.micro instance is rather underpowered with its one vCPU, 1 GB of RAM and 30 GB SSD, but for our purposes it is perfectly sufficient. R is of course not installed from the start, but Louis Aslett offers free AMIs (Amazon Machine Images) with RStudio Server already pre-configured. Matt explains very well how to use these AMIs to set up your own RStudio instance on AWS. All of this takes 15 minutes at most, and then you have your own free RStudio Server machine in the AWS cloud. Large calculations are not possible on it, but once you get a taste for it, you quickly find yourself booking an instance with lots of memory for bigger computing tasks. One click, a few euros a day, and you have a whole machine with 16 processors and 128 GB of RAM to yourself. But I digress.
In the next step, we'll use Mark Edmondson's R package searchConsoleR. This is the elementary cornerstone of our SEO monitoring. In Mark's example, he simply writes the data to disk, but we prefer to write it to a database (how to install MySQL on our newly acquired EC2 instance is described here; please note that you only have the user "ubuntu", i.e. you have to do everything with sudo. Alternatively, you can book an RDS instance for a fee). In order to access the Webmaster Console data from a server, a service account is required. Setting one up is not quite so easy, and a full walkthrough would go beyond the scope of this article. What matters is that the email address of the service account is added as a full user in the Webmaster Console. And here is the code:
```r
library(jsonlite)
library(googleAuthR)
library(searchConsoleR)
library(RMySQL)

options(googleAuthR.scopes.selected = "https://www.googleapis.com/auth/webmasters")

gar_auth_service(
  json_file = "/home/rstudio/XXXXXXX.json",
  scope = "https://www.googleapis.com/auth/webmasters"
)
```
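If authentication succeeds, the service account can list the sites it has been granted access to. A quick sanity check (my addition, not in the original code):

```r
# List all sites this service account can see in the Webmaster Console.
# If your property is missing here, the service account's email address
# has not been added as a user for that property yet.
sc_websites <- list_websites()
sc_websites
```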
We’ll get the data from 5 days ago, then it’s definitely available in the Webmaster Console:
```r
delay <- 5
start <- Sys.Date() - delay
end <- Sys.Date() - delay
```
Here is the query:
```r
website <- "XXXXX"
download_dimensions <- c('query', 'page')
type <- c('web')
sc_data <- search_analytics(siteURL = website,
                            startDate = start,
                            endDate = end,
                            dimensions = download_dimensions,
                            searchType = type)
```
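search_analytics() returns one row per query/page combination, together with clicks, impressions, CTR and position. A quick look at the result never hurts (my addition):

```r
# Inspect the downloaded data: the dimensions (query, page) plus the
# metrics clicks, impressions, ctr and position.
head(sc_data)
str(sc_data)
```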
We add the date and the website (useful if we query several websites):
```r
sc_data[, 7] <- website
sc_data[, 8] <- start
colnames(sc_data)[7] <- "Website"
colnames(sc_data)[8] <- "Date"
```
Now we write the dataframe to the database (we already added the DB and the table earlier):
```r
mydb <- dbConnect(MySQL(),
                  user = 'XXXXX',
                  password = 'XXXXX',
                  host = 'XXXXX')
dbSendQuery(mydb, "USE XXXXX")
dbWriteTable(mydb, value = sc_data, name = "rankings", append = TRUE, row.names = FALSE)
dbDisconnect(mydb)
```
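If you still need to create the database and the rankings table, a minimal one-time setup could look like this (a sketch; the database name and the column types are my assumption, not from the original setup):

```r
# One-time setup sketch -- adjust names and types to your needs.
mydb <- dbConnect(MySQL(), user = 'XXXXX', password = 'XXXXX', host = 'XXXXX')
dbSendQuery(mydb, "CREATE DATABASE IF NOT EXISTS seo")   # assumed DB name
dbSendQuery(mydb, "USE seo")
dbSendQuery(mydb, "
  CREATE TABLE IF NOT EXISTS rankings (
    query       VARCHAR(255),
    page        VARCHAR(2048),
    clicks      INT,
    impressions INT,
    ctr         DOUBLE,
    position    DOUBLE,
    Website     VARCHAR(255),
    Date        DATE
  )")
dbDisconnect(mydb)
```

The column order mirrors what search_analytics() returns plus the two columns we appended, so that dbWriteTable() with append = TRUE matches the schema.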
Now all we need is a cron job that performs this query every day. To do this, I first use a small shell script, which then calls the R script:
```bash
#!/bin/bash
cd /home/rstudio/SearchConsole/
Rscript /home/rstudio/SearchConsole/rankings.R
```
Then the cron job is set up; mine starts every day at 10:30 server time, which is 11:30 for me, because the instance runs in a different time zone:

```bash
sudo crontab -e
```
```bash
30 10 * * * /home/rstudio/SearchConsole/rankings.sh 2>&1 | /usr/bin/logger -t rstudio
```
Important: add a newline after the last line, otherwise cron will complain. That's it: the automatic and free SEO monitoring is ready!
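To check that the cron job is actually writing new rows every day, you can read the table back, for example (a quick check, not part of the original pipeline):

```r
# Count the rows per day that arrived in the rankings table.
library(RMySQL)
mydb <- dbConnect(MySQL(), user = 'XXXXX', password = 'XXXXX', host = 'XXXXX')
dbSendQuery(mydb, "USE XXXXX")
dbGetQuery(mydb, "SELECT Date, COUNT(*) AS n FROM rankings
                  GROUP BY Date ORDER BY Date DESC LIMIT 7")
dbDisconnect(mydb)
```

In one of the next posts, I will show how the data can then be evaluated.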