Visualizing overlaps of ETFs in an UpSet diagram

Today, two topics I find particularly exciting come together: data analysis and visualization, and finance. Choosing the right ETFs is a topic that fills countless web pages and financial magazine articles. However, it’s equally fascinating to explore the overlaps between ETFs. Previously, I compared the Vanguard FTSE All-World High Dividend Yield UCITS ETF USD Distributing (ISIN: IE00B8GKDB10) and the iShares STOXX Global Select Dividend 100 UCITS (ISIN: DE000A0F5UH1). I also analyzed the performance of these two alongside the VanEck Morningstar Developed Markets Dividend Leaders ETF (ISIN: NL0011683594) and an MSCI World ETF (ISIN: IE00B4L5Y983).

The holdings included in an ETF can be downloaded from the respective provider’s website; I performed this download on October 5. The data requires significant transformation before it can be compared. My R-based notebook detailing this process can be found [here]. For the visualization, I chose an UpSet diagram, a relatively new type of visualization that I’ve used in a paper and another project. While Venn diagrams are commonly used for visualizing overlaps between datasets, they become unwieldy with more than 3 or 4 datasets. This challenge is clearly illustrated in examples like this:

The size of the circles, for example, does not necessarily reflect the size of the datasets. An UpSet diagram is entirely different:

Yes, it takes a bit of effort to read, but it shows much more clearly how the datasets relate to one another. On the far left, we see the size of the datasets, with the Vanguard FTSE All-World High Dividend Yield having the most holdings, over 2,000. On the right-hand side, we see the overlaps. The point at the very bottom beneath the tallest vertical bar indicates that the Vanguard FTSE […] has 1,376 stocks that no other ETF includes. Similarly, the iShares Core MSCI World has 757 holdings that no other ETF contains. In the third column, we see that these two ETFs share 486 holdings that the other two ETFs do not include. I find that quite fascinating. For example, I wouldn’t have thought that the Vanguard contains so many stocks that the MSCI World does not.

The VanEck appears to have one stock that no other ETF contains, but that’s not accurate; that entry was just cash. Otherwise, 81 of its 100 holdings are also included in the MSCI World. All of its holdings are included in the Vanguard.
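As a rough illustration of what such a diagram counts, the following R sketch derives the set sizes (the horizontal bars) and the exclusive intersections (the vertical bars) from a list of holdings. The ETF names and identifiers are made-up toy values, not the downloaded holdings, and this is only one possible way to compute these numbers.

```r
# Hypothetical toy holdings with made-up identifiers (not real ETF data)
holdings <- list(
  vanguard   = c("A", "B", "C", "D", "E"),
  msci_world = c("B", "C", "F"),
  vaneck     = c("C", "D")
)

# Binary membership matrix: one row per security, one column per ETF
all_ids <- sort(unique(unlist(holdings)))
membership <- sapply(holdings, function(ids) all_ids %in% ids)
rownames(membership) <- all_ids

# Label each security with the exact combination of ETFs that hold it
combo <- apply(membership, 1, function(row) {
  paste(colnames(membership)[row], collapse = " & ")
})

# Exclusive intersection sizes correspond to the vertical bars,
# set sizes to the horizontal bars on the left
intersection_sizes <- sort(table(combo), decreasing = TRUE)
set_sizes <- colSums(membership)

print(set_sizes)
print(intersection_sizes)
```

A package such as UpSetR could then draw the diagram from the same list, for instance with upset(fromList(holdings)). That is meant only as a pointer, since the plotting code actually used is not shown here.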

It would now be interesting to see how the weightings align. However, that’s an additional dimension that would likely be difficult to represent in an UpSet diagram. Still, it’s necessary to take a closer look at this because the overlaps might result in unintended overweighting of certain stocks. That would be a topic for the next blog post.
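To make the overweighting concern concrete, here is a minimal sketch of how the effective weight of a stock across several ETFs could be computed. All identifiers, weights, and portfolio shares are invented for illustration, and the approach (scaling each ETF's weights by its share of the portfolio and summing per stock) is an assumption, not a method taken from the analysis above.

```r
# Hypothetical weights (percent of each ETF) and portfolio shares; not real data
etf_a <- data.frame(id = c("X", "Y", "Z"), weight = c(50, 30, 20))
etf_b <- data.frame(id = c("X", "W"), weight = c(40, 60))
allocation <- c(etf_a = 0.6, etf_b = 0.4)

# Scale each ETF's weights by its share of the overall portfolio
etf_a$effective <- etf_a$weight * allocation[["etf_a"]]
etf_b$effective <- etf_b$weight * allocation[["etf_b"]]

# Sum the effective weights per stock across both ETFs
combined <- aggregate(effective ~ id, data = rbind(etf_a, etf_b), FUN = sum)
combined <- combined[order(-combined$effective), ]

print(combined)
```

In this toy case, stock X sits in both ETFs and ends up with a much larger share of the portfolio than either ETF alone would suggest.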

New tool for a dividend strategy.

Some online tools show how much you can expect to receive in dividends. For example, extraETF provides a tool where you can see what the dividends might look like based on an assumed growth rate (CAGR), a certain number of years, and asset gains.

What I haven’t seen so far is a tool that, starting from a portfolio, calculates the dividend growth based on an assumed CAGR and dividend yield, while also factoring in taxes. That’s exactly the kind of tool I’ve created.
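The basic arithmetic behind such a projection can be sketched in a few lines of R. This is not the tool's actual code: the function name project_dividends, its parameters, the flat tax rate, and the assumption that the portfolio value simply grows by the CAGR each year are all hypothetical simplifications.

```r
# Hypothetical helper: project gross and net dividends per year.
# Assumes a constant yield, a constant growth rate, and a flat tax rate.
project_dividends <- function(value, yield, cagr, tax_rate, years) {
  year <- seq_len(years)
  # Portfolio value at the start of each year, growing by the assumed CAGR
  value_at_start <- value * (1 + cagr)^(year - 1)
  gross <- value_at_start * yield
  data.frame(year = year, gross = gross, net = gross * (1 - tax_rate))
}

projection <- project_dividends(
  value = 100000, yield = 0.03, cagr = 0.05, tax_rate = 0.25, years = 3
)
print(projection)
```

A real calculation would also have to deal with allowances, reinvestment, and changing yields, which this sketch deliberately leaves out.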

ggplot2 and the New Pipe

Why doesn’t this code work?

mtcars |> ggplot(., aes(x = mpg, y = hp)) + geom_point()

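The problem is not ggplot2 itself but the dot (.) in the ggplot call. The native R pipe (|>) does not know the dot placeholder. It always inserts the left-hand side as the first argument, so the call above becomes ggplot(mtcars, ., aes(x = mpg, y = hp)), which fails because R cannot find an object named ".". The magrittr pipe (%>%), which dplyr re-exports from the magrittr package, does support the dot, but it is not needed here because the data is passed as the first argument anyway. Here is a working version:

library(ggplot2)
library(dplyr)  # re-exports the magrittr pipe %>%

mtcars %>%
  ggplot(aes(x = mpg, y = hp)) +
  geom_point()

The same call also works with the native pipe once the dot is dropped. Alternatively, the data can be passed explicitly to ggplot with the native pipe's own placeholder, the underscore (_), which is available from R 4.2:

library(ggplot2)

# Requires R >= 4.2 for the underscore placeholder
mtcars |>
  ggplot(data = _, aes(x = mpg, y = hp)) +
  geom_point()

Here, the underscore (_) represents the data being piped from mtcars into ggplot. Unlike the magrittr dot, it must be used with a named argument, in this case data.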
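A quick way to check how the native pipe treats a call is to look at the parsed expression with quote(). The sketch below uses head() instead of ggplot() so that it runs without any packages; it assumes R 4.1 or later, where |> is available.

```r
# The native pipe is rewritten by the parser into an ordinary call:
# the left-hand side becomes the first argument.
expanded <- quote(mtcars |> head(3))
print(expanded)

# A dot is not treated as a placeholder; it is passed on as a regular symbol.
with_dot <- quote(mtcars |> head(., 3))
print(with_dot)
```

The second expression still has mtcars inserted as the first argument, with the dot left behind as an additional argument that R would later try to look up as an object.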

The Digital Analytics Association is history – and no one cares.

It was a bit surprising. I had recently exchanged emails with Jim Sterne about the German branch. The DAA had also contributed a foreword to my web analytics book. It’s a bit of a shame.

For those who don’t know: The DAA was previously the WAA, the Web Analytics Association, and it created the most widely used definition of web analytics. Although that definition has long been missing from the website, most researchers who copy quotes from other papers didn’t seem to care.

But how is it possible that such an organization is shutting down, despite the importance of data? One reason could be that many have installed Google Analytics & Co. without actually using the data. My last paper, which unfortunately isn’t public yet, found that most users don’t even realize that embedding the GA code alone isn’t enough to work in a data-driven way. And perhaps the DAA itself is partly to blame, because it didn’t manage to make its relevance clear.

I had only been a member out of nostalgia in recent years. I had used my student status to lower the membership fees a bit.

The website is already no longer accessible.

From WordPress to Hugo and back again

Three years ago, for the 15th anniversary of this blog, I moved from WordPress to Hugo. Super-fast pages, everything in R, actually a cool thing. But in reality, it wasn’t that great. I always needed an R environment, which I didn’t always have. Git drove me crazy at times. And some problems were just impossible to troubleshoot. So now I’ve moved back again. Maybe the rankings I lost after the move will return as well.

Artificial Intelligence, Machine Learning, Data Science, Data Mining, and Statistics… what is the difference?


Artificial Intelligence (AI)

Artificial Intelligence refers to the broad field of computer science that enables machines to perform tasks that typically require human intelligence. An example in marketing is the development of intelligent chatbots that automatically answer customer inquiries.

Machine Learning

Machine Learning is a subset of AI that enables machines to learn from data and adapt without being explicitly programmed. In marketing, Machine Learning is used, for example, to predict customer trends and create personalized advertising content.

Data Mining

Data Mining is the process of discovering patterns in large datasets. It is an important part of Data Science and is used in marketing to identify customer segments and better understand target audiences.

Data Science

Data Science is the field that combines techniques from statistics, Machine Learning, and data analysis to extract insights from data.

Statistics

Statistics is the foundation for Data Science and Machine Learning. It deals with methods for analyzing and interpreting data. In the marketing context, statistics is used to analyze customer trends and test hypotheses, such as in A/B testing. Some people claim that Data Science is just statistics in a new guise. However, Data Science is more of a combination of statistics and computer science, as it also involves working with large datasets.
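As a small illustration of the A/B testing mentioned above, the following R sketch compares the conversion rates of two variants with a two-sample test of proportions. The visitor and conversion counts are invented, and prop.test is just one common choice for this kind of comparison.

```r
# Hypothetical A/B test data: conversions and visitors per variant
conversions <- c(A = 120, B = 150)
visitors <- c(A = 2000, B = 2000)

# Observed conversion rates per variant
conversion_rates <- conversions / visitors

# Two-sample test for equality of proportions (base R, stats package)
ab_test <- prop.test(x = conversions, n = visitors)
p_value <- ab_test$p.value

print(conversion_rates)
print(p_value)
```

A small p-value would suggest that the difference between the variants is unlikely to be due to chance alone; what counts as small depends on the significance level chosen beforehand.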

Overlaps and Differences

  • Overlaps: Machine Learning is a subset of AI and is applied in Data Science. Both Data Mining and Machine Learning use statistical methods.
  • Differences: While AI is a broad field with various applications, Machine Learning specifically focuses on learning from data. Data Science combines these techniques to derive data-driven insights.

How do minimalism and Apple products go together, when Apple is so expensive?


I have been using Apple products almost exclusively since the mid-90s. Now and then, I engage in debates about the pros and cons of Apple products compared to their competitors, especially regarding the price difference. And of course, the question arises whether minimalism and using Apple products even go together. There is a tension here between design culture and the contradictions of consumption.


Eternal November: Will Mastodon Suffer the Same Fate as Usenet?


Mastodon and the Fediverse had maintained a niche existence for many years until they were thrust into the spotlight by Musk’s acquisition of Twitter and the ensuing turbulence. Since then, the Mastodon community has not been growing like a hockey stick, as it’s called in investor jargon, but like a rocket. This is a big win for those who champion open-source principles. However, this rapid growth might also become a curse, and for several reasons.
