Today, two topics I find particularly exciting come together: data analysis and visualization, and finance. Choosing the right ETFs is a topic that fills countless web pages and financial magazine articles. However, it’s equally fascinating to explore the overlaps between ETFs. Previously, I compared the Vanguard FTSE All-World High Dividend Yield UCITS ETF USD Distributing (ISIN: IE00B8GKDB10) and the iShares STOXX Global Select Dividend 100 UCITS ETF (ISIN: DE000A0F5UH1). I also analyzed the performance of these two alongside the VanEck Morningstar Developed Markets Dividend Leaders ETF (ISIN: NL0011683594) and an MSCI World ETF (ISIN: IE00B4L5Y983).
The holdings included in an ETF can be downloaded from the respective provider’s website; I performed this download on October 5. The data requires significant transformation before it can be compared. My R-based notebook detailing this process can be found [here]. For the visualization, I chose an UpSet diagram, a relatively new type of visualization that I’ve used in a paper and another project. While Venn diagrams are commonly used for visualizing overlaps between datasets, they become unwieldy with more than 3 or 4 datasets. This challenge is clearly illustrated in examples like this:
The size of the circles, for example, does not necessarily reflect the size of the datasets. An UpSet diagram is entirely different:
Yes, it takes a bit of effort to read, but it shows much more clearly how the datasets relate to one another. On the far left, we see the size of each dataset, with the Vanguard FTSE All-World High Dividend Yield having the most holdings, over 2,000. On the right-hand side, we see the overlaps. The dot at the very bottom beneath the tallest vertical bar indicates that the Vanguard FTSE […] has 1,376 stocks that no other ETF includes. Similarly, the iShares Core MSCI World has 757 stocks that no other ETF contains. In the third column, we see that these two ETFs share 486 stocks that the other two ETFs do not include. I find that quite fascinating. For example, I wouldn’t have thought that the Vanguard contains so many stocks that the MSCI World does not.
The VanEck appears to have one holding that no other ETF contains, but that’s not accurate; that entry was just cash. Otherwise, 81 of its 100 stocks are also included in the MSCI World. All of its stocks are included in the Vanguard.
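The vertical bars of an UpSet diagram count exclusive intersections: every security is assigned to exactly the combination of ETFs that hold it. Here is a minimal base R sketch of that counting, assuming made-up data; the ETF names and single-letter IDs are placeholders for ISINs, not real holdings, and this is not the code from my notebook:

```r
# Toy holdings per ETF; the IDs stand in for ISINs and are purely illustrative.
holdings <- list(
  vanguard = c("A", "B", "C", "D", "E"),
  ishares  = c("C", "D", "F"),
  vaneck   = c("D", "E")
)

# Membership matrix: one row per security, one logical column per ETF.
all_ids <- sort(unique(unlist(holdings)))
membership <- sapply(holdings, function(ids) all_ids %in% ids)
rownames(membership) <- all_ids

# Label each security with the exact combination of ETFs that hold it.
combo <- apply(membership, 1, function(row) {
  paste(colnames(membership)[row], collapse = " & ")
})

# Exclusive intersection sizes, i.e. the vertical bars of an UpSet plot.
exclusive_counts <- sort(table(combo), decreasing = TRUE)
print(exclusive_counts)

# Set sizes, i.e. the horizontal bars on the left.
set_sizes <- colSums(membership)
print(set_sizes)
```

Because each security lands in exactly one combination, the exclusive counts add up to the number of distinct securities across all ETFs.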
It would now be interesting to see how the weightings align. However, that’s an additional dimension that would likely be difficult to represent in an UpSet diagram. Still, it’s necessary to take a closer look at this because the overlaps might result in unintended overweighting of certain stocks. That would be a topic for the next blog post.
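As a rough idea of what such a weight comparison could look like, here is a small base R sketch. Everything in it is an assumption for illustration: the weights are invented, the column names are hypothetical, and the 50/50 split between the two ETFs is arbitrary.

```r
# Hypothetical weights (in percent) of a few securities in two ETFs.
etf_a <- data.frame(id = c("A", "B", "C"), weight_a = c(5, 3, 2))
etf_b <- data.frame(id = c("B", "C", "D"), weight_b = c(4, 1, 6))

# Full outer join so that securities held by only one ETF are kept.
combined <- merge(etf_a, etf_b, by = "id", all = TRUE)
combined[is.na(combined)] <- 0

# Assumed portfolio split: half of the money in each ETF.
share_a <- 0.5
share_b <- 0.5
combined$effective_weight <- share_a * combined$weight_a + share_b * combined$weight_b

# Weight-based overlap: sum of the smaller weight for each shared security.
overlap_by_weight <- sum(pmin(combined$weight_a, combined$weight_b))

print(combined[order(-combined$effective_weight), ])
print(overlap_by_weight)
```

The effective weight shows how heavily a single stock ends up in the overall portfolio, which is where an unintended overweighting would become visible.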
Depotstudent Dominik has already provided a good overview of how to export data from an ING securities account via the ExtraETF workaround. However, not every tool can handle the CSV export properly. For example, DivvyDiary immediately recognized the relevant columns, but the balances didn’t match. The reason is that CSV files can vary significantly, as can the data within them. Sometimes, columns are separated by a semicolon rather than a comma. And while the difference between 1,000.00 and 1.000,00 might seem minor to us, for DivvyDiary a 1000 turned into a 1 because the thousands separator was treated as a decimal point.
The solution: As much as I dislike working with Excel, if you open the CSV file in Excel and then save it again as a CSV, even DivvyDiary (and many other tools) can handle it.
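If you would rather not go through Excel, the same cleanup can be scripted. The following is only a sketch in base R: the file content, the column names and the helper `parse_german_number` are hypothetical, and it assumes that "." is only ever a thousands separator and "," the decimal mark. The real export may look different.

```r
# Convert a number written in German style ("1.000,00") to a numeric value.
# Assumes "." is only ever a thousands separator and "," the decimal mark.
parse_german_number <- function(x) {
  x <- gsub(".", "", x, fixed = TRUE)   # drop thousands separators
  x <- sub(",", ".", x, fixed = TRUE)   # decimal comma -> decimal point
  as.numeric(x)
}

# Toy export: semicolon-separated, with a German-formatted balance column.
csv_text <- "name;balance\nETF A;1.000,00\nETF B;250,50"

# Read every column as text first so nothing is misinterpreted on import.
depot <- read.csv(text = csv_text, sep = ";", colClasses = "character")
depot$balance <- parse_german_number(depot$balance)

print(depot)
```

Reading everything as text and converting explicitly avoids the silent 1000-to-1 conversion described above.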
In fact, the first project has now been partially recovered, with €262.72 of the €500 returned.
Otherwise, not much has changed since my last update. However, I decided to invest again. I withdrew 2/5 of my investment amount, and while the remaining amount is still significant, it no longer makes up too large a portion of my portfolio. I also put a stop to the reinvestment, at least as best as I could. Unfortunately, at Estateguru, you can’t specify that you want to invest only a certain amount, so that the interest is always available. Instead, you can only specify that you want to keep a certain amount invested, which isn’t ideal. For example, if I had invested €10,000 and wanted to reserve €100 each month, but then received a payment of €300, I would quickly end up investing more than €10,000, even though that’s not what I wanted. The support wasn’t particularly helpful in this case.
Estateguru does offer the option to download your portfolio data, which allows me to take a closer look at what actually went wrong:
# Count projects per country, stacked by status
data %>%
  ggplot(aes(fill = Status, x = as.factor(Country))) +
  geom_bar() +
  theme_minimal() +
  xlab("Country")
As you can see here, I had fewer projects in Germany than in Estonia, for example, but most of the projects in Germany have defaulted. That’s quite alarming. It looks even worse when you look at the actual amounts.
# Sum the invested principal per country, stacked by status
data %>%
  ggplot(aes(fill = Status, y = `Initial Principal`, x = as.factor(Country))) +
  geom_col() +
  theme_minimal() +
  xlab("Country") +
  ylab("Loan amount in euros")
Is there a correlation between the interest rate and the “defaulted” status, meaning did I take on riskier loans in my “greed” that were characterized by higher interest rates? Let’s first visualize several variables:
We definitely see the one outlier where I invested €2,500 at around 11%. It also seems that defaults are primarily associated with higher interest rates, except for the German projects, where I have defaults across the board. However, it doesn’t quite fit, because in some projects, you could invest over multiple stages:
Apparently, I was particularly bold in Germany, thinking that loans there were safer, and as a result, I repeatedly exceeded the limit of €500 per project that I had set for myself. Calculating a statistically significant difference would be the task. But let’s start in a different way first:
data %>%
filter(Status == "Repaid" | Status == "In Default") %>%
group_by(Status) %>%
summarize(mean_interest = mean(`Interest Rate`), median_interest = median(`Interest Rate`))
## # A tibble: 2 × 3
## Status mean_interest median_interest
## <chr> <dbl> <dbl>
## 1 In Default 10.7 10.5
## 2 Repaid 10.0 10
The Shapiro-Wilk test helps us check the normality of the data.
repaid <- filter(data, Status == "Repaid")$`Interest Rate`
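in_default <- filter(data, Status == "In Default")$`Interest Rate`
# Shapiro-Wilk test for each group
shapiro_test_repaid <- shapiro.test(repaid)
shapiro_test_in_default <- shapiro.test(in_default)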
cat("P-value for Repaid group:", shapiro_test_repaid$p.value, "\n")
## P-value for Repaid group: 1.143358e-08
cat("P-value for In Default group:", shapiro_test_in_default$p.value, "\n")
## P-value for In Default group: 6.078673e-05
Both p-values are below 0.05, which indicates that the interest rates are not normally distributed in either group. I therefore use the Mann-Whitney U test, a non-parametric test, to compare the interest rates of the two groups.
wilcox_test <- wilcox.test(repaid, in_default, alternative = "two.sided")
cat("P-value for Mann-Whitney U test:", wilcox_test$p.value, "\n")
## P-value for Mann-Whitney U test: 6.66547e-08
The p-value is well below 0.05, so there is a significant difference in interest rates between repaid and defaulted loans. This analysis was done across the entire portfolio. Now, how does this look by country?
countries <- unique(data$Country)

# Function to analyze each country
analyze_country <- function(country) {
  cat("Analysis for", country, ":\n")

  # Filter data by country and status
  data_df <- data %>%
    filter(Country == country) %>%
    filter(Status %in% c("Repaid", "In Default"))

  # Check if there is enough data for both categories
  if (nrow(data_df) > 0 && length(unique(data_df$Status)) > 1) {
    # Assumed step (missing from the listing): Mann-Whitney U test on the interest rates of both groups
    repaid <- filter(data_df, Status == "Repaid")$`Interest Rate`
    in_default <- filter(data_df, Status == "In Default")$`Interest Rate`
    test <- wilcox.test(repaid, in_default)
    cat("Mann-Whitney U test result: W =", test$statistic, ", p-value =", test$p.value, "\n\n")
  } else {
    cat("Not enough data for the analysis.\n\n")
  }
}

# Analyze each country
for (country in countries) {
  analyze_country(country)
}
## Analysis for Estonia :
## Mann-Whitney U test result: W = 77 , p-value = 0.02871484
##
## Analysis for Germany :
## Mann-Whitney U test result: W = 101 , p-value = 0.5058534
##
## Analysis for Lithuania :
## Mann-Whitney U test result: W = 224.5 , p-value = 3.126943e-06
##
## Analysis for Finland :
## Mann-Whitney U test result: W = 54 , p-value = 0.8649381
##
## Analysis for Spain :
## Not enough data for the analysis.
##
## Analysis for Portugal :
## Not enough data for the analysis.
##
## Analysis for Latvia :
## Mann-Whitney U test result: W = 12 , p-value = 0.04383209
In fact, the difference in Germany is not significant. So, it turns out I wasn't as greedy as I thought after all 🙂
Now, what if I had only invested €50 in each loan instead of sometimes investing much more? How would I be doing today?
It's very clear here that my strategy of spending more on certain projects didn't work out well. It would have been better to invest more evenly and diversify my investments. That's exactly what I'm doing differently now.
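The idea behind this comparison can be sketched in a few lines of base R. The numbers below are invented and the column names are hypothetical; the sketch also ignores interest and partial recoveries, so it only compares how much of the invested money sits in defaulted loans under each allocation.

```r
# Toy portfolio; the amounts and statuses are invented for illustration.
loans <- data.frame(
  principal = c(50, 500, 2500, 100, 50, 1000),
  status    = c("Repaid", "In Default", "In Default", "Repaid", "Repaid", "Repaid")
)

# Share of invested money that sits in defaulted loans.
default_share <- function(amount, status) {
  sum(amount[status == "In Default"]) / sum(amount)
}

# Actual allocation versus a flat 50 euros per loan.
actual_share <- default_share(loans$principal, loans$status)
flat_share   <- default_share(rep(50, nrow(loans)), loans$status)

cat("Defaulted share, actual amounts:", round(actual_share, 3), "\n")
cat("Defaulted share, 50 euros each: ", round(flat_share, 3), "\n")
```

With a flat amount per loan, the defaulted share of the money equals the share of defaulted loans, so a few large positions in bad projects can no longer dominate the result.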
October was basically a good month. I bought a T-shirt for my youngest and a roll of party barrier tape, but unfortunately I also ended up buying a new iPhone. I was actually very happy with my switch from the Pro Max to the Mini, but the poor quality of the camera bothered me a lot. In September, I was in Padua and had a rare opportunity to photograph the anatomical theater. Unfortunately, it was very dark there, and the photos turned out terribly. Was it an absolutely necessary expense? No.
August was a moderately successful month. My purchases:
A bike saddlebag with tools for 18 euros. You can’t find something like this used.
Four Wi-Fi controllable energy-saving power strips, which are also unavailable used, for about 50 euros.
A wooden A6 index card box for my Luhmann note-box, for around 50 euros. I could have gotten something like this used, but the few suitable boxes were already quite damaged.
It’s sad because I once had such an index card box, but I gave it up after university. I don’t even know what happened to it. I will think more about the note-box system.
July went okay. I was really proud of myself for resisting a temptation and not making an impulse purchase, even though it seemed like a good deal. I thought about it for more than a week, and in the end, I did go through with it, but very carefully. It was a new phone: I swapped a flagship model for one that’s a few sizes smaller. I got more money for my 1-year-old phone than I paid for the new one. Why did I do this? Because the huge phone was just too much of a burden. Typing and reading are less pleasant on a smaller phone, but I’m trying to spend less time on my phone anyway. I tried to find a used model, but wasn’t successful. Apparently, small phones are quite in demand. Instead of carrying around 240 grams, I now only carry 140 grams (yes, you notice the difference), and my pockets don’t bulge as much. My cost per use for the old phone is under 1 euro per day, which I think is fair.
We also bought an extension for our Rams shelf. Again, it was hard to find a used one. My preference was to downsize even more and need less storage, but in the end, we found a compromise. This is also a good example that the things we own not only have their own price but also ongoing costs. The Vitsoe 606 is fairly stable in value, so the cost per use is minimal.
Other than that, I’ve simplified a lot. Ended subscriptions. Looked at whether I could live with alternatives. I canceled Netflix since we barely used it anyway. I’ll also cancel my beloved Headspace, because Apple now offers meditation (though I really dislike the music they use). I’ve parted with old baggage, like consolidating all my domains to a cheaper host. A few more used vinyl records came into my life, which I’ll continue to indulge in as a luxury. But I’ve set a monthly limit for this so it doesn’t get out of hand.
Actually, I bought nothing except a few used vinyl records (some real bargains) and a bike bag. For the latter, I tried to find a used one, but I couldn’t agree on a price with any sellers on eBay. Some of them wanted to charge almost the price of a new bag for worn-out ones, without the very practical mounts that are available today. I got burned by the offers from Valkental and 2bag. Both do great marketing, but the Valkental bag lasted 5 minutes on the bike before the mount broke, and 2bag simply couldn’t deliver.
My “slip-up” from January and the synthesizer I bought in April are listed on eBay.
The month of May went very well, except for one expensive purchase that I couldn’t avoid: I lost my glasses. No idea how I managed that, but I had to buy a new pair. It was very expensive 🙁 However, since it’s not a luxury item or a consumable, I’m not counting it towards this project.
Additionally, I bought a music stand, which I couldn’t find used, at least not the way I wanted it. That’s it. The cost was 16.99 euros.
A few used vinyl records also came my way, but I’m not counting those towards this project either.
April was somewhat “meh” in terms of success. I got rid of a lot of things, but I also made a new purchase that I already suspected was a mistake: a synthesizer. It was only available used at crazy prices, and I mainly wanted it because it has a built-in vocoder. However, it’s extremely complicated to use. I’ve only used it three times. A typical case of “falling in love with a piece because others are making cool things with it, planning to do a lot with it, and then hardly using it.” I need to figure out how to make more music with it. Otherwise, the cost per use is too high.
On the used side, I bought a well-preserved Technics 1210 MK II, and I sold my “old” turntable to make the swap. The upgrade of my setup continues. The Technics is much better, as the NAD 558 can’t simply be switched to 45 RPM. You have to remove the platter (!!!) and adjust the belt. The Technics was always my dream turntable, and even though it’s much “bulkier” than the fragile, design-focused NAD, I know it will last a lifetime. For me, this is another example of how I should have just bought the Technics right away, because now I ended up spending more money. I didn’t get quite as much for the NAD as I spent on it, but I didn’t lose too much either. I estimate I paid about €1 per use. It’s okay, but not great.
I also bought this (new) album by Sparks after attending a concert in April.
March was not a good month for my project. On the one hand, I bought a battery-powered digital radio (and a used analog one) because I couldn’t find a used one. Given the current situation, this is probably still understandable. However, what’s less understandable is the purchase of a new amplifier and CD player. There’s a little backstory to this, which is tied to the fact that I (finally) cleared out the basement:
As you can see, it was a very successful decluttering session, and while doing so, I came across my CDs. Does it make sense to have them in the basement? No. What do I want with them down there? Hope that they’ll eventually be worth more money? Some of them really hold great memories. I’m not entirely sure that I’ll still listen to Nick Cave with The Birthday Party, but occasionally…
Well, I no longer have a CD player (which makes having the CDs in the basement even more ridiculous). A year ago, I bought a new turntable from a specialist shop (I had no idea and wanted advice) and retrieved my vinyl records from the basement. However, I wasn’t entirely happy with the amplifier I bought with it (a NAD Amp 1). So, in the living room, I ended up with two large Apple HomePods for streaming music, Apple TV, two Hi-Fi speakers for vinyl, plus the unloved amplifier and the turntable.
The turntable seemed like a good idea because Vodafone had left us offline for at least three days twice last quarter, and fairy tale records still fetch a lot of money on eBay. The Apple HomePods weren’t much use for that. However, Apple TV didn’t really work well with the amplifier either. In total, it was too much clutter and hassle. I wanted less. Back to the store. Told them about my “problems.” I ended up choosing a more expensive device that could do everything I wanted and would give me peace of mind for the next 20 years. Sold the Apple HomePods (they went within two days for the price I wanted), and the old amplifier was taken back for its original purchase price (after a year, another reason to buy from a specialist store).
What does this have to do with minimalism? Spending so much money on a Hi-Fi system? Initially, nothing at all. Sure, I could have gone the used route (if I had known more about it), and I could have listened to music with a lesser system. But minimalism doesn’t mean you have to stop enjoying life. On the contrary, music brings me immense joy. Maybe not the piece from The Birthday Party above (though I still think it’s great), but I listen to a lot of music, and sound quality is important to me. Right now, I’m very into Beethoven’s Piano Concerto No. 3. I listen to various interpretations, from Gould/Karajan to Gould/Bernstein to Zacharias/Gewandhausorchester. And yes, a good system makes a difference. And I also learned something here: I thought I could get by with the minimal setup, but in the end, I bought twice (fortunately didn’t spend more thanks to the store’s goodwill). I also learned that there are children’s CDs available at the local public library. The first CD I played on my new CD player was Drache Kokosnuss.
I also received a new book in March. I had ordered it in February 2021, but the author had apparently missed multiple deadlines. I nearly forgot about it. Also, the bike holders I needed for the basement were only available new.
In total, I spent just under €1,000 in March, after deducting the things I sold. Overall, I’m actually quite satisfied with that.