Estateguru: First project partially recovered + exploratory data analysis


The first project has now been partially recovered: €262.72 of the €500 has been returned.

Otherwise, not much has changed since my last update. However, I decided to invest again. I withdrew 2/5 of my investment amount, and while the remaining amount is still significant, it no longer makes up too large a portion of my portfolio. I also stopped the reinvestment, at least as far as I could. Unfortunately, Estateguru doesn't let you cap the amount you invest so that interest payments always stay available. Instead, you can only specify an amount that should stay invested, which isn't ideal. For example, if I had €10,000 invested and wanted to set aside €100 each month, but then received a payment of €300, I would quickly end up with more than €10,000 invested, even though that's not what I wanted. Support wasn't particularly helpful here.

Estateguru does offer the option to download your portfolio data, which allows me to take a closer look at what actually went wrong:

library(tidyverse)  # provides %>%, dplyr, ggplot2 and stringr

# Number of loans per country, stacked by status
data %>%
  ggplot(aes(fill = Status, x = as.factor(Country))) +
  geom_bar() +
  theme_minimal() +
  xlab("Country")

As you can see here, I had fewer projects in Germany than in Estonia, for example, but most of the projects in Germany have defaulted. That’s quite alarming. It looks even worse when you look at the actual amounts.

# Invested principal per country, stacked by status
data %>%
  ggplot(aes(fill = Status, y = `Initial Principal`, x = as.factor(Country))) +
  geom_col() +
  theme_minimal() +
  xlab("Country") +
  ylab("Loan amount in euros")
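
The two charts can also be backed up with numbers. The following is a minimal sketch, not part of the original analysis, that computes each status's share of loans and of invested principal per country. The `toy` data frame and the helper `status_shares()` are made up for illustration; only the column names match the export used in this post.

```r
library(dplyr)

# Made-up rows for illustration only; the real export has many more rows and columns.
toy <- data.frame(
  Country = c("Germany", "Germany", "Germany", "Estonia", "Estonia", "Estonia", "Estonia"),
  Status = c("In Default", "In Default", "Repaid", "Repaid", "Repaid", "In Default", "Funded"),
  `Initial Principal` = c(500, 1000, 100, 50, 100, 50, 200),
  check.names = FALSE  # keep the space in the column name
)

# Hypothetical helper: share of loans and of principal per status within each country
status_shares <- function(df) {
  df %>%
    group_by(Country, Status) %>%
    summarize(
      n_loans = n(),
      principal = sum(`Initial Principal`),
      .groups = "drop"
    ) %>%
    group_by(Country) %>%
    mutate(
      share_of_loans = 100 * n_loans / sum(n_loans),
      share_of_principal = 100 * principal / sum(principal)
    ) %>%
    ungroup()
}

shares <- status_shares(toy)
print(shares)
```

With the real export, `status_shares(data)` should give the percentages behind both bar charts, assuming the column names are unchanged.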

Is there a correlation between the interest rate and the “defaulted” status? In other words, did my “greed” lead me to pick riskier loans with higher interest rates? Let’s first visualize several variables:

data %>%
  ggplot(aes(x = `Initial Principal`, y = `Interest Rate`, color = factor(Status))) +
  geom_point() +
  facet_grid(rows = vars(Country)) +
  theme_minimal()

We definitely see the one outlier where I invested €2,500 at around 11%. It also seems that defaults are primarily associated with higher interest rates, except for the German projects, where I have defaults across the board. However, this picture isn’t quite accurate, because some projects were funded in multiple stages, and each stage shows up as a separate entry. So let’s combine the stages of each project:

data %>%
  mutate(`Loan Code` = str_remove(`Loan Code`, "-.*")) %>%
  group_by(`Loan Code`) %>%
  mutate(principal_complete = sum(`Initial Principal`), median_interest = median(`Interest Rate`)) %>%
  select(`Loan Code`, Status, Country, median_interest, principal_complete) %>%
  arrange(`Loan Code`) %>%
  unique() %>%
  ggplot(., aes(x = principal_complete, y= median_interest, color = factor(Status))) +
  geom_point() +
  facet_grid(rows = vars(Country)) +
  theme_minimal()

Apparently, I was particularly bold in Germany, thinking that loans there were safer, and as a result, I repeatedly exceeded the limit of €500 per project that I had set for myself. The real task is to test whether the difference in interest rates is statistically significant. But let’s start with simple summary statistics first:

data %>%
  filter(Status == "Repaid" | Status == "In Default") %>%
  group_by(Status) %>%
  summarize(mean_interest = mean(`Interest Rate`), median_interest = median(`Interest Rate`))
## # A tibble: 2 × 3
##   Status     mean_interest median_interest
##   <chr>              <dbl>           <dbl>
## 1 In Default          10.7            10.5
## 2 Repaid              10.0            10

The Shapiro-Wilk test helps us check the normality of the data.

repaid <- filter(data, Status == "Repaid")$`Interest Rate`
in_default <- filter(data, Status == "In Default")$`Interest Rate`

shapiro_test_repaid <- shapiro.test(repaid)

shapiro_test_in_default <- shapiro.test(in_default)

cat("P-value for Repaid group:", shapiro_test_repaid$p.value, "\n")


## P-value for Repaid group: 1.143358e-08

cat("P-value for In Default group:", shapiro_test_in_default$p.value, "\n")

## P-value for In Default group: 6.078673e-05

Both p-values are below 0.05, so the interest rates in neither group are normally distributed. I therefore use the Mann-Whitney U test, a non-parametric test, to compare the interest rates of the two groups.

wilcox_test <- wilcox.test(repaid, in_default, alternative = "two.sided")
cat("P-value for Mann-Whitney U test:", wilcox_test$p.value, "\n")

## P-value for Mann-Whitney U test: 6.66547e-08

The p-value is well below 0.05, so there is a significant difference in interest rates between repaid and defaulted loans. This analysis covers the entire portfolio. Now, how does this look by country?

countries <- unique(data$Country)

# Compare interest rates of repaid and defaulted loans within one country
analyze_country <- function(country) {
  cat("Analysis for", country, ":\n")

  # Keep only this country's repaid and defaulted loans
  data_df <- data %>%
    filter(Country == country, Status %in% c("Repaid", "In Default"))

  # Run the test only if both statuses are present
  if (nrow(data_df) > 0 && length(unique(data_df$Status)) > 1) {
    repaid <- data_df %>% filter(Status == "Repaid") %>% pull(`Interest Rate`)
    in_default <- data_df %>% filter(Status == "In Default") %>% pull(`Interest Rate`)
    # exact = FALSE requests the normal approximation instead of an exact p-value
    test <- wilcox.test(repaid, in_default, exact = FALSE)

    cat("Mann-Whitney U test result: W =", test$statistic, ", p-value =", test$p.value, "\n\n")
  } else {
    cat("Not enough data for the analysis.\n\n")
  }
}

# Analyze each country
for (country in countries) {
  analyze_country(country)
}

## Analysis for Estonia :
## Mann-Whitney U test result: W = 77 , p-value = 0.02871484
##
## Analysis for Germany :
## Mann-Whitney U test result: W = 101 , p-value = 0.5058534
##
## Analysis for Lithuania :
## Mann-Whitney U test result: W = 224.5 , p-value = 3.126943e-06
##
## Analysis for Finland :
## Mann-Whitney U test result: W = 54 , p-value = 0.8649381
##
## Analysis for Spain :
## Not enough data for the analysis.
##
## Analysis for Portugal :
## Not enough data for the analysis.
##
## Analysis for Latvia :
## Mann-Whitney U test result: W = 12 , p-value = 0.04383209

In fact, the difference in Germany is not significant. So, it turns out I wasn't as greedy as I thought after all 🙂
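
A possible follow-up, not part of the analysis above: five countries means five separate tests, which raises the chance of a false positive. As a rough sketch, the p-values printed above can be adjusted with base R's `p.adjust()`. The Holm method is an assumed choice here, and the vector simply copies the numbers from the output.

```r
# p-values copied from the per-country Mann-Whitney output above
p_values <- c(
  Estonia = 0.02871484,
  Germany = 0.5058534,
  Lithuania = 3.126943e-06,
  Finland = 0.8649381,
  Latvia = 0.04383209
)

# Holm correction for running several tests at once (assumed choice of method)
adjusted <- p.adjust(p_values, method = "holm")
print(adjusted)

# Countries that remain below the 0.05 threshold after adjustment
significant <- names(p_values)[adjusted < 0.05]
print(significant)
```

With this adjustment only Lithuania stays below 0.05. Germany is not significant either way.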

Now, what if I had only invested €50 in each loan instead of sometimes investing much more? How would I be doing today?

# Hypothetical stake of €50 per project
data$fantasy <- 50

data %>%
  # Merge the stages of a project by dropping the stage suffix from the loan code
  mutate(`Loan Code` = str_remove(`Loan Code`, "-.*")) %>%
  group_by(`Loan Code`) %>%
  mutate(principal_complete = sum(`Initial Principal`), median_interest = median(`Interest Rate`)) %>%
  select(`Loan Code`, Status, Country, median_interest, principal_complete, fantasy) %>%
  arrange(`Loan Code`) %>%
  unique() %>%
  # Share of money per status: equal €50 stakes vs. what I actually invested
  group_by(Status) %>%
  summarize(fifty_only = sum(fantasy), real_numbers = sum(principal_complete)) %>%
  mutate(percentages_fifty = 100 * (fifty_only / sum(fifty_only)), percentages_real = 100 * (real_numbers / sum(real_numbers))) %>%
  select(-fifty_only, -real_numbers)

## # A tibble: 6 × 3
##   Status              percentages_fifty percentages_real
##   <chr>                           <dbl>            <dbl>
## 1 Fully Recovered                 2.33             1.25
## 2 Funded                         34.6             30.7
## 3 In Default                     16.3             27.1
## 4 Late                            1.66             3.91
## 5 Partially Recovered             0.332            0.899
## 6 Repaid                         44.9             36.1

It's very clear here that my strategy of putting more money into certain projects didn't work out well: with equal €50 stakes, 16.3% of my money would be in default instead of 27.1%. It would have been better to invest more evenly and diversify my investments. That's exactly what I'm doing differently now.
