Summary of a talk at the Code University Berlin / January 2019
[Exercises are not included in this summary]
Creating a persona has become a popular exercise in the product development world, but it has also become one of the most misunderstood. Alan Cooper invented the concept of a persona while working on a piece of software. After interviewing a dozen people, he realized that his software would be used by different groups of users. For each group, he created a persona: he literally condensed the needs and other traits of a group’s members into one imagined person. In other words, a persona is not a real person, but it is also not completely fake, as it is based on real people. Personas are supposed to give us a perspective on the people who will use our product. We want to keep their needs and experience in mind when we design the product.
Unfortunately, personas are often made up without anyone having talked to a single person or researched any data. Instead, someone sits in a room and simply invents a persona because they believe they know what type of person would use the product. Obviously, it is a bad idea to build products for people who exist only in your head and reflect solely your own perspective on the world. Creating valid personas requires time and mental effort.
The approach presented here combines qualitative and quantitative data. This combination has proven to be extremely useful in product development as well as in advertising, mainly because it leverages the strengths of each way of collecting data while reducing the weaknesses. Furthermore, we will look into data that you already possess or collect yourself as well as external data that you can use. The following graphic displays the iterative process of creating data-driven personas:
Listen to your user. But do it differently.
A typical approach in product development is to ask real users what they think about a product. That’s a great idea. But it requires more than just putting a survey onto a web page or asking people on the street. Things to consider:
- We want to avoid convenience sampling: Of course, we can just ask the people around us; however, how do we know that the people around us are not, by chance, completely different from the majority of people? As an example, most of my students at HAW Hamburg use a MacBook. I guess even most students at Code University use a Mac. I could easily come to the conclusion that students mainly use Macs. But is it really true? (It is not.) If I want to create cloud-based note-taking software for students, I would want to know what kind of computers students really use (for the persona but also for my market research).
- Another example here is the problem of non-respondents: If you use a survey on your website, it is unlikely that all users have an equal interest in taking part. Some users have more time than others, and some people just don’t like surveys. However, maybe their opinion is extremely important to your question.
- Also, respondents may have a yes bias, a bias towards the middle, or a no bias. Such response biases can be reduced by adjusting the questions; non-respondents, however, remain an issue that often occurs.
- Asking the right questions is an art in itself. “Do you use an ad blocker?” is not a great question because users don’t want to look stupid (social desirability). It is not unlikely they will say they use one even if they have never installed one. Asking what ad blocker product a user is currently using is a much better question because it requires that the user has actually installed one or at least spent some time looking for one. Although it looks easy to put a survey onto a website, you should seek help from someone who has some experience or a background in psychology or social sciences.
- In interviews, we want to avoid relying only on anecdotal evidence: “The girlfriend of my brother is a [put profession here], and she always does [put some behavior here]”. These remarks can be valuable but should always be taken with a grain of salt until more data is available.
Having said all that, qualitative research does not require a sample that represents the population. We want to know what people are interested in while making sure we don’t talk to obscure cases only. Even asking just a few people what they think is of interest to other users in a group or industry may already widen the perspective. Qualitative research tells us what kind of questions we should look into in quantitative research. In some cases, however, quantitative data will not confirm qualitative data.
A good approach is to have at least some data available that shows the distribution of users. As an example, if you want to create a software product for students, you may want to find out which subjects can be studied and which are the most popular ones, and then talk to students of the most popular subjects. We will look into where to get such data in the next section.
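To make the convenience-sampling risk concrete, here is a minimal Python sketch. All counts are invented for illustration (they are not taken from any data source mentioned in this summary), and the group names `design_school` and `other_universities` are hypothetical. It compares the Mac share in a convenient subgroup with the share in the whole population:

```python
# Hypothetical counts of students by (group, operating system).
# Every number here is made up purely for illustration.
population = {
    ("design_school", "mac"): 80,
    ("design_school", "windows"): 20,
    ("other_universities", "mac"): 150,
    ("other_universities", "windows"): 750,
}


def mac_share(counts):
    """Return the fraction of Mac users in a dict of (group, os) -> count."""
    total = sum(counts.values())
    macs = sum(n for (_, os_name), n in counts.items() if os_name == "mac")
    return macs / total


# Convenience sample: only the students "around us".
convenience_sample = {
    key: n for key, n in population.items() if key[0] == "design_school"
}

print(f"Mac share among students around us: {mac_share(convenience_sample):.0%}")
print(f"Mac share in the whole population:  {mac_share(population):.0%}")
```

With these invented numbers, the convenient group suggests 80% Mac users while the overall share is 23%. A persona built only on the first figure would describe the wrong student.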
Where to get (free) data
Here are a few data sources:
- Best4Planning (German only)
- Facebook Audience Insights
- YouGov: Limited free resources but nevertheless worth looking into (British site looks a bit different)
- Amazon: Users who bought this also bought… but you can also look at ratings
- Statista: Many studies; however, make sure you understand the source which often is not Statista itself. In some cases, you will also find the study somewhere else for free.
Let’s look at one example. Again, we will do some research about students and MacBooks in Germany:
This piece of data tells us that interest is lower among younger people, at least among women, and higher among people above 44 years. However, the data also shows that the Facebook population does not represent the German population. Also, having an interest does not mean that people actually buy the computer. Finally, we could not add student as a profession filter, so we just have the whole population of people interested in the MacBook family.
Best4Planning also does not have real figures about ownership, but it does at least offer propensity to buy:
In this case, we have only looked at students, and Apple is by far the most popular brand. But it still does not account for the majority of users. What is interesting here is that the population of 2.54 million students in this study is close to the 2.8 million students in Germany in 2018. Again, propensity to buy does not mean that the brand will be bought. However, we can use it as a signal. Let’s dive further into this data source:
We add gender data and see that women slightly prefer Apple whereas men prefer PC brands (it is interesting to see Compaq here because they have been out of business for a long time already). By adding more data, we will also be able to see that Apple users are more into fitness apps while IBM users are more likely to be news addicts. We can even get data about cloud usage and paying for cloud services: Men are more likely to pay for a cloud service while Apple users are less likely to pay; however, they can imagine doing so.
For our software development project for students, we would definitely have to talk to both Windows and Mac users, and we also have some additional information that helps us to create the archetype. In interviews, we would have to ask how users would use such a tool, what their use cases are, and how they would learn about the product.
Don’t believe the tools (unless you understand them)
Before we dive into further data exploration, there is one thing to be aware of. As we have seen, a huge variety of (free) tools exists, but if you don’t understand how the tools work and how they collect data, remember that a fool with a tool is still a fool. Some tools, like Google Trends, look really easy to use. However, these tools sometimes hide much more complexity. Google Trends graphs, for example, do not reflect absolute numbers, and increasing interest in Google Trends does not automatically mean an increasing number of search queries. Even Google employees may not be aware of how the suboptimal presentation of data may lead to wrong interpretations, as some of their tweets show:
A 250% spike is great but from which base? This tweet resulted in the following headline:
Nothing in the data said that “many” people don’t know. It was just a spike. The Google Ads Keyword Planner would have shown absolute numbers. In a similar fashion, if we look at MacBooks versus Acer laptops, Acer seems to be almost nonexistent. However, we could not limit the data to searches by students only.
Similar misinterpretations happen with all types of tools, for example SimilarWeb. You must make sure you understand how a tool works before you use the tool’s data.
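The “from which base?” question about Google Trends can be illustrated with a small Python sketch. The query counts are invented, and `scale_to_index` is a hypothetical helper that only mimics the idea of a relative 0 to 100 index. It is a simplification and not Google’s actual method, which also puts queries in relation to total search volume. Two keywords with very different absolute volumes end up with the same spike and the same relative curve:

```python
def spike_percent(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100


def scale_to_index(series):
    """Scale a series so that its peak becomes 100 (relative index)."""
    peak = max(series)
    return [round(value / peak * 100) for value in series]


# Invented weekly query counts for two hypothetical keywords.
niche_keyword = [20, 20, 70, 20]
popular_keyword = [20_000, 20_000, 70_000, 20_000]

for name, series in [("niche", niche_keyword), ("popular", popular_keyword)]:
    spike = spike_percent(series[1], series[2])
    print(name, f"spike: {spike:.0f}%", "index:", scale_to_index(series))
```

Both lines report a 250% spike and the index `[29, 29, 100, 29]`, although one keyword went from 20 to 70 queries and the other from 20,000 to 70,000. The relative view alone cannot tell you whether “many” people were searching.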
Another great way to acquire data is to examine usage data from existing products or websites. As an example, Code University could look into the usage of their own Confluence or other assets and build clusters based on usage. One example here is the use of association rules, pretending that visited content or used features are like elements in a market basket:
In this case (my own page), we see at least two different groups of people; some seem to be students, others seem to be interested in content about robo advisors. We could even add computer OS data here.
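As a rough illustration of the market-basket idea, the following Python sketch computes support, confidence, and lift for one rule over a handful of invented sessions. The page names and sessions are hypothetical, and this is plain Python rather than the tool that produced the result discussed above; a real analysis would use a dedicated association-rule library and far more data.

```python
# Each session is the set of pages one visitor viewed (invented data).
sessions = [
    {"lecture_notes", "exam_dates"},
    {"lecture_notes", "exam_dates", "r_tutorial"},
    {"lecture_notes", "r_tutorial"},
    {"robo_advisor", "etf_basics"},
    {"robo_advisor", "etf_basics"},
    {"robo_advisor"},
]


def support(itemset, baskets):
    """Share of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def rule_metrics(antecedent, consequent, baskets):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    both = support(antecedent | consequent, baskets)
    confidence = both / support(antecedent, baskets)
    lift = confidence / support(consequent, baskets)
    return both, confidence, lift


sup, conf, lift = rule_metrics({"robo_advisor"}, {"etf_basics"}, sessions)
print(f"robo_advisor -> etf_basics: support={sup:.2f}, "
      f"confidence={conf:.2f}, lift={lift:.2f}")
```

A lift above 1 means the two pages appear together more often than they would if visits were independent, which hints at a shared audience. In this toy data, a rule from `lecture_notes` to `robo_advisor` has a lift of 0, which matches the idea of two separate groups of visitors.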
Google Analytics also offers demographics and interest data. Unfortunately, this data is not attached to individual users but to the pages visited by users for whom demographic and interest data is available. As a consequence, you have to segment by page and look at this data for each page to see differences.
There is one huge drawback in this approach: It only tells you about the users you already have, not about the ones you don’t have today because they did not know about your site.
Final Warning: Don’t believe yourself
A last challenge is the so-called confirmation bias. If you already have an assumption, you may prefer data that supports your opinion, whether you are aware of it or not. It may be advantageous to have another person (who doesn’t care at all about your results) review your data.