AI-powered products are more prevalent than ever before: automatically organizing your photos, driving a car without any human input, and helping you craft the perfect email. All three examples use AI, but the product experiences and constraints are vastly different. As more and more products incorporate AI in some way, it’s important to design for the considerations these differences create.
Take the automatic organization of photos as an example. Here the accuracy doesn’t need to be perfect: if you search for “dogs,” it’s fine to return some extra dog-like pictures, because it’s easy to scroll and find the exact photo you want. This is called a “high recall” approach.
Using these examples and others, we have identified three essential product design considerations to take into account when developing an AI product: the precision vs. recall trade-off, the fallback experience when the AI gets it wrong, and how much users can understand and control the system.
What is precision vs. recall? To illustrate, let’s use a common example: image classification. Say you have a world-class, cutting-edge, state-of-the-art AI model that classifies images as pizzas or not pizzas. You also have a set of test data to help you evaluate how good your model is: you show it 50 images that are pizzas and 50 that are not pizzas, run your model on these images, and look at the results.
Recall asks: of the 50 images that really are pizzas, how many did the model catch? Say the model correctly identified 30 of them; recall would be 30/50 = 60%.

Precision takes all of the images the model classified as a pizza and asks: how many of them were actually a pizza? If the model flagged 40 images as pizzas in total, and only 30 of those really were pizzas, precision would be 30/40 = 75%.
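The two definitions above can be captured in a few lines of code. This is a minimal sketch (the label lists are illustrative, not from a real model):

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for a binary classifier.

    y_true, y_pred: lists of booleans (True = "is a pizza").
    """
    tp = sum(t and p for t, p in zip(y_true, y_pred))        # real pizzas the model caught
    fp = sum(not t and p for t, p in zip(y_true, y_pred))    # non-pizzas flagged as pizza
    fn = sum(t and not p for t, p in zip(y_true, y_pred))    # real pizzas the model missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy evaluation set: True = "is a pizza" (labels are illustrative).
y_true = [True, True, True, True, False, False]
y_pred = [True, True, True, False, True, False]
p, r = precision_recall(y_true, y_pred)
print(f"precision = {p:.0%}, recall = {r:.0%}")  # precision = 75%, recall = 75%
```

Note that precision divides by everything the model *flagged*, while recall divides by everything that is *actually* a pizza — that difference is the whole trade-off.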
In real-life scenarios, high precision means getting the answer right on the first try: serving only relevant answers at critical points. High recall means not missing correct answers, even at the cost of serving some wrong ones. How you weigh this trade-off will shape the type of results and the priorities of your model.
High recall works in places where the user can pick the right result from a handful of candidates, and where the cost of missing a correct answer is greater than the cost of showing a few wrong ones. For Google Photos search, it is more important to serve multiple possible results, as long as the right answer is among them. The user can narrow down the results because they know what they are looking for. User success in this scenario is finding the right photo.
Tesla’s self-driving cars, on the other hand, are an example of high-precision design: they need to avoid crashes despite having relatively little crash-scenario data compared to data on normal driving. Cars are heavy and carry a lot of momentum, so the onboard systems cannot change course in the middle of an avoidance maneuver. They need to be highly confident that the decision is correct, and take action immediately.
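One common lever for choosing between these two postures is the confidence threshold applied to a model’s scores (assuming, as most classifiers do, that the model outputs a score rather than a bare yes/no): lower the threshold and you surface more candidates (higher recall); raise it and you keep only confident hits (higher precision). A minimal sketch with hypothetical scores:

```python
def classify(scores, threshold):
    """Turn model confidence scores into yes/no predictions.

    A low threshold favors recall (photo search: show everything that
    might match); a high threshold favors precision (safety-critical
    actions: act only when very confident).
    """
    return [s >= threshold for s in scores]

scores = [0.95, 0.80, 0.55, 0.30, 0.10]            # hypothetical confidences
high_recall = classify(scores, threshold=0.25)     # [True, True, True, True, False]
high_precision = classify(scores, threshold=0.90)  # [True, False, False, False, False]
```

The model is identical in both calls; only the product decision about where to set the threshold changes.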
Spoiler alert: AI systems aren’t perfect. If your product doesn’t account for this, as the saying goes, “you’re gonna have a bad time.” It’s important to consider what the alternative is when your AI’s predictions fail.
Let’s look at two examples: Gmail’s Smart Compose and an Amazon Echo.
Smart Compose is a feature in Gmail that helps complete what you are typing. If it makes a bad prediction (say you are typing “Good morning” and after “Good” it suggests “evening”), it’s no big deal: you just keep typing “morning” and the experience isn’t hurt.
For an Amazon Echo, on the other hand, the fallback is more difficult: it has to understand what you are saying. If you say “Play the latest Drake album” and Alexa can’t understand you, there is no easy alternative input. A voice fallback is much harder to deliver and, as such, very frustrating when it goes wrong. Similar systems, such as automated phone menus, typically offer a fallback: the user can enter information on the keypad or, if they make enough noise, get connected to an actual human.
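The phone-menu pattern above amounts to a confidence-based fallback chain. A minimal sketch (the thresholds and response strings are illustrative, not from any real voice assistant):

```python
def handle_request(intent, confidence):
    """Route a voice request based on recognition confidence.

    Thresholds are illustrative: confident predictions act immediately,
    uncertain ones ask for confirmation, and hopeless ones hand off.
    """
    if confidence >= 0.8:
        return f"doing: {intent}"            # confident: act on the prediction
    if confidence >= 0.4:
        return f"did you mean: {intent}?"    # unsure: confirm before acting
    return "connecting you to a human"       # lost: graceful human fallback

handle_request("play latest Drake album", 0.95)  # acts immediately
handle_request("play latest Drake album", 0.55)  # asks to confirm
handle_request("play latest Drake album", 0.20)  # hands off
```

The key design point is that the lowest rung is not an error message but a path to a working alternative.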
There’s a common saying that AI is “like a black box”: a model returns a prediction, but does the user understand why?
Netflix recommendations are a good example of a user experience with control. Netflix helps users understand why they are getting certain results by showing the previously watched titles behind a recommendation, and it provides an intuitive feedback loop: the more you rate, the better the recommendations get.
On the other hand, rideshare services like Uber and Lyft currently offer no way to give real feedback on routes, which contributes to painful rider experiences: many people know the feeling of getting pulled around the block for five minutes only to pass their own house again while picking up another passenger, or getting stuck in traffic on the wrong side of town.
One potential way to provide control would be to let riders mark off areas of town, or streets they want to avoid, on their daily commute. Another option is more transparency: the ability to stay on the route users preview when they initially book the ride. If a rideshare app highlighted the map areas a route could pass through, customers could make more informed decisions, rather than sitting through poor experiences and asking for refunds.
In summary, as more and more products utilize AI, it’s important to keep three key user considerations in mind: the precision vs. recall trade-off, the fallback experience when the AI gets it wrong, and how much users can understand and control the system.
Not properly considering these can lead to frustrating user experiences, and in some cases (such as self-driving cars) the stakes can be very high.
At Cresta we are using these principles to inform a great user experience. If you are interested in defining the cutting edge AI + HCI paradigms, we are hiring: https://cresta.ai/careers
Thanks to Jessica Zhao for reading & reviewing, Amy Lee for readability and design guidance, and Alex Roe for editing & distilling many drafts.