May 22, 2019

Software that learns by demonstration

Think about a time when you taught someone something difficult. Did you just tell them what to do, or did you also show them how it’s done?

Showing how conveys a wealth of unspoken information. By observing a skill in practice, we can see which actions and behaviors to emulate, and which to avoid. Observation also lets us identify situational nuances that we can’t define, or wouldn’t otherwise be aware of.

The same concept applies to programming intelligent software, and is the foundation of Cresta’s expertise management technology.

Cresta observes actions on screen, and facilitates expertise sharing across the team.

Software 2.0

Traditional software engineering focuses on telling a program what it should do, e.g. encoding rules and functional parameters to determine output.

But this approach has a key limitation, which lies in the distinction between knowledge and expertise.

Expertise is situational. It is the sophisticated understanding of how to best approach a given scenario. It takes years for a person to accumulate expertise, which often remains locked inside a person’s brain. Expertise is hard to explain, and much harder to code.

"Stochastic Gradient Descent can write better code than you." – Andrej Karpathy.

Cresta is the “system of record” for expertise. Our software learns expert behaviors and practices directly from human examples to uniformly distribute expertise to the rest of the team. This approach helps us unlock the full potential of human knowledge, allowing neural networks to give value to data points we may not have considered previously. Andrej Karpathy describes this as Software 2.0.

For the enterprise, Cresta’s impact is immediate.

How it works

To better understand how learning by demonstration empowers teams, let’s take a look at another practical implementation: AlphaGo.

A workflow is like a game; your next move could take you anywhere, leading to many different potential outcomes. (Image courtesy of AlphaGo)

AlphaGo was the first program to defeat a world Go champion. Go takes humans years of experience to master, but AlphaGo was able to beat world-class players within months of its conception.

There are two key architectural components that help the program make its decisions. The first is the policy: given the situation, what move can I make next? The second is the reward: given these options, what value, or reward, is associated with each of them?

To know what move to make next, a Go player needs to stay calibrated within the changing contexts of the game. In each configuration of the board, there are different scores, or rewards, that are associated with different potential moves.
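The policy-and-reward split described above can be illustrated with a toy sketch. This is not AlphaGo's implementation: the names `choose_move`, `toy_policy`, and `toy_value` are hypothetical, and the numbers are made up for illustration. The policy proposes candidate moves for a situation, the value function scores each one, and the program picks the highest-scoring candidate.

```python
from typing import Callable, Dict, List, Tuple


def choose_move(
    state: str,
    policy: Callable[[str], List[str]],
    value: Callable[[str, str], float],
) -> str:
    """Pick the candidate move with the highest estimated reward."""
    candidates = policy(state)  # "what move can I make next?"
    if not candidates:
        raise ValueError(f"no candidate moves for state {state!r}")
    # "what reward is associated with each option?"
    return max(candidates, key=lambda move: value(state, move))


# Hypothetical toy data: one board situation with three candidate moves.
TOY_MOVES: Dict[str, List[str]] = {"opening": ["corner", "side", "center"]}
TOY_VALUES: Dict[Tuple[str, str], float] = {
    ("opening", "corner"): 0.6,
    ("opening", "side"): 0.3,
    ("opening", "center"): 0.1,
}


def toy_policy(state: str) -> List[str]:
    return TOY_MOVES.get(state, [])


def toy_value(state: str, move: str) -> float:
    return TOY_VALUES.get((state, move), 0.0)


best = choose_move("opening", toy_policy, toy_value)
print(best)
```

In a real system both functions would be learned models rather than lookup tables, and the scores would change with every configuration of the board.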

Just like a board game, an individual’s workflow can go many different ways. At Stanford, we worked on reinforcement learning to “clone” workflows from demonstration. We built World of Bits, a platform in which crowdworkers use basic keyboard and mouse actions to perform tasks defined by natural-language questions. Leveraging human data, we trained a neural network policy to optimize for the reward of a given task.

World of Bits: Software learns from crowdworkers to book flights on the web using keyboard and mouse
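To make "cloning" a workflow from demonstration concrete, here is a minimal sketch of the simplest tabular form of the idea: for each observed situation, imitate the action that demonstrators took most often. This is an assumption-laden illustration, not the World of Bits code, which trained a neural network policy. The function `clone_policy` and the state and action names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def clone_policy(demonstrations: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each observed state to the action demonstrators chose most often."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {
        state: actions.most_common(1)[0][0]
        for state, actions in counts.items()
    }


# Hypothetical (state, action) pairs recorded from people booking a flight.
demos = [
    ("search_form", "type_destination"),
    ("search_form", "type_destination"),
    ("search_form", "click_search"),
    ("results_page", "click_cheapest"),
]

policy = clone_policy(demos)
print(policy["search_form"])
```

A lookup table only covers states it has already seen. A neural network policy plays the same role but can generalize to screens and tasks that differ from the demonstrations.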

At Cresta, we apply these principles to the workforce. Consider the workflow of a call center agent engaging with a customer via live chat. A conversation has various potential endpoints, but the agent must know which steps to take to optimize reward, relying on context to evaluate the behaviors, expressions, questions, or offerings that would help close a deal.

By helping agents expertly navigate customer interactions, Cresta has been able to close the 3x performance gap between the lowest and highest performing agents.

Cresta’s AI identifies high-leverage actions and weighs the rewards of each based on performance data gathered from expert sales agents. Actions with the highest value are then converted into suggestions that other sales agents can execute in similar contexts. This is how Cresta emulates and exceeds human decision-making, enabling average agents to become experts, and expert agents to become even better.
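As a rough illustration of weighing actions by reward and turning the best ones into suggestions, consider the sketch below. It is not Cresta's system: the function `rank_suggestions`, the record format, and the sample data are all hypothetical. It simply averages the observed outcome of each action in a given context and returns the top few.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def rank_suggestions(
    records: Iterable[Tuple[str, str, float]],
    context: str,
    top_k: int = 2,
) -> List[str]:
    """Return the top_k actions for a context, ordered by average reward."""
    rewards: Dict[str, List[float]] = defaultdict(list)
    for ctx, action, reward in records:
        if ctx == context:
            rewards[action].append(reward)
    ranked = sorted(rewards, key=lambda a: mean(rewards[a]), reverse=True)
    return ranked[:top_k]


# Hypothetical (context, action, reward) records; reward 1.0 = deal closed.
records = [
    ("pricing_question", "offer_discount", 1.0),
    ("pricing_question", "offer_discount", 0.0),
    ("pricing_question", "explain_value", 1.0),
    ("pricing_question", "explain_value", 1.0),
    ("pricing_question", "send_link", 0.0),
    ("greeting", "say_hello", 1.0),
]

suggestions = rank_suggestions(records, "pricing_question")
print(suggestions)
```

Averaging over exact-match contexts is the crudest possible estimate. A learned model would instead score actions from the conversation itself, so that similar but not identical contexts share evidence.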

We have already seen these results with our customers. Large enterprise sales teams with hundreds of agents currently use Cresta to capture expert behavior and dramatically increase performance, efficiency, and ramp-up speed for every agent on the team.

This is the foundation of our work at Cresta -- and sales is just the beginning. Across every company, there exists expertise locked away in many different domains. We are moving towards a world where observation trumps codification. With enough examples of how experts have accomplished desired outcomes, we leave it to our neural network to decide the best way of achieving them. This is what makes Cresta unique as we work to deliver partially automated, highly intelligent work experiences in any domain.

Thanks to Andrej Karpathy, Alex Roe, Navjot Matharu and Sophia Arakelyan for reviewing drafts of this.