Robotics N°1

Why is robotics different from other AI systems?

Part 1: What do you care about?

In this series of articles, we’ll discuss the particularities that make AI for robotics genuinely different from other AI domains.

As a disclaimer: we’ll talk a lot about self-driving cars, but robotics goes beyond this use case. The industry has spent billions of dollars trying to solve this specific robotics challenge, and it is full of lessons when it comes to thinking about the future of robotics.


In this first article, we’ll speak about what one cares about when designing an AI system. 

Designing an autopilot for an airplane is obviously very different from designing an AI player for a video game. The expected outcomes differ, and so do the development processes: in one case you want a zero-failure system, in the other you care more about balance, creativity, and so on.

In robotics, whenever you try to detect people, classify images or track objects, you optimize some metric or KPI that measures how far (or close) you are from your goal. Those metrics allow you to tell objectively how well your algorithm performs, how often it succeeds or fails. We almost always use the same ones, like accuracy or mean absolute error, depending on the case.
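As a quick illustration, here is a minimal sketch of how these two common metrics are computed; the arrays are made-up toy data, not results from any real system:

```python
import numpy as np

# Toy classification example: accuracy = fraction of correct predictions.
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 0, 0, 1, 0, 1])
accuracy = np.mean(y_true == y_pred)          # 5 correct out of 6 -> ~0.83

# Toy regression example: mean absolute error = average |prediction - target|.
targets     = np.array([2.0, 3.5, 4.0])
predictions = np.array([2.5, 3.0, 4.0])
mae = np.mean(np.abs(predictions - targets))  # (0.5 + 0.5 + 0.0) / 3 ≈ 0.33

print(f"accuracy = {accuracy:.2f}, MAE = {mae:.2f}")
```

Note that both are averages over all samples, which is exactly the point developed below.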

But behind this consensus, those metrics say a lot about what we’re trying to do, and we’ll see how they impact applied use cases, especially in robotics.

Best, average or worst case optimization?

The AI industry is very close to research. Research papers quickly flow into industrial use cases, sometimes revolutionizing previous processes. However, research goals can be very different from industrial ones, particularly when it comes to the metrics they use.

Research papers tend to report fairly neutral metrics, such as average accuracy and average precision. Since they are not driven by any applied goal, they treat all samples (examples) with the same importance. But they often show “qualitative” results to be convincing, for instance images where objects are well detected or correctly classified, and they put their best-performing metrics front and center. This leads to what we’ll call “best case optimization” or, at least, “average case optimization”. We’re talking here about human optimization: researchers design and tune their work to perform well in these specific cases.

GAFA applications fall into much the same category. Even when they have an applied use case, they optimize the “average case”, as none of the individual events are critical for them. For example, consider an automatic moderation system like the one used on YouTube. Its goal is to mitigate the impact of rule-breaking videos on the platform. Reducing the number of videos violating community standards by 50% is good news; that’s an average metric. There are plenty of examples like this: face detection in Facebook photos, automatic subtitles in videos, or street numbers in Google Maps. They all belong to the “average case optimization” category.

On the contrary, robotics applications, like all other critical industries, need to optimize for the worst case. To be convinced, take a look at the self-driving car industry. Self-driving cars actually work very well on average, arguably better than human drivers. But if you look at the news about self-driving accidents, the Uber one for example, you’ll understand the problem this industry is facing. An accident is the worst case self-driving can encounter, and most people judge whether self-driving is acceptable by that worst case alone. Even though Uber’s self-driving cars safely drove millions of miles, a single accident was enough to stop their experiments. The entire self-driving car industry is driven by worst-case minimization.

As another illustration of worst case optimization, we can mention SpaceX, a specialist in falling rockets: even if 99% of their systems work, a single faulty sub-system is enough to produce a giant fireball.

When you do robotics, you care a lot more about worst cases than about average or best cases. If you put a hundred-kilogram robot in the middle of a crowd, you are far more concerned about the possibility of the robot charging into people than about its “on average” ability to avoid them.

This particularity is not exclusive to robotics; it applies to all so-called critical systems.

For robotics companies there is an additional difficulty: as we described earlier, machine learning algorithms are developed in a “best case optimization” mindset. Using them in “worst case optimization” applications can be problematic and a large waste of time and resources.

The long tail, or the curse of rare events

In statistics and business, the long tail of a distribution is the portion made up of many individually rare events, far from the “head” or central part of the distribution.

The issue that worst case optimization raises for robotics and machine learning shows up with rare events. Machine learning systems are trained on large datasets and evaluated on many samples. That’s an “average case optimization” setting, where the model is mostly fitted and evaluated on the most probable events. If your metric relates to the worst case, this approach gives poor results. (There are methods to mitigate this issue, such as dataset balancing or hard sample mining, but they are often not sufficient.)
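For readers unfamiliar with these two mitigation techniques, here is a minimal sketch of the idea behind each one, using made-up per-sample losses rather than a real training loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy imbalanced dataset: class 1 (the "rare event") is only 1% of samples.
labels = (rng.random(10_000) < 0.01).astype(int)
losses = rng.random(10_000)  # stand-in for per-sample training losses

# Dataset balancing: weight each sample inversely to its class frequency,
# so the rare class contributes as much to the total loss as the common one.
class_freq = np.bincount(labels) / labels.size
sample_weights = 1.0 / class_freq[labels]
balanced_loss = np.average(losses, weights=sample_weights)

# Hard sample mining: keep only the 10% highest-loss samples for the update,
# which emphasizes the cases the model currently gets wrong.
k = int(0.1 * losses.size)
hard_idx = np.argsort(losses)[-k:]
hard_loss = losses[hard_idx].mean()

print(f"plain loss={losses.mean():.3f}  balanced={balanced_loss:.3f}  hard={hard_loss:.3f}")
```

Both tricks shift the effective training distribution toward the tail, but neither creates examples of events the dataset has never seen, which is why they are often not sufficient.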

As you know, machine learning, and especially deep learning, needs a lot of examples to perform well, often hundreds per case. Obviously, self-driving car companies acquire data by driving cars. Imagine a rare event that occurs only once every million miles. Even with mitigation methods, you’ll need hundreds of examples of this event, which represents hundreds of millions of miles. And even then, you can’t be sure the event will be handled properly.

You might think this is not a problem, because rare events are rare by definition, so we can handle them with other methods, like specific techniques or hand-crafted (as opposed to learned) code to detect them. But saying an event is rare only says that it occurs rarely; it says nothing about how many kinds of rare events there are. If you have a million different rare events, each occurring only once per million miles, you end up with one rare event per mile, even though each of them is “rare”. You’d need a million specific solutions, one for each. That’s what we call the “curse of rare events”.
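To make the arithmetic concrete, here is a back-of-the-envelope calculation; the numbers are the illustrative ones from the text above, not measured figures:

```python
# One rare-event type occurring once per million miles:
miles_per_occurrence = 1_000_000
examples_needed = 300  # say a few hundred examples per case
miles_to_collect = miles_per_occurrence * examples_needed
print(f"{miles_to_collect:,} miles to gather {examples_needed} examples of ONE event type")
# -> 300,000,000 miles for a single rare-event type

# Now suppose there are a million *different* rare-event types, each just as rare:
n_event_types = 1_000_000
events_per_mile = n_event_types / miles_per_occurrence
print(f"~{events_per_mile:.0f} rare event per mile on average")
# -> about one rare event every mile, even though each type is individually rare
```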

The long tail issue is well known to self-driving car manufacturers. Development cost and time grow exponentially with the performance of the AI system. Because handling a type of event costs roughly the same whether it occurs often or rarely, the last percent of performance (or safety) is very expensive: it is made up of an ever-growing number of rare events, each needing time and money to be solved. Moreover, some rare events require creativity to be solved, which leads to hand-crafted solutions made by engineers. In the end, hand-crafted solutions replace machine learning as you move into the long tail.

Robustness vs. accuracy

More than small tricks, worst-case optimization requires a deep rethink of AI systems. When a system solves the average or best case problem, it uses every piece of information available to maximize its performance. When a system is designed for the worst case, it usually uses the available information redundantly to minimize the probability of failing. That’s a major design difference, leading to very different solutions.

AI algorithms are designed (by researchers aiming at best or average case optimization) to use their information to produce the best decisions. That’s a major reason why AI vision systems can be fooled by stickers or, more recently, by handwritten notes. In contrast, human vision uses redundancy to find a compromise between the best decision and a robust one.

For us, a stop sign can hardly be faked by a few stickers because we construct a global and redundant context around the object. Our understanding of a stop sign isn’t limited to a red octagonal shape in an image; it is composed of many cues: the sign stands out from the background in color but also in depth, it usually sits about two meters off the side of the road, it is often paired with painted marks on the ground, and it is static with respect to the ground (no specific movement). Any deviation from these priors leads us to be careful, instead of simply missing the sign.
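To illustrate this kind of redundancy, here is a deliberately toy sketch (entirely hypothetical, not Visual Behavior’s actual pipeline): instead of trusting a single appearance score, several independent cues vote, and disagreement triggers caution rather than a silent miss:

```python
from dataclasses import dataclass

@dataclass
class StopSignCues:
    appearance_score: float  # red octagon detected in the image, 0..1
    depth_isolated: bool     # stands out from the background in depth
    height_m: float          # estimated height above the ground
    near_road_edge: bool     # roughly beside the road
    static: bool             # no motion relative to the ground

def assess_stop_sign(c: StopSignCues) -> str:
    """Combine redundant cues instead of trusting appearance alone."""
    priors_ok = [
        c.depth_isolated,
        1.0 <= c.height_m <= 3.0,
        c.near_road_edge,
        c.static,
    ]
    if c.appearance_score > 0.8 and all(priors_ok):
        return "stop sign: confident"
    if c.appearance_score > 0.5 or any(priors_ok):
        # Cues disagree: fail safe (slow down, flag for attention)
        # rather than silently ignoring a possible sign.
        return "uncertain: be careful"
    return "no stop sign"

print(assess_stop_sign(StopSignCues(0.9, True, 2.0, True, True)))     # confident
print(assess_stop_sign(StopSignCues(0.9, False, 0.3, False, False)))  # be careful
```

The thresholds and cue list are invented for the example; the point is the structure: redundant evidence plus a cautious fallback when the evidence conflicts.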

Speaking of stop signs, one nightmare for self-driving car makers was the rare event of road workers holding hand-held stop signs, a good (and not so rare) example of a long-tail problem.

How we manage this at Visual Behavior

At Visual Behavior, we pay particular attention to designing robust systems, and this shows at different levels. The main way we mitigate the long tail and rare event problem is at the core of Visual Behavior: we aim to solve the general problem of robotics. Our technology is designed to be general-purpose, to have common sense, and to be adaptable. Since it isn’t tied to a single objective, the notion of “rare event” is less present. A basketball bouncing into the road is a rare event for a self-driving car, but for an algorithm that has learned to analyse sports games, the ball’s movements are rather predictable. As an analogy, humans learn to drive a car in a few dozen hours, but in fact they reuse years of experience in the world and transfer that knowledge to this new specific skill.

Another way we mitigate this problem is by designing persistent multi-modal, multi-task, end-to-end algorithms. Our systems don’t make independent predictions about the world; they construct a complete scene understanding that is spatially and temporally stable, and make predictions based on this world representation.


Conclusion

Our first journey into the specificities of robotic AI systems led us to the difference between best/average case optimization and worst case optimization. We saw that critical systems need specific solutions to minimize failures and worst cases. We described the issue of long-tailed distributions full of rare events and the difficulty (and cost!) of managing them. Finally, we caught a glimpse of how humans handle this problem and how robotic systems can take inspiration from it.

We’ve seen that robotic systems are full of compromises, and not driven by accuracy alone. Next time, we’ll talk about another big compromise in robotic AI systems: the time constraint.