This video is about 27 minutes long, but is fascinating throughout. It details the recent history, present, and near future of the Mobileye EyeQ computer chip and computer vision system. Of course this is just one chip, presented by the company that sells it, but its performance has been and is being demonstrated in real-world scenarios, so we can be fairly confident in the honesty of the claims.
Self-driving cars made the leap from science fiction to modern fact after Stanford University's "Stanley" won the DARPA Grand Challenge, a competition DARPA created to encourage research into autonomous vehicles.
Although the Grand Challenge was nearly eight years ago now, what stuck with me is that there were two approaches taken by the teams that competed in it. One approach was what I called "plan and prepare". They studied the course for the challenge ahead of time, planned out a route, and programmed their cars to follow that route. The other approach I called "see and react". They didn't approach the course with a plan, but rather with a GPS guidance system and a bunch of rules like "drive towards your destination, but not into trees".
As it happened, Stanley, the car that won the race, used the "see and react" system, but the "plan and prepare" cars did manage to finish. However, it seemed obvious to me from that moment that only "see and react" systems, or "observant, rule-based systems", could scale and adapt to all of the driving situations the world had to offer. Even the best map in the world would fail the "plan and prepare" car at the first accident, fallen tree, or change in traffic patterns. And that's ignoring the question of how you plan and prepare for private roads, or informal roads in Ecuador or Indonesia, where your mapping cars never drive.
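To make the distinction concrete, here is a toy "see and react" controller in Python. This is not how Stanley actually worked; the fields, thresholds, and rules are all invented for illustration. The point is only that the behavior falls out of a handful of rules applied to whatever the sensors report right now, rather than out of a route computed ahead of time.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One snapshot of what the car's sensors report (hypothetical fields)."""
    heading_to_goal_deg: float  # bearing from the GPS toward the destination
    obstacle_ahead: bool        # e.g. a tree or boulder detected in the path
    clear_left: bool
    clear_right: bool

def choose_steering(obs: Observation) -> float:
    """Pick a steering angle (degrees) from simple 'see and react' rules:
    drive toward your destination, but not into trees."""
    if not obs.obstacle_ahead:
        # Nothing in the way: steer toward the GPS goal, within steering limits.
        return max(-30.0, min(30.0, obs.heading_to_goal_deg))
    # Something is in the way: swerve toward whichever side is open.
    if obs.clear_left:
        return -30.0
    if obs.clear_right:
        return 30.0
    return 0.0  # boxed in; a real system would brake here

# The goal is slightly to the right, but a tree blocks the path: swerve left.
print(choose_steering(Observation(10.0, True, True, False)))  # -> -30.0
```

A "plan and prepare" car would instead replace choose_steering with a lookup into a route that was computed before the race ever started.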
Sidebar: I don't mean to criticize the teams that developed their cars using the plan and prepare method. It was still excellent engineering. And after all, they weren't designing their cars to drive around the world; they were only focused on winning the Grand Challenge. Their methods worked, and almost won. They were also facing a real constraint: the state of the art in computer chips at that time. Visual processing is hard, and running a real-time computer vision system within the power budget of a standard passenger car was impossible. Stanley, the winning "see and react" car, had a server rack in its trunk and paid a heavy weight penalty to include the batteries needed to run it. It made total sense, from the perspective of winning the Challenge, to move the heavy computing to a server that wasn't on the car and keep the car as dumb (and lean!) as possible. But in the long term it should have been obvious that this power and processing constraint was a temporary one, as Moore's Law eventually makes all such constraints trivial for a fixed amount of processing work.
Now, since the Stanford team that won the Grand Challenge used the "see and react" method, and that team was headed by Sebastian Thrun, and Sebastian Thrun went to work for Google as part of their self-driving car effort, I assumed that Google would continue using this method. But that was my mistake, as it appears they have not.
As this article describes (it gets other details wrong, but is correct so far as Google's efforts are concerned), Google has built an exceptionally detailed version of Google Maps for the areas in Northern California near their headquarters, and their self-driving cars use these maps for driving. Essentially, the car uses very expensive LIDAR sensors to scan the 3D space it is driving through, compares that scan to the map in Google's massive cloud computers, and then cautiously makes its way through the world based on that. And to be clear, that map is accurate to a fraction of an inch.
What we have here is the world's biggest "plan and prepare" system. Google is sensing the roads ahead of time, using its server farms to analyze that data, and then telling the car what to do. The car itself is fairly stupid, its sensors are expensive, it reacts poorly to changes in road conditions, and it cannot drive any further than Google's Map. It cannot drive to New York, or probably even much further than Las Vegas. Certainly not to Guadalajara.
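Purely as illustration of what driving-by-map reduces to (I have no knowledge of Google's actual code; the grid, offsets, and scoring below are invented), here is a minimal sketch in Python: match the live LIDAR scan against an occupancy map that was built ahead of time, and trust the map for everything else.

```python
import numpy as np

# The "map" is surveyed in advance: a toy occupancy grid (1 = obstacle).
prior_map = np.zeros((100, 100), dtype=np.uint8)
prior_map[40:60, 70] = 1  # a wall the mapping cars recorded last month

def match_score(scan: np.ndarray, dx: int, dy: int) -> int:
    """Count how many scan hits line up with the prior map when the scan
    is shifted by (dx, dy) cells. Higher = better pose hypothesis."""
    shifted = np.roll(np.roll(scan, dy, axis=0), dx, axis=1)
    return int(np.sum(shifted & prior_map))

def localize(scan: np.ndarray) -> tuple:
    """Brute-force search over small offsets for the best-matching pose."""
    candidates = [(dx, dy) for dx in range(-3, 4) for dy in range(-3, 4)]
    return max(candidates, key=lambda c: match_score(scan, *c))

# A fresh scan sees the same wall, but offset because the car has drifted.
scan = np.zeros_like(prior_map)
scan[40:60, 68] = 1
print(localize(scan))  # -> (2, 0): the scan lines up with the map shifted 2 cells
```

Everything interesting lives in prior_map, which had to be surveyed, processed, and kept up to date before the car ever left the garage; the moment the world stops matching the map, the system has little to fall back on.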
The EyeQ chip, on the other hand, uses regular camera sensors (like you find in your cell phone) and extracts all the information it needs from that visual data, then in real time determines where it should be driving and what it should avoid. This system is working today, and Tesla will deploy it to regular customers this summer.
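For contrast, here is a toy camera-only lane finder in Python using OpenCV. It has nothing to do with Mobileye's proprietary algorithms; it only shows the shape of a "see and react" vision pipeline: pixels in, lines out, a steering decision made from the live image alone.

```python
import numpy as np
import cv2  # pip install opencv-python

# A synthetic grayscale "camera frame": two bright lane markings converging
# toward the horizon, standing in for real road footage.
frame = np.zeros((240, 320), dtype=np.uint8)
cv2.line(frame, (60, 239), (140, 120), 255, 4)   # left lane marking
cv2.line(frame, (260, 239), (180, 120), 255, 4)  # right lane marking

# 1. Edge detection picks out the high-contrast lane paint.
edges = cv2.Canny(frame, 50, 150)

# 2. A probabilistic Hough transform turns edge pixels into line segments.
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 30,
                        minLineLength=40, maxLineGap=10)

# 3. Aim for the midpoint between the detected lane lines.
xs = []
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        xs.extend([x1, x2])
if xs:
    lane_center = sum(xs) / len(xs)
    offset = lane_center - frame.shape[1] / 2
    print(f"lane center ~= {lane_center:.0f}px, {offset:+.0f}px off image center")
```

Note that nothing here depends on a pre-built map or an expensive sensor; the only input is a frame from an ordinary camera.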
So to sum up: Google's system cannot scale geographically further than its Map, it cannot adapt well to changes in road conditions (like bad lighting or snow that covers lane markers), and it uses really expensive LIDAR sensors. Meanwhile the Mobileye system (which Tesla is deploying already) can drive on any road, reacts well to changes, and relies on cheap (and getting cheaper all the time) camera sensors. And Mobileye's system will only get cheaper and better as its chips ride Moore's Law and its camera sensors are driven by the economies of scale found in the smartphone market.
Which leads me back to my original conclusion: As surprising as it seems, Google is not the leader in self-driving car technology, and it's hard to see how they become so without a massive effort starting almost entirely from scratch.