A Firsthand History of Deep Learning

Imagine you had Geoffrey Hinton, Yoshua Bengio and Yann LeCun in the same room talking about deep learning.

“In the 90’s other machine learning methods, that were easier for a novice to apply, did as well or better than neural nets on many problems. Interest in them died.

The three of us all knew they were ultimately going to be the answer. When we got better hardware and more data and a slight improvement in the techniques, they suddenly took off again.”

— Geoffrey Hinton

The interview starts 11 minutes in, but the rest of the episode (and the Talking Machines podcast in general) has great content and production value.

“We had small data sets in computer vision that only have a few thousand training samples. If you train a convolutional net of the type that we had in the late 80’s and early 90’s, the performance would be very much lower than what you would get with classical vision systems. Mostly because those networks with many parameters are very hard to train. They learn the training set perfectly but they over-fit on the test set.

We devised a bunch of architectural components like rectification, contrast normalization and unsupervised pre-training that seemed to improve the performance significantly, which allowed those very heavy learning-based systems to match the performance or at least come close to the performance of classical systems. But it turns out all of this is rendered moot if you have lots of data and you use very large networks running on very fast computers.”

— Yann LeCun

“In the late 90’s and early 2000’s it was very, very difficult to do research in neural nets. In my own lab, I had to twist my students’ arms to do work on neural nets. They were afraid of seeing their papers rejected because they were working on the subject. Actually it did happen quite a bit for all the wrong reasons like, ‘Oh. We don’t do neural nets anymore.’

… I tried to even show mathematically why [the alternatives] wouldn’t work for the kinds of ambitious problems we wanted to solve for AI. That was how I started contributing towards the new wave of deep learning that CIFAR has promoted.”

— Yoshua Bengio

Correction: The original version of this post misspelled Yoshua Bengio’s name.