Deep Dojo

The Data Show Podcast Discusses PyTorch

Ben Lorica interviews Soumith Chintala of Facebook, who provides some background on the frameworks used to train deep learning models.

“Around pre-2014, there were three main frameworks. … They all had their niche.

Theano was really good as a symbolic compiler. Torch was a framework that would try to stay out of your way if you’re a C programmer. You could write your C programs and then just interface them with Lua, an interpreted language. Caffe was very well suited to computer vision models. So if you wanted a conv net and you wanted to train it on a large vision dataset, Caffe was your framework.

All three of these frameworks had aging designs. These frameworks were about six or seven years old. It was evident that the field was moving its research in a certain direction, and these frameworks and their abstractions weren’t keeping up.

In late 2015, TensorFlow came out. TensorFlow was one of the first frameworks professionally built from the ground up to be open source. … I see TensorFlow as a much better Theano-style framework.”

“… [Before that] DeepMind was using Torch. Facebook. Twitter. Several university labs. The year of 2015 was Torch. The year of 2014 was Caffe. The year of 2016 was TensorFlow in terms of getting the large set of audiences.”

“… Keras is a fantastic front end for TensorFlow and Theano and CNTK. You can build neural networks quickly. … It’s a very powerful tool for data scientists who want to remain in Python and never want to go into C or C++.”

Soumith was a significant contributor to Torch and started working on its successor, PyTorch, in July 2016.

“PyTorch is both a front end and a back end. You can think of PyTorch as something that gives you the ease of use of Keras, or probably more in terms of debugging. And power users can go all the way down to the C level and do hand-coded optimizations.

It takes the whole stack of a front end calling a back end to create a neural network. And that back end in turn calls some underlying GPU code or CPU code. And we make that whole stack very flat without many abstractions so that you have a superior user experience.”
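One common reading of the debugging point is that PyTorch runs a model’s forward pass as ordinary Python, so intermediate values can be printed, asserted on, or inspected in a debugger as the code runs. The sketch below assumes a standard PyTorch install; `TinyNet` and its `last_hidden` attribute are hypothetical names chosen for illustration, not anything from the interview.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical two-layer network used only for illustration."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)
        self.last_hidden = None  # holds an intermediate activation for inspection

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        # This is plain Python executed on every call, so a print(),
        # an assert, or a pdb breakpoint here sees the real tensor h.
        self.last_hidden = h.detach()
        return self.fc2(h)


torch.manual_seed(0)
model = TinyNet()
x = torch.randn(3, 4)           # batch of 3 inputs with 4 features each
out = model(x)
print(out.shape)                # torch.Size([3, 2])

# The same calls are recorded for autograd, so gradients flow back through them.
out.sum().backward()
print(model.fc1.weight.grad.shape)  # torch.Size([8, 4])
```

Because nothing is compiled ahead of time in this style, the code you step through in a debugger is the same code that produces the gradients.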