Back in 2015 interest in Torch was peaking. Around that time, I took a look under the hood to try and map out how the framework was put together.
“Around pre-2014, there were three main frameworks. … They all had their niche.
Theano was really good as a symbolic compiler. Torch was a framework that would try to be out of your way if you’re a C programmer. You could write your C programs and then just interface it into Lua’s interpreter language. Caffe was very suited to computer vision models. So if you wanted a conv net and you wanted to train it on a large vision dataset, Caffe was your framework.
All three of these frameworks had aging designs. These frameworks were about six or seven years old. It was evident that the field was moving their research in a certain direction and these frameworks and their abstractions weren’t keeping up.
In late 2015, TensorFlow came out. TensorFlow was one of the first professionally built frameworks from the ground up to be open source. … I see TensorFlow as a much better Theano-style framework.”
“… [Before that] DeepMind was using Torch. Facebook. Twitter. Several university labs. The year of 2015 was Torch. The year of 2014 was Caffe. The year of 2016 was TensorFlow in terms of getting the large set of audiences.”
“… Keras is a fantastic front end for TensorFlow and Theano and CNTK. You can build neural networks quickly. … It’s a very powerful tool for data scientists who want to remain in Python and never want to go into C or C++.”
Soumith was a significant contributor to Torch and started working on its successor in July 2016.
“PyTorch is both a front end and a back end. You can think of PyTorch as something that gives you the ease of use of Keras, or probably more in terms of debugging. And power users can go all the way down to the C level and do hand coded optimizations.
It takes the whole stack of a front end calling a back end to create a neural network. And that back end in turn calls some underlying GPU code or CPU code. And we make that whole stack very flat without many abstractions so that you have a superior user experience.”
Pete Warden offers advice for turning machine learning into a career.
“I took a very random path to focusing on deep learning full time, but so did most of the people I work with.”
“In 2009, Li and her team published the ImageNet paper with the dataset — to little fanfare. Li recalls that CVPR, a leading conference in computer vision research, only allowed a poster, instead of an oral presentation, and the team handed out ImageNet-branded pens to drum up interest. People were skeptical of the basic idea that more data would help them develop better algorithms.”
— Dave Gershgorn
Within three years, everything would change.
“If the artificial intelligence boom we see today could be attributed to a single event, it would be the announcement of the 2012 ImageNet challenge results.
Geoffrey Hinton, Ilya Sutskever, and Alex Krizhevsky from the University of Toronto submitted a deep convolutional neural network architecture called AlexNet — still used in research to this day — which beat the field by a whopping 10.8 percentage point margin.”
A curated list of over 250 links across 20+ categories from Alex Sosnovshchenko.
“Some of the resources are awesome, some are great, some are fun, and some can serve as an inspiration.”
The most recent update was this month. If you have a GitHub account you can watch the list here.
Just enough syntax to get you started in Python. Many details will look familiar to Swift programmers.
“In the 90’s other machine learning methods, that were easier for a novice to apply, did as well or better than neural nets on many problems. Interest in them died.
The three of us all knew they were ultimately going to be the answer. When we got better hardware and more data and a slight improvement in the techniques, they suddenly took off again.”
— Geoffrey Hinton
The interview starts 11 minutes in but the rest of the episode (and the Talking Machines podcast in general) has great content and production value.
“We had small data sets in computer vision that only have a few thousand training samples. If you train a convolutional net of the type that we had in the late 80’s and early 90’s, the performance would be very much lower than what you would get with classical vision systems. Mostly because those networks with many parameters are very hard to train. They learn the training set perfectly but they over-fit on the test set.
We devised a bunch of architectural components like rectification, contrast normalization and unsupervised pre-training that seemed to improve the performance significantly, which allowed those very heavy learning-based systems to match the performance or at least come close to the performance of classical systems. But it turns out all of this is rendered moot if you have lots of data and you use very large networks running on very fast computers.”
— Yann LeCun
“In the late 90’s and early 2000’s it was very, very difficult to do research in neural nets. In my own lab, I had to twist my students’ arms to do work on neural nets. They were afraid of seeing their papers rejected because they were working on the subject. Actually it did happen quite a bit for all the wrong reasons like, ‘Oh. We don’t do neural nets anymore.’
… I tried to even show mathematically why [the alternatives] wouldn’t work for the kinds of ambitious problems we wanted to solve for AI. That was how I started contributing towards the new wave of deep learning that CIFAR has promoted.”
— Yoshua Bengio
Correction: The original version of this post misspelled Yoshua Bengio’s name.
The post starts with the basics. Building a linear model in Python with scikit-learn. Converting it to an .mlmodel. Hooking it up to UIKit controls.
“Model building is difficult, and this isn’t the right post for a deep dive into model selection and performance.”
The model uses a couple of features from a Boston data set to predict house prices. A simple problem space to wrap your head around.
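For a rough feel of the Python half of that workflow, here is a minimal sketch. It is not the code from the post: the data is synthetic, the feature names rooms and crime, the coefficients, and the file name HousePrices.mlmodel are all assumptions chosen for illustration. The export helper relies on the scikit-learn converter in the coremltools package and is only defined here, not called, since it needs coremltools installed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for two housing features (hypothetical names and values,
# not the actual Boston data set used in the post).
rng = np.random.default_rng(0)
rooms = rng.uniform(3, 9, size=200)
crime = rng.uniform(0, 10, size=200)
X = np.column_stack([rooms, crime])

# Assumed relationship: price rises with rooms, falls with crime, plus noise.
price = 5.0 * rooms - 1.5 * crime + 10.0 + rng.normal(0, 0.5, size=200)

# Fit an ordinary least squares linear model.
model = LinearRegression().fit(X, price)


def export_to_coreml(sk_model, path="HousePrices.mlmodel"):
    """Convert a fitted scikit-learn model to Core ML and save it.

    Requires the coremltools package; imported lazily so the rest of the
    sketch runs without it.
    """
    import coremltools

    coreml_model = coremltools.converters.sklearn.convert(
        sk_model, ["rooms", "crime"], "price"
    )
    coreml_model.save(path)
    return path
```

Calling model.predict([[6.0, 2.0]]) returns an estimated price for a six-room house in a low-crime area. The saved .mlmodel file is what gets dragged into Xcode and wired up to the UIKit controls.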
“Core ML makes working with machine learning and statistical models more convenient and idiomatic.
Nonetheless, it is worth ending with a caveat. Statistics and machine learning are not simply APIs. They are entire fields of study that are equal parts art and science. Developing and selecting an informative model requires practice and study.”
I completely agree. The same can be said about good graphic design. Having an expert create it and then going through the process of integrating the design into your app are two different things.
What makes Core ML interesting is how little it asks of the developer who already has their .mlmodel in hand. It’s an approach to machine learning that says, “We’ll bring this technology to you instead of making you come to us.”
Apple’s machine learning APIs aren’t just for third-party developers. Steven Levy wrote this article almost ten months before Apple announced Core ML. It offers a detailed look into how the company uses machine learning across its own products.
“As the briefing unfolds, it becomes clear how much AI has already shaped the overall experience of using the Apple ecosystem. The view from the AI establishment is that Apple is constrained by its lack of a search engine (which can deliver the data that helps to train neural networks) and its inflexible insistence on protecting user information (which potentially denies Apple data it otherwise might use). But it turns out that Apple has figured out how to jump both those hurdles.”
— Steven Levy
The integration of deep learning into Siri dates back to 2014.
“This was one of those things where the jump was so significant that you do the test again to make sure that somebody didn’t drop a decimal place.”
— Eddy Cue
One detail that keeps getting hinted at, both in the article and elsewhere, is online learning.
“We keep some of the most sensitive things where the ML is occurring entirely local to the device.”
— Craig Federighi
A statement like this is too vague to know for sure. Online learning implies the ability to train a machine learning model on the device, something conspicuously absent from the first public version of Core ML.
This kind of limitation is understandable. As Gaurav Kapoor said in his introduction to the framework, “Training is a complex subject.”
With deep learning, jumping from inference to training is like jumping from the Preview app into Photoshop. The change in complexity and required expertise is significant. It wouldn’t be surprising if Apple waits and doesn’t add training to Core ML until they are able to flatten that learning curve. Making it more like a jump from Preview to Pages.
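To make the gap between inference and online learning concrete, here is a toy sketch in plain Python. It has nothing to do with Core ML's actual API: the class name OnlineLinearModel, the learning rate, and the target function y = 2x + 1 are all assumptions made up for illustration. The predict method is inference only. The update method is what online learning adds: the model adjusts its own weights each time a new example arrives.

```python
class OnlineLinearModel:
    """A tiny linear model that can both predict and learn incrementally."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        # Inference: a weighted sum of the inputs. No weights change here.
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias

    def update(self, features, target):
        # Online learning: one stochastic gradient descent step on the
        # squared error for a single freshly observed example.
        error = self.predict(features) - target
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error
        return error


# Stream examples of the assumed target y = 2x + 1 one at a time.
model = OnlineLinearModel(n_features=1)
for _ in range(500):
    for i in range(11):
        x = i / 10
        model.update([x], 2 * x + 1)
```

Even in this toy version, the training side brings choices that inference never has to face: a learning rate, how many passes to make, and when to stop. With a deep network those choices multiply.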
The best post on Core ML I’ve seen so far.
“[Metal Performance Shader] Graph API. This is the big news as far as I’m concerned. Creating all the layers and (temporary) images by hand was always a nuisance. Now you can describe a graph, just like you would in Keras. MPS will automatically figure out how large the images need to be, how to deal with padding, how to set the offset of your MPS kernels, and so on. It can even optimize the graph behind the scenes by fusing layers.”
— Matthijs Hollemans
“The new graph API makes my Forge library pretty much obsolete, unless you want to keep supporting iOS 10 in your apps.”
If your deployment target is staying on iOS 10 for a while, the Forge library may be your best bet until you’re able to migrate to the machine learning features in iOS 11.