Generated Code
When you take a small Core ML model like SqueezeNet and drop it into Xcode, you get a SqueezeNet implementation. Without comments, the class itself is only a few dozen lines long.
class SqueezeNet {
    var model: MLModel

    init(contentsOf url: URL) throws {
        self.model = try MLModel(contentsOf: url)
    }

    convenience init() {
        let bundle = Bundle(for: SqueezeNet.self)
        let assetPath = bundle.url(forResource: "SqueezeNet", withExtension: "mlmodelc")
        try! self.init(contentsOf: assetPath!)
    }

    func prediction(input: SqueezeNetInput) throws -> SqueezeNetOutput {
        let outFeatures = try model.prediction(from: input)
        let result = SqueezeNetOutput(
            classLabelProbs: outFeatures.featureValue(for: "classLabelProbs")!.dictionaryValue as! [String : Double],
            classLabel: outFeatures.featureValue(for: "classLabel")!.stringValue)
        return result
    }

    func prediction(image: CVPixelBuffer) throws -> SqueezeNetOutput {
        let input_ = SqueezeNetInput(image: image)
        return try self.prediction(input: input_)
    }
}
SqueezeNet is not a subclass of MLModel. It’s a wrapper with methods for loading and driving a model property. Xcode generates input and output classes for the model as well.
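Using the generated class comes down to a call or two. As a rough sketch, assuming a CVPixelBuffer already sized and formatted for the model:

import CoreML
import CoreVideo

// A minimal sketch of driving the generated wrapper. The pixel buffer
// is assumed to already match the model's expected image input.
func classify(_ pixelBuffer: CVPixelBuffer) {
    do {
        let model = SqueezeNet()
        let output = try model.prediction(image: pixelBuffer)
        print("Top label: \(output.classLabel)")
        print("Confidence: \(output.classLabelProbs[output.classLabel] ?? 0)")
    } catch {
        print("Prediction failed: \(error)")
    }
}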
Feature Introspection
In the end, a class like SqueezeNet is a high-level, type-safe convenience. At a lower level, it’s possible to drive the model directly and generically. This is how a framework like Vision is able to inspect and use an arbitrary model at runtime.
MLModel sits at the heart of Core ML. It’s an abstraction that’s focused on input and output features. MLModelDescription indicates how these features are structured. An MLFeatureProvider delivers input into the model. A corresponding MLFeatureProvider contains output.
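A sketch of what that generic path might look like, assuming the inputs can be expressed as a dictionary of MLFeatureValue instances:

import CoreML

// A minimal sketch of driving an arbitrary model through its description.
func describeAndPredict(model: MLModel, inputs: [String: MLFeatureValue]) throws -> MLFeatureProvider {
    let description = model.modelDescription

    // Each MLFeatureDescription reports a name and a feature type.
    for (name, feature) in description.inputDescriptionsByName {
        print("input \(name): \(feature.type)")
    }
    for (name, feature) in description.outputDescriptionsByName {
        print("output \(name): \(feature.type)")
    }

    // MLDictionaryFeatureProvider wraps the inputs as an MLFeatureProvider.
    let provider = try MLDictionaryFeatureProvider(dictionary: inputs)
    return try model.prediction(from: provider)
}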
Image Processing and Classification
Using the Vision framework, a VNImageBasedRequest turns an image into observations. VNCoreMLRequest is a type of image-based request that uses a Core ML model. Depending on the output description of the model, the request will automatically produce classifications, pixel buffers, or feature values.
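As a rough sketch of the classification case, with the model and pixel buffer assumed to come from elsewhere:

import CoreML
import Vision

// A minimal sketch of running a Core ML classifier through Vision.
func classify(pixelBuffer: CVPixelBuffer, mlModel: MLModel) throws {
    let visionModel = try VNCoreMLModel(for: mlModel)

    let request = VNCoreMLRequest(model: visionModel) { request, error in
        // For a classifier, the results arrive as VNClassificationObservation values.
        guard let observations = request.results as? [VNClassificationObservation] else { return }
        if let best = observations.first {
            print("\(best.identifier): \(best.confidence)")
        }
    }

    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    try handler.perform([request])
}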
As Matthijs Hollemans mentions in his post, it’s important to match image input to the preprocessing that a model expects. Much of this can be addressed when the model is converted to Core ML. From the modelDescription, the Vision framework is able to convert most images into the pixel format that a model wants.
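A sketch of reading those expectations out of the description; the "image" input name follows the generated SqueezeNet code above and will differ for other models:

import CoreML

// A sketch of reading the expected image size and pixel format from the
// model description. The "image" input name matches the generated
// SqueezeNet code; other models may use a different name.
func printImageConstraint(for model: MLModel) {
    guard let imageInput = model.modelDescription.inputDescriptionsByName["image"],
          let constraint = imageInput.imageConstraint else { return }
    print("Expected size: \(constraint.pixelsWide) x \(constraint.pixelsHigh)")
    print("Pixel format: \(constraint.pixelFormatType)")
}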
Another kind of compatibility to be aware of is image orientation. This is particularly true on iOS, where captured images are typically aligned with the camera sensor rather than with the orientation of the device. Request handlers in the Vision framework offer a way to specify orientation when applying Core ML requests to camera images.
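As a sketch, an orientation can be passed to the handler alongside the camera buffer. The .right value below is a common choice for portrait captures from the back camera, not a universal answer; the correct value depends on how the frame was captured:

import Vision

// A sketch of supplying image orientation to a Vision request handler.
func perform(_ request: VNCoreMLRequest, on pixelBuffer: CVPixelBuffer) throws {
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                        orientation: .right,
                                        options: [:])
    try handler.perform([request])
}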
Conclusion
“A successful book is not made of what is in it, but of what is left out of it.”
— Mark Twain
When using an MLModel, an app is free to focus on input and output. Model implementation is set aside as an optimization detail to be navigated by the .mlmodel creator who knows the model and the Apple engineers who know their lineup of hardware.
Core ML can generate code for a model that is convenient and easy to use. However, if your solution involves plugging in a range of potential models, any MLModel can be inspected at runtime to query how its input and output features are structured.