Developer Blog: Using Windows ML, ONNX, and NVIDIA Tensor Cores

As more deep learning models are deployed into production environments, there is a growing need to separate work on the model itself from the work of integrating it into a production pipeline. Windows ML addresses this need by enabling efficient deployment of pretrained deep learning models into Windows applications.

Developing and training a model requires familiarity with the science and know-how behind it. When a pretrained model is used in a pipeline for inference, however, it can be treated simply as an opaque series of computations on incoming data. These computations are fully described by the ONNX file representing the deep learning model, and that file can be edited and processed to make simple but often-needed tweaks and optimizations at the deployment stage, as sketched below.
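As a minimal sketch of this kind of deployment-stage editing, the snippet below uses the Python onnx package to load a model, inspect its graph, rename its input, and save a validated copy. The file names, and the assumption of a single graph input, are hypothetical.

```python
import onnx

# Load a pretrained model (file name is hypothetical).
model = onnx.load("model.onnx")

# Inspect the graph: opsets, then the first few nodes.
print(model.opset_import)
for node in model.graph.node[:5]:
    print(node.op_type, list(node.input), list(node.output))

# Example tweak: rename the graph input so it matches the name
# the application pipeline binds at inference time.
old_name = model.graph.input[0].name
model.graph.input[0].name = "input"
for node in model.graph.node:
    node.input[:] = ["input" if i == old_name else i for i in node.input]

# Validate the edited graph and save it for deployment.
onnx.checker.check_model(model)
onnx.save(model, "model_edited.onnx")
```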

Windows ML Overview

Windows Machine Learning (Windows ML) allows you to write applications in C#, C++, JavaScript, or Python that operate on trained ONNX neural nets. It is an ideal framework if you want to run inference with previously trained neural nets in your application pipeline without worrying about the internals and complexities of the neural nets themselves.
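Windows ML itself is consumed through its WinRT API from the languages above. As a compact sketch of the same load, bind, and evaluate flow, the snippet below uses the ONNX Runtime Python API instead, with the DirectML execution provider (available via the onnxruntime-directml package) standing in for GPU acceleration; the model file name and input shape are assumptions.

```python
import numpy as np
import onnxruntime as ort

# Create an inference session; DmlExecutionProvider assumes the
# onnxruntime-directml package is installed, with CPU as a fallback.
session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

# Bind input data by name and evaluate. The shape is hypothetical.
input_name = session.get_inputs()[0].name
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: batch})
print(outputs[0].shape)
```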

ONNX Overview

Introduced by Facebook and Microsoft, ONNX is an open interchange format for machine learning models that makes it easier to move models between frameworks such as PyTorch, TensorFlow, and Caffe2. An actively evolving ecosystem has been built around ONNX.
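For example, a trained PyTorch model can be serialized to ONNX with torch.onnx.export. In the sketch below, a torchvision ResNet-18 stands in for your own trained model; the output file name and opset version are assumptions.

```python
import torch
import torchvision

# Any trained torch.nn.Module works here; a torchvision ResNet-18
# serves as an example model.
model = torchvision.models.resnet18(pretrained=True).eval()

# Trace the model with a dummy input and write an ONNX file.
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model,
    dummy,
    "resnet18.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)
```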

By combining a straightforward, robust, and efficient inferencing framework with a comprehensive and richly supported model format like ONNX, Windows ML allows you to integrate state-of-the-art AI models developed by research scientists directly into real-world applications.

Read the full blog, Using Windows ML, ONNX, and NVIDIA Tensor Cores, on the NVIDIA Developer Blog.