MXNet Extensions

Since I joined Amazon and started working on the Apache (incubating) MXNet open source deep learning framework, I have been looking for ways to integrate hardware acceleration. The main problem is that MXNet was never designed to support accelerators. It didn't even have a C++ custom operator interface. If you wanted to add a new high-performance operator, or leverage a new accelerator or library, you had to modify the framework, recompile from source, and distribute a new build.

So I set out to make MXNet more extensible, starting with a custom C++ operator interface that lets libraries of operators be loaded dynamically at runtime. The first pull request added the main components of a C ABI between MXNet and the external library. This lets users compile new operators with any compiler on the same platform (i.e. Windows, Linux, etc.), further reducing the complexity of adding custom operators. It also included support for DLPack, the common in-memory tensor structure used by all the major deep learning frameworks (TF, PT, and MX), which lets operator writers target all of these frameworks with a similar tensor data structure.
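
To give a feel for what "dynamically loaded at runtime" means in practice, here is a minimal sketch from the Python side. The library path and the operator name (my_relu) are hypothetical placeholders for whatever your compiled extension library registers:

```python
import mxnet as mx

# Load a compiled extension library at runtime; no rebuild of MXNet required.
# The path and the operator name "my_relu" are hypothetical examples.
mx.library.load('./libcustomop.so')

# Operators registered by the library show up like any built-in operator.
x = mx.nd.ones((2, 3))
y = mx.nd.my_relu(x)
print(y)
```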

Then I moved on to enabling whole groups of operators to execute outside of the framework by creating a dynamically loaded model partitioning interface. That pull request added support for partitioning a model's operators into subgraphs. In MXNet, a subgraph is just another operator (one that happens to execute the workload of more than one operator). The custom operator work was leveraged so that both the partitioning logic and the execution of the resulting subgraphs can be implemented in the same external library.
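
Here is a rough sketch of what using such a partitioning library looks like from Python. The library path and the partitioner name ("myPart") are hypothetical, and the exact optimize_for signature may differ between MXNet versions:

```python
import mxnet as mx

# Hypothetical extension library that registers a partitioner called "myPart".
mx.library.load('./libpart.so')

# Build a small symbolic model.
a = mx.sym.var('a')
b = mx.sym.var('b')
sym = mx.sym.exp(a + b) * 2

# Ask MXNet to partition the graph with the external backend; operators the
# library claims are grouped into subgraph operators it then executes.
args = {'a': mx.nd.ones((2, 2)), 'b': mx.nd.ones((2, 2))}
part_sym = sym.optimize_for('myPart', args)
```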

So what's the takeaway? MXNet is now the first framework to seamlessly enable model partitioning from an external library without recompiling the framework (or including all of the framework's headers in the external library). It has a C ABI compatibility layer, removing the need to build with specific compiler versions. Nightly builds since January 9th include these features, and the next MXNet release (1.7 or 2.0) will incorporate them into a formal release.

But we’re not stopping there. In collaboration with my colleagues, we’ve already added more features: GPU custom operators and partitioning through the Gluon APIs (sketched below). A proof-of-concept example accelerator library is available, and a real, usable library is coming very soon… There’s still more on the way in the form of partitioning enhancements and custom operator enhancements, including sparse tensors. Stay tuned!
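
For the Gluon side, the idea is roughly the sketch below, again assuming the hypothetical "myPart" backend from an external library; the exact hybridize arguments may vary between MXNet versions:

```python
import mxnet as mx
from mxnet.gluon import nn

# Hypothetical extension library registering the "myPart" partitioner.
mx.library.load('./libpart.so')

net = nn.HybridSequential()
net.add(nn.Dense(16, activation='relu'), nn.Dense(4))
net.initialize()

# Hybridize with the external backend so supported operators are grouped
# into subgraphs executed by the library (argument names may differ by version).
net.hybridize(backend='myPart')
out = net(mx.nd.ones((1, 8)))
```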
