A Crash Course in MXNet Tensor Basics & Simple Automatic Differentiation

This is an overview of some basic functionality of the MXNet ndarray package for creating tensor-like objects, and of the autograd package for performing automatic differentiation.
By Matthew Mayo, KDnuggets.

I originally intended to play around with MXNet long ago, around the time that Gluon was released publicly. Things got busy. I got sidetracked.

I finally started using MXNet recently. To get to know my way around, I thought covering some basics, such as how tensors and derivatives are handled, would be a good place to start (as I did here and here with PyTorch).

This won't repeat what is in those previous PyTorch articles step by step, so look at those if you want any further context. What's below should be relatively straightforward, however.


MXNet is an open source neural network framework, a "flexible and efficient library for deep learning." Gluon is the imperative high-level API for MXNet, which provides additional flexibility and ease of use. You can think of the relationship between MXNet and Gluon as being similar to that between TensorFlow and Keras. We won't cover Gluon any further here, but will explore it in future posts.

MXNet's tensor implementation comes in the form of the ndarray package. Here you will find what's needed to build multidimensional (n-dimensional) arrays and perform some of the operations on them required for implementing neural networks. We will make use of it below, along with the autograd package for automatic differentiation.

 
ndarray (Very) Basics

First, let's import what we need from the library, in such a way as to simplify making our API calls:

import mxnet as mx
from mxnet import autograd as ag
from mxnet import nd

Now, let's create a basic ndarray (on the CPU):

# Create CPU array
a = nd.ones((3, 2))
print(a)
[[1. 1.]
 [1. 1.]
 [1. 1.]]
<NDArray 3x2 @cpu(0)>


Note that printing an ndarray also prints out the type of the object (again, NDArray), as well as its size and the device to which it is attached (in this case, CPU).

What if we wanted to create an ndarray object with a GPU context (note that a context is the device type and ID which should be used to perform operations on the object)? First, let's determine whether or not there is a GPU available to MXNet:

# Test if GPU is recognized
def gpu_device(gpu_number=0):
    try:
        _ = mx.nd.array([1, 2, 3], ctx=mx.gpu(gpu_number))
    except mx.MXNetError:
        return None
    return mx.gpu(gpu_number)

gpu_device()
gpu(0)


This response denotes that there is a GPU device, and its ID is 0.

Let's create an ndarray on this device:

# Create GPU array
b = nd.zeros((2, 2), ctx=mx.gpu())
print(b)
[[0. 0.]
 [0. 0.]]
<NDArray 2x2 @gpu(0)>


The output here confirms that an ndarray of zeros of size 2 x 2 was created with a context of GPU.

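To get a transposed copy of an ndarray (as opposed to simply a transposed view of the original), use its T attribute. Here c is a 3 x 2 ndarray on the CPU:

# Define a 3x2 array, then transpose it
c = nd.array([[1, 4], [2, 5], [3, 6]])
T = c.T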
print(T)
[[1. 2. 3.]
 [4. 5. 6.]]
<NDArray 2x3 @cpu(0)>


Reshape an ndarray as a view, without alteration of the original data:

# Reshape
r = T.reshape(3,2)
print(r)
[[1. 2.]
 [3. 4.]
 [5. 6.]]
<NDArray 3x2 @cpu(0)>


Some ndarray info:

# ndarray info
print('ndarray shape:', r.shape)
print('Number of dimensions:', r.ndim)
print('ndarray type:', r.dtype)
ndarray shape: (3, 2)
Number of dimensions: 2
ndarray type: <class 'numpy.float32'>


See here for more on ndarray basics.

 
MXNet ndarray To and From Numpy ndarray

It's easy to go from Numpy ndarrays to MXNet ndarrays and vice versa.

import numpy as np

# To numpy ndarray
n = c.asnumpy()
print(n)
print(type(n))
[[1. 4.]
 [2. 5.]
 [3. 6.]]
<class 'numpy.ndarray'>


# From numpy ndarray
a = np.array([[1, 10], [2, 20], [3, 30]])
b = nd.array(a)
print(b)
print(type(b))
[[ 1. 10.]
 [ 2. 20.]
 [ 3. 30.]]
<NDArray 3x2 @cpu(0)>
<class 'mxnet.ndarray.ndarray.NDArray'>
  
  
  


 
Matrix-matrix multiplication

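Here's how to compute a matrix-matrix dot product. The inputs are random, so your values will differ from those shown:

# Compute dot product of two random matrices
# (drawn from a normal distribution with mean -1 and standard deviation 1)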
t1 = nd.random.normal(-1, 1, shape=(3, 2))
t2 = nd.random.normal(-1, 1, shape=(2, 3))
t3 = nd.dot(t1, t2)
print(t3)
[[1.8671514 2.0258508 1.1915313]
 [9.009048  8.481084  6.7323728]
 [5.0241795 4.346245  4.0459785]]
<NDArray 3x3 @cpu(0)>


See here for more on linear algebra operations with ndarray.

 
Using autograd to Find and Solve a Derivative

On to solving a derivative with the MXNet autograd package for automatic differentiation.

First we will need a function for which to find the derivative. Arbitrarily, let's use this:

 

$$y = 5x^4 + 3x^3 + 7x^2 + 9x - 5$$

 

To see us work out the first order derivative of this function by hand, as well as find the value of our derivative function for a given value of x, see this post.

Since Python has no implicit multiplication or superscripts, we have to represent our function in code as such:

y = 5*x**4 + 3*x**3 + 7*x**2 + 9*x - 5


Now let's find the value of our derivative function for a given value of x. Let's arbitrarily use 2:

x = nd.array([2])
x.attach_grad()

with ag.record():
    y = 5*x**4 + 3*x**3 + 7*x**2 + 9*x - 5

y.backward()
x.grad


Line by line, the above code:

  • defines the value (2) at which we want to evaluate the derivative, as an MXNet ndarray object
  • uses attach_grad() to allocate space for the gradient to be computed
  • opens an ag.record() block, which tracks the computation performed inside it so that the gradient can be computed afterward
  • defines the function we want to compute the derivative of
  • uses autograd's backward() to compute the gradient of y with respect to x, using the chain rule
  • outputs the value stored in the x ndarray's grad attribute, which is shown below

[233.]
<NDArray 1 @cpu(0)>


This value, 233, matches what we calculated by hand in this post.
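
If you don't have the by-hand working in front of you, the result is easy to sanity check without MXNet at all. The sketch below uses plain Python: f_prime is the derivative obtained from the power rule, and central_difference is a simple numerical estimate. Both helper names are our own, and the step size h is an arbitrary choice:

```python
def f(x):
    """The polynomial used in this post."""
    return 5*x**4 + 3*x**3 + 7*x**2 + 9*x - 5

def f_prime(x):
    """Derivative by the power rule: 20x^3 + 9x^2 + 14x + 9."""
    return 20*x**3 + 9*x**2 + 14*x + 9

def central_difference(func, x, h=1e-5):
    """Numerical estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

print(f_prime(2))                 # 233
print(central_difference(f, 2))   # numerical estimate, should agree closely
```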

See here for more on automatic differentiation with autograd.

 
This has been a very basic overview of simple ndarray operations and derivatives in MXNet. As these are two of the staples of building neural networks, this should provide some familiarity with the library's approaches to these basic building blocks, and allow for diving into some more complex code. Next time we will create some simple neural networks with MXNet and Gluon, exploring the libraries in more depth.

For more (right now!) on MXNet, Gluon, and deep learning in general, the freely-available book Deep Learning - The Straight Dope, written by those intimately involved in the development and evangelizing of these libraries, is definitely worth looking at.

 
Related:

  • PyTorch Tensor Basics
  • Simple Derivatives with PyTorch
  • WTF is a Tensor?!?