Back to basics: Linear Algebra – Calculating the diagonal of a multi-dimensional matrix

Recently I had the need to calculate the diagonal of a matrix. Initially I thought, wow, here we go again:

 
import numpy as np

def diag(a):
   ret = np.zeros(len(a))
   for i in range(len(a)):
      ret[i] = a[i][i]
   return ret

And initially this worked. But then I realized that it only worked for square 2D matrices. Not only do I need it to work for arbitrary N-dimensional matrices (commonly called “tensors” nowadays), I need it to work with non-square tensors too.

So I did some searching and couldnt find anything online about how to calculate the diagonal for even a 3D tensor. So I went to where I knew there was an implementation and looked at the Numpy source code. This wasnt exactly straightforward since it was designed for Numpy ndarrays, already parallelized and optimized for Numpy, and didn’t actually return the diagonal in a separate array. Rather it returned a “view” of the original array without modification. This just means that the raw data is shared. So now I have to do two things to compute the diagonal: (1) determine the “diagonal view” of the original data, and (2) copy the data into a new array.

So lets say we have this 3D array:

import numpy as np

a = np.linspace(1,27,27).reshape(3,3,3)

[[[ 1,  2,  3],
  [ 4,  5,  6],
  [ 7,  8,  9]],

 [[10, 11, 12],
  [13, 14, 15],
  [16, 17, 18]],

 [[19, 20, 21],
  [22, 23, 24],
  [25, 26, 27]]]

The diagonal of this 3D array will be:

a.diagonal()

[[ 1, 13, 25],
 [ 2, 14, 26],
 [ 3, 15, 27]]

Notice that the diagonal result is also a 3×3 matrix. You can think of the diagonal of an N-dimension matrix as a slice through the matrix. The figure below shows a visual view of the original 3D matrix and the resulting “slice” through the matrix for the diagonal. Notice that the result is still a “diagonal” along one dimension, but its a “plane” rather than a 1-D array.

First lets start with computing the resulting diagonal’s shape along the axes: axis0 and axis1 (and default them to 0 and 1)

axis0 = 0
axis1 = 1
ndim = len(a.shape)
diag_size = min(a.shape[axis0],a.shape[axis1])
diag_shape = [0] * (ndim-1)
idx = 0
for i in range(ndim):
   if i != axis0 and i != axis1:
      diag_shape[idx] = a.shape[i]
      idx += 1
diag_shape[ndim-2] = diag_size

[3, 3]

In the implementation above, we calculate the size of the diagonal as the minimum of the length of the two axes that were calculating the diagonal over. In a traditional 2D matrix, the diagonal will be whichever of the sides is shorter. If you have a (2,3) matrix then the diagonal will be 2 elements. But if you have a (4,3) matrix the diagonal will have 3 elements. The same applies in N-dimensional space. That for loop above is basically creating a list of axis sizes for all axes that are not on the diagonal. For example a (1,2,3,4) 4D matrix, will have (3,4) in the list at the end of the for loop since we are calculating the diagonal along axes 0 and 1 (default values). Then at the end we add the last dimension which is the length of the diagonal. In our 4D example, this would be min(1,2) so our final shape is (3,4,1).

So now that we know our original matrix shape and the shape that the diagonal will be, we need to find a mapping from the original data layout to the diagonal layout. Calculating the diagonal is not computationally intensive, its really just a complex indexing calculation. If we look at the resulting diagonal values: 1, 13, 25, 2, 14, … you might be able to notice a pattern. 1 + 12 = 13, 13 + 12 = 25, 1 + 1 = 2, 2 + 12 = 14, and so on. So as we go across the columns of the diagonal matrix we increment by 12. And as we go down the rows we increment by 1. These step sizes are called the “stride“. Typically, the stride is valued in number of bytes. But here we’re using it to represent whole words (or elements). We can calculate the diagonal stride as:

diag_stride = [0] * (ndim-1)
idx = 0
for i in range(ndim):
   if i != axis0 and i != axis1:
      diag_stride[idx] = a.strides[i] / a.dtype.itemsize
      idx += 1
diag_stride[ndim-2] = (a.strides[axis0] + a.strides[axis1]) / a.dtype.itemsize

[1, 12]

The code above indeed calculates the 1 and 12 increments that we observed earlier. If we look at the stride of our input matrix we’ll find:

np.divide(a.strides,a.dtype.itemsize)

[9, 3, 1]

This makes sense, since to go from one element in a column to the next one in the same row, we increment the index by 1. And if we go between rows, then we add 3 (since there are 3 elements in a row). And if we go between each of the 3 slices of 9 elements each we have to increment by 9.

Now that we have the shape of the output diagonal we can construct the matrix to hold the resulting values and call a function to copy the values from the input matrix to the output matrix along the strides we calculated earlier.

ret = np.zeros(ret_shape,dtype=arr.dtype)

copy(a,ret,diag_strides)

So now the elephant in the room……………the copy ( ) function

To copy the values from the input matrix to the output, we need to walk along the diagonal and copy each value. The easiest way to do this is to loop over all the indices in the output and create a mapping from that index to the index in the input matrix. To do this in a general way we can only loop over the 1D indices in the output matrix.

Then for each 1D index, we need to convert that to the N-dimensional coordinates (ie. [0,0]) and apply the diagonal stride to compute the index in the input matrix. And lastly we convert the 1D index in the input matrix to its corresponding N-dimensional coordinate to index in the input matrix (in python at least, if we do this in C/C++ we can just use the 1D index into the contiguous memory array storing the data).

Heres some examples going from a 1D index to N-dimensional coordinate for a 3×3 matrix:

[[ 1, 13, 25],
 [ 2, 14, 26],
 [ 3, 15, 27]]

0: (0,0) ==> 1
1: (0,1) ==> 13
2: (0,2) ==> 25
3: (1,0) ==> 2
4: (1,1) ==> 14
...
8: (2,2) ==> 27

This is very similar to converting between decimal and binary. For index 4 we have to compute: 4/3 = 1, 4%3 = 1 so our coordinates are (1,1). Heres a generic way to compute this:

def unravel(idx,stride):
   ret = [0] * len(stride)
   for i in range(len(stride)):
      ret[i] = idx / stride[i]
      idx -= ret[i] * stride[i]
   return tuple(ret)

In this code above, we loop over each dimension and divide the index by the stride for that dimension and that is the index in that dimension. Then we subtract this quantity from the index to get the remainder and do this again for the next dimension with the remainder. And so on until we finish handling all the dimensions. Finally we return the indices as a tuple (python syntax).

Then if we want to convert a set of N-dimensional coordinates back to a 1D index we can do this by:

def ravel(coord,stride):
   idx = 0
   for i in range(len(coord)):
      idx += coord[i] * stride[i]
   return idx

Again this is similar to converting from binary back to decimal were we loop over each coordinate/dimension and multiply by the stride and sum them all.

Finally using these two functions we can convert from output index (b_idx) to output coordinates (b_coord), then to input index (a_idx) and finally input coordinates (a_coord). The function below does just that:

                                                                  
def copy(a,b,diag_stride):                                                                   
    for b_idx in range(b.size):                                                                   
        b_coord = unravel(b_idx,b.stride)
        a_idx = ravel(b_coord,diag_stride)
        a_coord = unravel(a_idx,a.stride)                                                                                           
        b[b_coord] = a[a_coord]

Basically, this function loops over the indices of the diagonal and converts the indexing to the input matrix and then copies each value to the output matrix.

The last feature needed in computing the diagonal is allowing the user to choose exactly which diagonal they want. Wait! Theres more than one diagonal in a matrix!? Yes! There are, here are all of the diagonals identified in a (3,3) matrix:

Each diagonal is identified by the “k” value, with the default diagonal being “k = 0”. As you increase/decrease the k value you slide the diagonal along the axes you’re calculating the diagonal over. To add additional support for this we need to incorporate the k value and offset computation into our implementation. First, the offset is calculated based upon whether the k value is positive or negative:

  
if k > 0:
   offset = a.strides[axis1] * k
   height = arr.shape[a0]
   width = arr.shape[a1] - k
elif k < 0:
   offset = a.strides[axis0] * -k
   height = arr.shape[a0] + k
   width = arr.shape[a1]
else:
   offset = 0
   height = arr.shape[a0]
   width = arr.shape[a1]

Notice that if k > 0 then we are modifying the width of the diagonal (axis1) and if k < 0 then we are modifying the height (axis0). Now we just need to pass the offset we calculated into the copy function.

 
copy(arr,ret,ret_strides,offset)

Remember that this offset is just for indexing into the original input matrix. So inside the copy function the only modification is to offset the index for the input matrix (a_idx):

 
def copy(a,b,diag_stride,offset):
   for b_idx in range(b.size):
      b_coord = unravel(b_idx,b.stride) 
      a_idx = offset + ravel(b_coord,diag_stride)
      a_coord = unravel(a_idx,a.stride)
      b[b_coord] = a[a_coord]

All of the code snippets can be found here in a single file. Good luck!

Leave a comment