IPython inline images and videos


When working with large arrays of data, especially spatiotemporal data, good visualization methods are essential. matplotlib’s imshow function provides a backend-agnostic way of visualizing array data and supports all sorts of annotations, such as axis labels. Even so, more often than not I found myself unhappy with the result.

Two things are difficult to get right when displaying image arrays in an IPython notebook. First, the figure size sets the maximum size of the image, and the image cannot be resized dynamically by dragging its edges in the browser. Second, matplotlib’s architecture passes the image through a chain of transformations. Unless you obsess over details like figure resolution and interpolation method, it is very difficult to display an image plainly in the browser without some form of resampling.
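To make the second point concrete: with matplotlib, a pixel-exact display requires sizing the figure so that inches times dpi equals the array’s pixel count, letting the axes fill the whole figure, and turning interpolation off. The helper below is a hypothetical illustration, not part of this post’s code, and only does the arithmetic. The matplotlib calls in the comments are standard API, but the notebook’s inline backend may still crop or rescale the saved figure.

```python
def pixel_exact_figsize(shape, dpi=100):
    """Hypothetical helper: return a matplotlib figsize (width, height) in
    inches so that an array of the given shape is drawn at one screen pixel
    per array element, assuming the axes cover the entire figure."""
    rows, cols = shape[0], shape[1]
    # figure size is given in inches; pixels = inches * dpi
    return (cols / float(dpi), rows / float(dpi))

# Sketch of the matplotlib side (not executed here):
#   fig = plt.figure(figsize=pixel_exact_figsize(data.shape, 100), dpi=100)
#   ax = fig.add_axes([0, 0, 1, 1])   # axes fill the whole figure
#   ax.imshow(data, cmap='gray', interpolation='nearest')
#   ax.set_axis_off()
```

Even with all of this in place, the result depends on the backend’s save settings, which is exactly the fragility that motivates the approach below.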


The following convenience function takes a two- or three-dimensional array and displays it in the notebook as an image embedded directly in the HTML, with optional automatic dynamic range adjustment but no other transformations. The image can be rescaled in the browser by dragging its edges. Here’s the source code:

def nbimage( data, vmin = None, vmax = None, vsym = False, saveas = None ):
    """Display raw data as a notebook inline image.

    data: array-like object, two or three dimensions. If three dimensional,
          first or last dimension must have length 3 or 4 and will be
          interpreted as color (RGB or RGBA).
    vmin, vmax, vsym: refer to rerange()
    saveas: Save image file to disk (optional). Proper file name extension
            will be appended to the pathname given. [ None ]
    """
    from io import BytesIO
    from IPython.display import display, Image
    from PIL.Image import fromarray
    data = rerange( data, vmin, vmax, vsym )
    data = data.squeeze()
    # try to be smart: a leading axis of length 3 or 4 is taken to hold
    # planar color channels and is moved to the end (interleaved layout)
    if data.ndim == 3 and 3 <= data.shape[ 0 ] <= 4:
        data = data.transpose( ( 1, 2, 0 ) )
    # encode the uint8 array losslessly as PNG in an in-memory buffer
    s = BytesIO()
    fromarray( data ).save( s, 'png' )
    png = s.getvalue()
    if saveas is not None:
        # write the encoded bytes, not the buffer object itself
        with open( saveas + '.png', 'wb' ) as f:
            f.write( png )
    display( Image( png ) )
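To illustrate what “no other transformations” means, here is a stdlib-only sketch of a minimal 8-bit grayscale PNG encoder. It is not what nbimage uses (nbimage relies on PIL), but it shows that each array element becomes exactly one byte in the compressed image data, with no resampling anywhere. The function names are hypothetical; the resulting bytes could be handed to IPython.display.Image just as nbimage does with PIL’s output.

```python
import struct
import zlib

def _chunk(tag, payload):
    # PNG chunk layout: 4-byte length, 4-byte type, payload, CRC-32 of type+payload
    return (struct.pack('>I', len(payload)) + tag + payload
            + struct.pack('>I', zlib.crc32(tag + payload) & 0xFFFFFFFF))

def gray_png(rows):
    """Hypothetical helper: encode a list of equal-length rows of 0..255
    integers as an 8-bit grayscale PNG (bytes)."""
    height, width = len(rows), len(rows[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # compression method 0, filter method 0, no interlacing
    ihdr = struct.pack('>IIBBBBB', width, height, 8, 0, 0, 0, 0)
    # every scanline starts with filter type 0 ("None"), then the raw pixel bytes
    raw = b''.join(b'\x00' + bytes(bytearray(r)) for r in rows)
    return (b'\x89PNG\r\n\x1a\n' + _chunk(b'IHDR', ihdr)
            + _chunk(b'IDAT', zlib.compress(raw)) + _chunk(b'IEND', b''))
```

A 3×2 image such as `gray_png([[0, 128, 255], [255, 128, 0]])` decompresses back to exactly those six pixel values, each preceded per row by a single filter byte.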

After singleton dimensions are squeezed out, the data must be either two-dimensional, in which case it is interpreted as a grayscale image, or three-dimensional, with the first or last dimension having 3 or 4 elements, corresponding to RGB images without or with an alpha channel, respectively. The code tries to guess whether the color channels are in the first or the last dimension (“planar” or “interleaved” format) and transposes the array accordingly.
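The difference between the two layouts is just the position of the channel index. The pure-Python sketch below (a hypothetical helper, not part of the post’s code) performs the same reordering that `data.transpose((1, 2, 0))` does for a NumPy array. Note that the heuristic is ambiguous for shapes like (3, 4, 3), which match both layouts; nbimage then treats the array as planar because its first dimension has length 3.

```python
def planar_to_interleaved(planes):
    """Hypothetical helper: convert planar color data, indexed as
    planes[c][y][x], into interleaved pixels indexed as pixels[y][x][c];
    the same reordering as NumPy's data.transpose((1, 2, 0))."""
    n_channels = len(planes)
    height, width = len(planes[0]), len(planes[0][0])
    return [[[planes[c][y][x] for c in range(n_channels)]
             for x in range(width)]
            for y in range(height)]
```

For instance, three 1×2 planes `[[[1, 2]], [[3, 4]], [[5, 6]]]` become one row of two RGB pixels, `[[[1, 3, 5], [2, 4, 6]]]`.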

Dynamic range adjustment

The dynamic range adjustment works as follows. If the array already has the 8-bit unsigned integer data type (uint8), no adjustment is considered necessary. For any other data type, the default is to scale the data linearly so that it fills the 8-bit dynamic range of the image. The vmin and vmax parameters override the data values that map to the minimum and maximum pixel value; data values beyond those limits are clipped. The vsym parameter alters the default behavior in a way I often find useful for displaying data like filter coefficients: it ensures that a data value of 0 maps to mid-gray. All of this functionality is delegated to a separate function called rerange:

def rerange( data, vmin = None, vmax = None, vsym = False ):
    """Rescale values of data array to fit the range 0 ... 255 and convert to uint8.

    data: array-like object. If data.dtype == uint8, no scaling will occur.
    vmin: original array value that will map to 0 in the output. [ data.min() ]
    vmax: original array value that will map to 255 in the output. [ data.max() ]
    vsym: ensure that 0 will map to gray (if True, may override either vmin or vmax
          to accommodate all values.) [ False ]
    """
    from numpy import asarray, uint8, clip
    data = asarray( data )
    if data.dtype != uint8:
        # compute in floating point: avoids integer division (Python 2) and
        # wrap-around when negating unsigned integer minima/maxima
        data = data.astype( float )
        if vmin is None:
            vmin = data.min()
        if vmax is None:
            vmax = data.max()
        if vsym:
            vmax = max( abs( vmin ), abs( vmax ) )
            vmin = -vmax
        # scale by 256 so that each output level covers an equal-width input
        # interval; vmax itself maps to 256 and is clipped to 255 below.
        # Guard against a constant image (vmax == vmin).
        scale = 256.0 / ( vmax - vmin ) if vmax != vmin else 0.0
        data = ( data - vmin ) * scale
        data = clip( data, 0, 255 ).astype( uint8 )
    return data
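To see the effect of vmin, vmax and vsym on concrete numbers, here is a hypothetical pure-Python transcription of rerange’s arithmetic for a flat list of values. It is meant only as a worked example; the NumPy version above is what nbimage uses.

```python
def rerange_values(values, vmin=None, vmax=None, vsym=False):
    """Hypothetical helper mirroring rerange() for a flat list of numbers."""
    if vmin is None:
        vmin = min(values)
    if vmax is None:
        vmax = max(values)
    if vsym:
        # widen the range symmetrically around 0 so that 0 lands on mid-gray
        vmax = max(abs(vmin), abs(vmax))
        vmin = -vmax
    scale = 256.0 / (vmax - vmin) if vmax != vmin else 0.0
    # int() truncates toward zero; clamping afterwards yields the same result
    # as rerange's clip-then-astype(uint8)
    return [min(max(int((v - vmin) * scale), 0), 255) for v in values]
```

With `vsym=True`, the values `[-0.5, 0, 2]` give `[96, 128, 255]`: the range is widened to -2 ... 2, so 0 maps to 128 (mid-gray) while -0.5 ends up somewhat darker. Without vsym, `[0, 0.5, 1.0]` maps to `[0, 128, 255]`, and with `vmin=0, vmax=10` the values `[-10, 5, 20]` are clipped to `[0, 128, 255]`.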


Inline videos

Yes, that’s right: videos, too! The HTML5 video element lets us show data arrays as inline browser videos. For this to work, ffmpeg must be installed and accessible on the command line, since it does the actual encoding.
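A common way to hand array data to ffmpeg is to pipe raw 8-bit frames into its standard input. The sketch below assumes an ffmpeg build with libx264 and is not nbvideo’s actual code; in particular, mapping an integer “quality” onto x264’s constant rate factor (`-crf`, lower means higher quality) is my assumption. For a NumPy frame stack, the `frames_bytes` argument would be something like `frames.astype(numpy.uint8).tobytes()`.

```python
import shutil
import subprocess

def h264_command(width, height, fps, quality, out_path, gray=True):
    """Hypothetical helper: ffmpeg command reading raw 8-bit frames from
    stdin and writing an H.264 MP4. 'quality' is passed as -crf (assumption)."""
    return [
        'ffmpeg', '-y',                          # overwrite output without asking
        '-f', 'rawvideo',                        # input: headerless raw frames
        '-pix_fmt', 'gray' if gray else 'rgb24', # input pixel layout
        '-s', '%dx%d' % (width, height),         # frame size, width x height
        '-r', str(fps),                          # input frame rate
        '-i', '-',                               # read frames from stdin
        '-c:v', 'libx264',
        '-pix_fmt', 'yuv420p',                   # browser-friendly; needs even width/height
        '-crf', str(quality),
        out_path,
    ]

def encode_frames(frames_bytes, cmd):
    """Pipe concatenated raw frames into ffmpeg; requires ffmpeg on the PATH."""
    if shutil.which('ffmpeg') is None:
        raise RuntimeError('ffmpeg not found on the command line')
    subprocess.run(cmd, input=frames_bytes, check=True, capture_output=True)
```

Piping avoids writing intermediate image files; the price is that ffmpeg must be told the frame size and pixel format explicitly, because raw frames carry no header.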

I wrote another function, nbvideo, which works analogously to nbimage above. The array must now have three dimensions for grayscale video or four for color video, and the first dimension is assumed to be time. The color dimension can be either the second or the last. There are some additional parameters: fps gives the number of frames per second, and loop is a boolean that, if set, adds an HTML attribute telling the browser to loop playback by default (this can also be changed manually in the browser). Additionally, you can encode a frame counter into the video.
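Since nbvideo’s source is not shown here, the following is only a hypothetical sketch of how such a layout check could look. It assumes, by analogy with nbimage, that a length of 3 or 4 on the second axis marks planar color channels, and it returns the axis order for a NumPy `transpose` plus the matching ffmpeg rawvideo pixel format.

```python
def video_layout(shape):
    """Hypothetical helper: return (axes, pix_fmt) for a frame stack whose
    first axis is time. A 4-D array with 3 or 4 entries on axis 1 is assumed
    to be (T, C, H, W); any other 4-D array is assumed to be (T, H, W, C)."""
    if len(shape) == 3:
        return (0, 1, 2), 'gray'                 # (T, H, W) grayscale frames
    if len(shape) != 4:
        raise ValueError('expected 3 (grayscale) or 4 (color) dimensions')
    if 3 <= shape[1] <= 4:
        axes, channels = (0, 2, 3, 1), shape[1]  # (T, C, H, W) -> (T, H, W, C)
    else:
        axes, channels = (0, 1, 2, 3), shape[3]  # already (T, H, W, C)
    if channels not in (3, 4):
        raise ValueError('color axis must have length 3 (RGB) or 4 (RGBA)')
    # pixel-format names as understood by ffmpeg's rawvideo input
    return axes, 'rgb24' if channels == 3 else 'rgba'
```

Usage would be along the lines of `axes, fmt = video_layout(frames.shape)` followed by `frames.transpose(axes)`, so that each frame is stored interleaved before it is piped to the encoder.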

The parameters theora, h264, and vp8 can each be set either to an integer, in which case the video is compressed with the respective codec at that quality setting, or to None, which turns that codec off. Note that not all browsers support all codecs (see a compatibility table for HTML5 video formats). The default settings are a safe choice that should work in most browsers. If you know which codec your browser prefers, you can turn one of the codecs off to save space (the default settings produce both a Theora and an H.264 version of the data).
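Offering several encodings works because an HTML5 video element may list multiple source elements, and the browser plays the first one it supports. The following is a hypothetical sketch, not nbvideo’s code, that embeds already-encoded files as base64 data URIs. The container choice per codec (Theora in Ogg, H.264 in MP4, VP8 in WebM) is an assumption; in a notebook, the resulting string could be shown with `IPython.display.HTML`.

```python
import base64

# Assumed container MIME type for each codec name used by nbvideo's parameters
MIME = {'theora': 'video/ogg', 'h264': 'video/mp4', 'vp8': 'video/webm'}

def video_html(encoded, loop=False):
    """Hypothetical helper: build an HTML5 <video> element from a dict
    {codec_name: encoded_file_bytes}. Sources are emitted in dict order;
    the browser picks the first one it can play."""
    sources = []
    for codec, payload in encoded.items():
        b64 = base64.b64encode(payload).decode('ascii')
        sources.append('<source src="data:%s;base64,%s" type="%s">'
                       % (MIME[codec], b64, MIME[codec]))
    # 'loop' is a boolean HTML attribute: present means loop playback
    attrs = 'controls' + (' loop' if loop else '')
    return '<video %s>%s</video>' % (attrs, ''.join(sources))
```

Embedding the data directly keeps the notebook self-contained, at the cost of roughly a one-third size increase from base64 encoding; this is one more reason to switch off codecs you do not need.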

nbvideo is quite lengthy, so I won’t include its source code here. Here’s a Python file with all of the functions.

