Bitmasks

PypeIt has implemented a generalized class for handling bitmasks.

Bitmasks allow you to define a set of bit values signified by strings, and then toggle and interpret bits held by a numpy.ndarray. For example, say you’re processing an image and you want to set up a set of bits that indicate that the pixel is part of a bad-pixel mask, has a cosmic ray, or is saturated. You can define the following:

from pypeit.bitmask import BitMask

bits = {'BPM':'Pixel is part of a bad-pixel mask',
        'COSMIC':'Pixel is contaminated by a cosmic ray',
        'SATURATED':'Pixel is saturated.'}
image_bm = BitMask(list(bits.keys()), descr=list(bits.values()))

Note

Consistency in the order of the dictionary keywords is critical to the repeatability of the BitMask instances. The above is possible because dict objects automatically maintain the order of the provided keywords since Python 3.7.

Or, better yet, define a derived class:

from pypeit.bitmask import BitMask

class ImageBitMask(BitMask):
    def __init__(self):
        bits = {'BPM':'Pixel is part of a bad-pixel mask',
                'COSMIC':'Pixel is contaminated by a cosmic ray',
                'SATURATED':'Pixel is saturated.'}
        super().__init__(list(bits.keys()), descr=list(bits.values()))

image_bm = ImageBitMask()

In either case, you can see the list of bits and their bit numbers by running:

>>> image_bm.info()
         Bit: BPM = 0
 Description: Pixel is part of a bad-pixel mask

         Bit: COSMIC = 1
 Description: Pixel is contaminated by a cosmic ray

         Bit: SATURATED = 2
 Description: Pixel is saturated.
>>> image_bm.bits
{'BPM': 0, 'COSMIC': 1, 'SATURATED': 2}
>>> image_bm.keys()
['BPM', 'COSMIC', 'SATURATED']
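
The bit numbers above follow the order of the keys, and each flag contributes the integer 2**bit to a mask value. The following is a minimal plain-Python sketch of that arithmetic. It does not use PypeIt, and the helper name decode is hypothetical:

```python
# Plain-Python sketch of the bit arithmetic; not PypeIt's implementation.
keys = ['BPM', 'COSMIC', 'SATURATED']

# Bit numbers follow the order of the keys.
bits = {key: number for number, key in enumerate(keys)}

# Each flag contributes the integer 2**bit to a mask value.
values = {key: 1 << number for key, number in bits.items()}

def decode(value):
    """Hypothetical helper: list the flags that are set in an integer."""
    return [key for key, number in bits.items() if value & (1 << number)]

print(bits)       # {'BPM': 0, 'COSMIC': 1, 'SATURATED': 2}
print(values)     # {'BPM': 1, 'COSMIC': 2, 'SATURATED': 4}
print(decode(6))  # ['COSMIC', 'SATURATED']
```

Reordering the keys would change which bit each name maps to, so mask values written with one ordering would decode differently under another.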

Now you can define a numpy.ndarray to hold the mask value for each image pixel; the minimum_dtype() method returns the smallest data type required to represent the list of defined bits. The maximum number of bits that can be defined is 64. Assuming you have an image img:

import numpy
mask = numpy.zeros(img.shape, dtype=image_bm.minimum_dtype())

Assuming you have boolean or integer arrays that identify pixels to mask, you can turn on the mask bits as follows:

mask[cosmics_indx] = image_bm.turn_on(mask[cosmics_indx], 'COSMIC')
mask[saturated_indx] = image_bm.turn_on(mask[saturated_indx], 'SATURATED')

or make sure certain bits are off:

mask[not_a_cosmic] = image_bm.turn_off(mask[not_a_cosmic], 'COSMIC')

These methods do not alter the array passed to them. Instead, they return the altered bit values, which is why the lines above have the form m = bm.turn_on(m, flag).
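
The return-and-assign pattern can be illustrated with plain numpy. This is only a sketch: the turn_on and turn_off functions below are hypothetical stand-ins written with bitwise operators, not PypeIt's methods.

```python
import numpy

# Sketch only: hypothetical stand-ins, not PypeIt's turn_on/turn_off.
COSMIC = 1 << 1  # integer value of bit 1

def turn_on(values, flag_value):
    """Return a new array with the flag set; the input is left untouched."""
    return values | flag_value

def turn_off(values, flag_value):
    """Return a new array with the flag cleared; the input is left untouched."""
    return values & ~flag_value

mask = numpy.zeros(4, dtype=numpy.int16)
flagged_copy = turn_on(mask, COSMIC)
print(mask.tolist())          # [0, 0, 0, 0]
print(flagged_copy.tolist())  # [2, 2, 2, 2]

# Assigning the result back is what actually updates the selected pixels.
indx = numpy.array([True, False, True, False])
mask[indx] = turn_on(mask[indx], COSMIC)
print(mask.tolist())          # [2, 0, 2, 0]
mask[indx] = turn_off(mask[indx], COSMIC)
print(mask.tolist())          # [0, 0, 0, 0]
```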

Some other short usage examples:

To find which flags are set for a single value:

image_bm.flagged_bits(mask[0,10])

To find the list of unique flags set for any pixel:

unique_flags = numpy.sort(numpy.unique(numpy.concatenate(
                    [image_bm.flagged_bits(b) for b in numpy.unique(mask)]))).tolist()

To get a boolean array that selects pixels with one or more mask bits:

cosmics_indx = image_bm.flagged(mask, flag='COSMIC')
all_but_bpm_indx = image_bm.flagged(mask, flag=['COSMIC', 'SATURATED'])
any_flagged = image_bm.flagged(mask)

To construct masked arrays, following from the examples above:

masked_img = numpy.ma.MaskedArray(img, mask=image_bm.flagged(mask))
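
To make the selection logic concrete, here is a small sketch built on plain numpy. The flagged function below is a hypothetical stand-in that assumes a pixel is selected when any of the requested bits is set; the toy img and mask arrays are made up for illustration.

```python
import numpy

# Sketch only: a hypothetical flagged() stand-in built on plain numpy.
bits = {'BPM': 0, 'COSMIC': 1, 'SATURATED': 2}

def flagged(mask, flag=None):
    """Select elements where any of the requested flags (default: all) is set."""
    if flag is None:
        flag = list(bits)
    if isinstance(flag, str):
        flag = [flag]
    combined = sum(1 << bits[f] for f in flag)
    return (mask & combined) != 0

img = numpy.array([10., 20., 30., 40.])
mask = numpy.array([0, 2, 4, 1], dtype=numpy.int16)

print(flagged(mask, 'COSMIC').tolist())                 # [False, True, False, False]
print(flagged(mask, ['COSMIC', 'SATURATED']).tolist())  # [False, True, True, False]
print(flagged(mask).tolist())                           # [False, True, True, True]

# Only the first pixel has no flags, so it is the only one left unmasked.
masked_img = numpy.ma.MaskedArray(img, mask=flagged(mask))
print(masked_img.sum())                                 # 10.0
```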

BitMask objects can be defined programmatically, as shown above for the ImageBitMask derived class, but they can also be defined by reading formatted files. The current options are:

FITS headers: There are both reading and writing methods for bitmask I/O using astropy.io.fits.Header objects. Using the ImageBitMask class as an example:

>>> from astropy.io import fits
>>> hdr = fits.Header()
>>> image_bm = ImageBitMask()
>>> image_bm.to_header(hdr)
>>> hdr
BIT0    = 'BPM     '           / Pixel is part of a bad-pixel mask
BIT1    = 'COSMIC  '           / Pixel is contaminated by a cosmic ray
BIT2    = 'SATURATED'          / Pixel is saturated.
>>> copy_bm = BitMask.from_header(hdr)
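
The header layout shown above, one BIT<n> keyword per flag, can be sketched without astropy. In the following, a plain dict stands in for the FITS header, and both helper functions are hypothetical illustrations rather than PypeIt or astropy calls:

```python
# Sketch only: a plain dict stands in for the FITS header, and both helper
# functions are hypothetical rather than PypeIt or astropy calls.
def bits_to_header(keys, header):
    """Write one BIT<n> entry per flag, where n is the bit number."""
    for number, key in enumerate(keys):
        header[f'BIT{number}'] = key
    return header

def bits_from_header(header):
    """Recover the flag names, sorted by bit number, from BIT<n> entries."""
    entries = {int(k[3:]): v for k, v in header.items()
               if k.startswith('BIT') and k[3:].isdigit()}
    return [entries[number] for number in sorted(entries)]

# BITPIX starts with 'BIT' but is not a flag entry, so it must be skipped.
hdr = bits_to_header(['BPM', 'COSMIC', 'SATURATED'], {'BITPIX': 16})
print(bits_from_header(hdr))  # ['BPM', 'COSMIC', 'SATURATED']
```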

Bitmask Arrays

BitMaskArray objects combine the numpy.ndarray that holds the bit values and the BitMask object used to interpret them into a new subclass of DataContainer. The following builds on the example uses of BitMask objects (see Bitmasks).

Defining a new subclass

Given the definition of ImageBitMask, we can implement a relevant BitMaskArray subclass as follows:

from pypeit.images.bitmaskarray import BitMaskArray

class ImageBitMaskArray(BitMaskArray):
    version = '1.0'
    bitmask = ImageBitMask()

The remaining functionality below is all handled by the base class.

To instantiate a new 2D mask array that is 5 pixels on a side:

shape = (5,5)
mask = ImageBitMaskArray(shape)

You can access the bit flag names using:

>>> mask.bit_keys()
['BPM', 'COSMIC', 'SATURATED']

Bit access

You can flag bits using turn_on(). For example, the following code flags the center column of the image as being part of the detector bad-pixel mask:

import numpy as np
mask.turn_on('BPM', select=np.s_[:,2])

The select argument to turn_on() can be anything that is appropriately interpreted as slicing a numpy.ndarray. That is, arr[select] must be valid, where arr is the internal array held by mask.

Similarly, you can flag a pixel with a cosmic ray:

mask.turn_on('COSMIC', select=(0,0))

or multiple pixels as being saturated:

mask.turn_on('SATURATED', select=([0,1,-1,-1],[0,0,-1,-2]))

and you can simultaneously flag pixels for multiple reasons:

mask.turn_on(['COSMIC', 'SATURATED'], select=([-1,-1],[0,1]))

The mask values themselves are accessed using the mask attribute:

>>> mask.mask
array([[6, 0, 1, 0, 0],
       [4, 0, 1, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 1, 0, 0],
       [6, 6, 1, 4, 4]], dtype=int16)
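
These values follow directly from the bit numbers (BPM = 1, COSMIC = 2, SATURATED = 4). As a cross-check, the sketch below replays the same four selections with plain numpy, using in-place bitwise OR as a stand-in for the turn_on() calls:

```python
import numpy as np

# Sketch only: plain numpy stand-in for the turn_on() calls above.
BPM, COSMIC, SATURATED = 1 << 0, 1 << 1, 1 << 2

arr = np.zeros((5, 5), dtype=np.int16)
arr[np.s_[:, 2]] |= BPM                             # center column
arr[(0, 0)] |= COSMIC                               # single pixel
arr[([0, 1, -1, -1], [0, 0, -1, -2])] |= SATURATED  # four pixels
arr[([-1, -1], [0, 1])] |= COSMIC | SATURATED       # two flags at once
print(arr)
```

Each select value is used directly as an index, which is why a slice object, a tuple of integers, and a tuple of index lists all work.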

However, more usefully, you can obtain boolean arrays that select pixels flagged by one or more flags:

>>> mask.flagged(flag='SATURATED')
array([[ True, False, False, False, False],
       [ True, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [ True,  True, False,  True,  True]])

>>> mask.flagged(flag=['BPM', 'SATURATED'])
array([[ True, False,  True, False, False],
       [ True, False,  True, False, False],
       [False, False,  True, False, False],
       [False, False,  True, False, False],
       [ True,  True,  True,  True,  True]])

If you want to select all pixels that are not flagged by a given flag, you can use the invert option in flagged():

>>> gpm = mask.flagged(flag='BPM', invert=True)
>>> gpm
array([[ True,  True, False,  True,  True],
       [ True,  True, False,  True,  True],
       [ True,  True, False,  True,  True],
       [ True,  True, False,  True,  True],
       [ True,  True, False,  True,  True]])
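
Assuming that invert=True simply negates the boolean selection, which is consistent with the output above, the same good-pixel mask can be sketched with plain numpy from the mask values printed earlier:

```python
import numpy as np

# Sketch only: plain numpy, using the mask values printed earlier.
arr = np.array([[6, 0, 1, 0, 0],
                [4, 0, 1, 0, 0],
                [0, 0, 1, 0, 0],
                [0, 0, 1, 0, 0],
                [6, 6, 1, 4, 4]], dtype=np.int16)
BPM = 1 << 0

bpm = (arr & BPM) != 0     # pixels with the BPM bit set
gpm = np.logical_not(bpm)  # the inverted selection
print(gpm.sum())           # 20
print(gpm[:, 2].any())     # False
```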

For individual flags, there is also convenience functionality that allows you to access a boolean array as if it were an attribute of the object:

>>> mask.bpm
array([[False, False,  True, False, False],
       [False, False,  True, False, False],
       [False, False,  True, False, False],
       [False, False,  True, False, False],
       [False, False,  True, False, False]])
>>> mask.saturated
array([[ True, False, False, False, False],
       [ True, False, False, False, False],
       [False, False, False, False, False],
       [False, False, False, False, False],
       [ True,  True, False,  True,  True]])

This convenience operation is identical to calling flagged() for the indicated bit. However, bpm is not an array that can be used to change the value of the bits themselves:

>>> indx = np.zeros(shape, dtype=bool)
>>> indx[2,3] = True
>>> mask.bpm = indx  # Raises an AttributeError

Instead, you must use the bit toggling functions provided by the class: turn_on(), turn_off(), or toggle().
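
One way such read-only flag attributes could be implemented is sketched below. The ReadOnlyFlags class is entirely hypothetical and is not how PypeIt necessarily does it; it only illustrates attribute lookup that computes a boolean array on demand while rejecting assignment.

```python
import numpy as np

class ReadOnlyFlags:
    """Hypothetical sketch of read-only flag attributes; not PypeIt's code."""
    bits = {'BPM': 0, 'COSMIC': 1, 'SATURATED': 2}

    def __init__(self, shape):
        # Bypass our own __setattr__ to store the integer array.
        object.__setattr__(self, 'mask', np.zeros(shape, dtype=np.int16))

    def __getattr__(self, name):
        # Only called when normal lookup fails, e.g. for 'bpm'.
        if name.upper() in self.bits:
            return (self.mask & (1 << self.bits[name.upper()])) != 0
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Flag names cannot be assigned to; bits change only via turn_on().
        if name.upper() in self.bits:
            raise AttributeError(f'{name} is read-only; use turn_on() instead.')
        object.__setattr__(self, name, value)

    def turn_on(self, flag, select=None):
        # With no selection, the flag is set for every element.
        select = np.s_[...] if select is None else select
        self.mask[select] |= 1 << self.bits[flag]

flags = ReadOnlyFlags((5, 5))
flags.turn_on('BPM', select=np.s_[:, 2])
print(flags.bpm.sum())  # 5
try:
    flags.bpm = np.zeros((5, 5), dtype=bool)
except AttributeError as err:
    print('AttributeError:', err)
```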

Tip

Every time flagged() is called, a new array is created. If you need to access the result of the function multiple times without changing the flags, you’re better off assigning the result to a new array and then using that array, so that you’re not continually allocating and deallocating memory (even within the context of how this is done within Python).

Input/Output

Because BitMaskArray is a subclass of DataContainer, you can save and read the bitmask data to/from files:

>>> mask.to_file('mask.fits')
>>> _mask = ImageBitMaskArray.from_file('mask.fits')
>>> np.array_equal(mask.mask, _mask.mask)
True

In addition to the mask data, the bit flags and values are also written to the header; see the BIT* entries in the header below:

>>> from astropy.io import fits
>>> hdu = fits.open('mask.fits')
>>> hdu.info()
Filename: mask.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU      13   ()
  1  MASK          1 ImageHDU        22   (5, 5)   int16
>>> hdu['MASK'].header
XTENSION= 'IMAGE   '           / Image extension
BITPIX  =                   16 / array data type
NAXIS   =                    2 / number of array dimensions
NAXIS1  =                    5
NAXIS2  =                    5
PCOUNT  =                    0 / number of parameters
GCOUNT  =                    1 / number of groups
VERSPYT = '3.9.13  '           / Python version
VERSNPY = '1.22.3  '           / Numpy version
VERSSCI = '1.8.0   '           / Scipy version
VERSAST = '5.0.4   '           / Astropy version
VERSSKL = '1.0.2   '           / Scikit-learn version
VERSPYP = '1.10.1.dev260+g32de3d6d4' / PypeIt version
DATE    = '2022-11-10'         / UTC date created
DMODCLS = 'ImageBitMaskArray'  / Datamodel class
DMODVER = '1.0     '           / Datamodel version
BIT0    = 'BPM     '
BIT1    = 'COSMIC  '
BIT2    = 'SATURATED'
EXTNAME = 'MASK    '           / extension name
CHECKSUM= 'APGODMFOAMFOAMFO'   / HDU checksum updated 2022-11-10T13:10:27
DATASUM = '1245200 '           / data unit checksum updated 2022-11-10T13:10:27

Note

Currently, when loading a mask, the bit names in the header of the file are not checked against the bitmask definition in the code itself. This kind of version control should be handled using the version attribute of the class. That is, any time the flags in the bitmask are changed, the developers should bump the class version.
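
If you want to guard against such a mismatch yourself, a check along the following lines is possible. This is only a sketch: check_bits is a hypothetical helper, not part of PypeIt, and it assumes the header can be treated as a mapping with a get() method (a plain dict is used here).

```python
def check_bits(header, expected_keys):
    """Hypothetical check: compare BIT<n> header entries to a flag list.

    ``header`` is any mapping of header keywords to values; returns a list
    of human-readable mismatch descriptions (empty if everything agrees).
    """
    problems = []
    for number, key in enumerate(expected_keys):
        found = header.get(f'BIT{number}')
        if found != key:
            problems.append(f'BIT{number}: expected {key!r}, found {found!r}')
    return problems

header = {'BIT0': 'BPM', 'BIT1': 'COSMIC', 'BIT2': 'SATURATED'}
print(check_bits(header, ['BPM', 'COSMIC', 'SATURATED']))  # []
print(check_bits(header, ['BPM', 'SATURATED', 'COSMIC']))  # two mismatches
```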