Null value proposal

Date: 2018-05-05

I want to support NULL values in fitsio. It is fully supported in cfitsio and is incredibly useful functionality.

It allows the user of the library to know what values are NULL rather than assuming, or having to deal with NaN values. Additionally, integers have no meaningful NULL value so capturing the meaning behind a value is not possible.

Naturally, Rust represents the concept of NULL with the std::option::Option type, rather than a custom value that type-checks as any other type, and can make it's way through code unhandled. Other people have made better arguments for why an [Option] type is a sensible alternative.

Solution

One option is for fitsio to return a Vec<Option<T>> however this causes non-contiguous memory layout and inflates the size of the vector (see this discussion).

Another option is to create a new type, which contains the contiguous Vec, and a array storing whether that element is null or not.

A naive implementation looks like:


# #![allow(unused_variables)]
#fn main() {
struct NullVec<T> {
    contents: Vec<T>,
    is_null: Vec<bool>,
}
#}

This can be improved by switching out the is_null member for a bitvec:


# #![allow(unused_variables)]
#fn main() {
struct NullVec<T> {
    contents: Vec<T>,
    is_null: BitVec,
}
#}

This solution gets as as far as storing the actual data, but how to return it from read_* methods? I see two alternatives:

Integration with fitsio A

This data type must be compatible with all of the read methods. It is possible to write null values to a fits file, but the API is a little disgusting1. I will shelve writing null values for now.

This restriction means it has to integrate with ReadImage and ReadsCol.

In particular, it needs to integrate with ReadImage<Vec<T>> and ReadsCol<Vec<T>>. This means ReadsCol has to be re-created to work on Vec<T>.

After this has been done, then I can implement ReadImage/ReadsCol for NullVec<T> and handle the null values properly.

Hopefully this should not be too much work. The final result I want the end user to have is:


# #![allow(unused_variables)]
#fn main() {
let data: NullVec<f64> = hdu.read_col(&mut fptr, "DATA");
let first_value: Option<f64> = data[0];
#}

if they care about null values, and


# #![allow(unused_variables)]
#fn main() {
let data: Vec<f64> = hdu.read_col(&mut fptr, "DATA");
let first_value: f64 = data[0]; // NULL values be damned
#}

(i.e. the existing behaviour) if they don't.

I am not sure the std::ops::Index trait supports returning Option types however...

Integration with fitsio B

The alternative is to support a single return type, but implement a new trait that returns an Option value if the underlying data is not NULL. ReadImage and ReadsCol can then return implementors of this trait.

This trait will then have to be implemented for all of the available return types and is compatible with future ones.

I'll have more details in the future.

1

"substitute the appropriate FITS null value for all elements which are equal to the input value of nulval [..]. For integer columns the FITS null value is defined by the TNULLn keyword [..]. For floating point columns the special IEEE NaN (Not-a-Number) value will be written into the FITS file"