mssplit (Measurement Splitting/Averaging Utility)

The mssplit utility extracts a subset of a measurement set. The subset may be selected by channel, beam ID or scan ID, and channels can be averaged together during extraction. It can also be used purely for channel averaging, with no filtering or selection.

The intended use-cases of this tool are:

  • Split a large measurement set with many spectral channels into many smaller measurement sets, with perhaps a single spectral channel per file. This allows each MPI-based calibration or imaging process to read its own measurement set, so that no selection needs to be done at read time.

  • Average a full spectral resolution measurement set down to fewer/wider channels for continuum imaging.

  • Perform frequency frame conversion “on the fly” while splitting or rebinning the data. This feature is only supported when the width parameter is 1.

Additionally, mssplit can filter based on the following criteria:

  • Scan Number

  • Beam Number

Running the program

The program can be run with the following command, where “config.in” is a file containing the configuration parameters described in the next section.

$  mssplit -c config.in

The mssplit program is not parallel or distributed; it runs in a single process operating on a single input measurement set.
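The first use-case above (one spectral channel per output file) needs one configuration file and one mssplit invocation per channel. The following is a minimal sketch of how those could be generated, not part of mssplit itself. The function names and the config file naming scheme (mssplit_chan_N.in) are hypothetical; only the parameter names and the "mssplit -c" invocation come from this page.

```python
def channel_config(vis, channel):
    """Return config text that splits one channel out of `vis` without averaging."""
    return (
        f"vis         = {vis}\n"
        f"outputvis   = chan_{channel}.ms\n"
        f"channel     = {channel}\n"
        "width       = 1\n"
    )


def plan_channel_jobs(vis, first, last):
    """Plan one mssplit run per channel in the one-based, inclusive range.

    Returns a list of (config_name, config_text, command) tuples. The
    config file names are a hypothetical convention chosen for this sketch.
    """
    jobs = []
    for chan in range(first, last + 1):
        cfg_name = f"mssplit_chan_{chan}.in"
        command = ["mssplit", "-c", cfg_name]
        jobs.append((cfg_name, channel_config(vis, chan), command))
    return jobs


if __name__ == "__main__":
    # Show what would be written and run for channels 1 to 3 of full.ms.
    for cfg_name, text, command in plan_channel_jobs("full.ms", 1, 3):
        print(f"# {cfg_name}")
        print(text, end="")
        print("$ " + " ".join(command))
```

Each planned config text would be written to its file before the matching command is run. Because the output measurement set must not already exist, rerunning a job requires removing the earlier output first.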

Configuration Parameters

Parameter

Default

Example

Description

vis

None

2013-12-25_230000.ms

The input measurement set (uv-dataset). This file will not be modified.

outputvis

None

chan_1.ms

The output measurement set (uv-dataset). This file will be created; the program will fail if a file or directory with the same name already exists.

channel

None

1-300

The channel range to split out into its own measurement set. Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range is inclusive of both the start and end, and indexing is one-based.

width

1

54

Defines the number of input channels to average together to form one output channel. As the averaged visibilities can have different noise levels due to flagging, an additional array column containing noise sigmas for each spectral channel will be written when width>1.

usemedian

false

true

When averaging (width>1), use the median instead of the mean. Note that this will increase the expected noise by about 25%.

beams

None

[0] or [0, 1, 2] or [0..8]

Defines the beam numbers that will be exported to the output files. Rows are selected by matching their feed ID column with the provided beam number(s), so the numbers given here should be taken from the list of IDs in the FEED table of the measurement set. If this parameter is not set, all beams are exported. The value may be a single integer (e.g. 0 or [0]), an array of integers such as [0,1,2], or a range such as [0..8].

scans

None

[0] or [0, 1, 2] or [0..2]

Defines the scan numbers that will be exported to the output files. Rows are selected by matching their scan_number column with the provided scan number(s). If this parameter is not set, all scans are exported. The value may be a single integer (e.g. 0 or [0]), an array of integers such as [0,1,2], or a range such as [0..2].

fieldnames

None

[offset1] or [offset1,offset2] or [offset1..9]

Defines the field names that will be exported to the output files. If this parameter is not set, all fields are exported. The value may be a single string (e.g. a0 or [a0]), an array of strings such as [a0,a1,a2], or a range such as [a0..2].

timebegin

None

1996/11/20/5:20 or 20Nov96-5h20m or 1996-11-20T5:20

Defines a time-based filter. Any rows with a time earlier than this parameter will be excluded during splitting (i.e. they will not be copied to the output measurement set). This parameter is optional; if it is not set, no lower time limit is applied.

timeend

None

1996/11/20/5:20 or 20Nov96-5h20m or 1996-11-20T5:20

Defines a time-based filter. Any rows with a time later than this parameter will be excluded during splitting (i.e. they will not be copied to the output measurement set). This parameter is optional; if it is not set, no upper time limit is applied.

dopplercorrection

false

true

To perform frequency frame correction, set this parameter to true. The freqframe parameter may also need to be set. Note: Doppler correction is only supported when the width parameter is 1.

doppler_direction

None

[19h39m25.026, -63.42.45.63, J2000]

Reference direction used to calculate the doppler correction.

interpolation

nearest

cubic

Type of interpolation to use when performing doppler correction. One of linear, cubic, or nearest.

freqframe

topo

lsrk

The output frequency frame of the doppler correction. Supported options are [bary|lsrk]
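The table above states a few constraints that can be checked before mssplit is run. The sketch below is one way to render a set of parameters in the "key = value" form used on this page and to reject combinations the table rules out. The helper names are hypothetical, treating vis and outputvis as mandatory is an assumption based on their having no default, and nothing here reflects how mssplit itself validates its input.

```python
def format_value(value):
    """Render a Python value in the syntax shown in the Example column."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(str(v) for v in value) + "]"
    return str(value)


def check_params(params):
    """Raise ValueError for parameter combinations the table rules out."""
    for key in ("vis", "outputvis"):
        # Assumption: parameters with no default must be supplied.
        if key not in params:
            raise ValueError(f"{key} has no default and must be set")
    width = int(params.get("width", 1))
    if width < 1:
        raise ValueError("width must be a positive integer")
    if params.get("dopplercorrection", False) and width != 1:
        raise ValueError("dopplercorrection is only supported when width is 1")


def render_config(params):
    """Validate the parameters, then return them as config file text."""
    check_params(params)
    lines = [f"{key} = {format_value(value)}" for key, value in params.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_config({
        "vis": "full.ms",
        "outputvis": "beams_0_to_2.ms",
        "beams": [0, 1, 2],
        "scans": [0],
        "dopplercorrection": True,
        "freqframe": "lsrk",
    }), end="")
```

Ranges such as [0..8] can be passed through as plain strings, since the sketch does not interpret them.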

Additional advanced/optional parameters:

Parameter

Default

Example

Description

stman.bucketsize

65536

Set the bucket size (in bytes) of the CASA Table storage manager. This usually translates into the I/O size.

stman.tilencorr

4

Set the number of correlations per tile. This affects the way the data is written to and read from disk.

stman.tilenchan

1

Set the number of spectral channels per tile. This affects the way the data is written to and read from disk. If a given reader or writer process is expected to access only a single channel, the default value of 1 is fine. If the reader or writer is expected to access many, or even all, channels, a larger value would be more efficient.

stman.files

string

separate

This option controls how storage managers are mapped to files. The separate option (default) means every storage manager works with its own file. The combined option forces the storage managers to work with just one file, which can have benefits on the Lustre filesystem. The hdf5 option is similar to combined, but writes the corresponding data in the HDF5 format, provided casacore has been compiled with HDF5 support. The default option uses the default casacore constructor, which forces the code to take these options from the resource file and, in its absence, defaults to separate.

stman.blocksize

unsigned int

4194304

Block size in bytes for single file operations (i.e. for the combined and hdf5 options, see stman.files). It is not the same as bufferMB, described below; this parameter is important when data from separate storage managers are aggregated together.

stman.odirect

boolean

false

Use the ODirect operating-system option, if supported, for the combined mode of storage manager operation (see above). Note that its use is discouraged.

bufferMB

4000

Set the size of the memory buffer in MB used for I/O. Up to twice this amount can be needed, depending on the tile shapes. Setting this below 250 will make mssplit run very slowly, and setting it higher than the default has little benefit.

Configuration Example

Example 1

The following example demonstrates splitting out a single spectral channel, with no averaging:

# Input measurement set
# Default: <no default>
vis         = full.ms

# Output measurement set
# Default: <no default>
outputvis   = chan1.ms

# The channel range to split out into its own measurement set
# Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range
# is inclusive of both the start and end, indexing is one-based.
# Default: <no default>
channel     = 1

# Defines the number of input channels to average to form one output channel
# Default: 1
width       = 1

Example 2

The following example demonstrates both splitting and averaging. Here, the lowest numbered 54 channels are averaged together to form a single channel in the output measurement set.

# Input measurement set
# Default: <no default>
vis         = full-18_5kHz.ms

# Output measurement set
# Default: <no default>
outputvis   = averaged_1MHz_chan_1.ms

# The channel range to split out into its own measurement set
# Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range
# is inclusive of both the start and end, indexing is one-based.
# Default: <no default>
channel     = 1-54

# Defines the number of input channels to average to form one output channel
# Default: 1
width       = 54

Example 3

Finally, the following example demonstrates averaging a single measurement set with 16416 spectral channels by a factor of 54, creating a single output measurement set. That is, 16416 channels of 18.5 kHz are averaged down to 304 channels of approximately 1 MHz.

# Input measurement set
# Default: <no default>
vis         = full-18_5kHz.ms

# Output measurement set
# Default: <no default>
outputvis   = averaged_1MHz.ms

# The channel range to split out into its own measurement set
# Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range
# is inclusive of both the start and end, indexing is one-based.
# Default: <no default>
channel     = 1-16416

# Defines the number of input channels to average to form one output channel
# Default: 1
width       = 54
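The channel arithmetic in Examples 2 and 3 can be checked with a few lines of code. The sketch below parses a channel value (one-based, inclusive) and computes how many output channels a given width produces. The function names are hypothetical, and the sketch simply refuses ranges that are not an exact multiple of width, since this page does not say how mssplit treats leftover channels.

```python
def parse_channel_range(spec):
    """Parse '1' or '1-300' into a one-based, inclusive (first, last) pair."""
    first, sep, last = spec.strip().partition("-")
    start = int(first)
    end = int(last) if sep else start
    if start < 1 or end < start:
        raise ValueError(f"invalid channel range: {spec!r}")
    return start, end


def output_channel_count(spec, width):
    """Number of output channels when averaging `width` input channels each."""
    start, end = parse_channel_range(spec)
    n_input = end - start + 1  # inclusive range
    if n_input % width != 0:
        # Assumption of this sketch: only exact multiples are handled.
        raise ValueError(f"{n_input} channels is not a multiple of width {width}")
    return n_input // width


if __name__ == "__main__":
    print(output_channel_count("1-54", 54))     # Example 2
    print(output_channel_count("1-16416", 54))  # Example 3
    print(54 * 18.5, "kHz per output channel")
```

For Example 3 this gives 16416 / 54 = 304 output channels, each 54 x 18.5 kHz = 999 kHz wide, which is the "1 MHz" figure quoted above.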