mssplit (Measurement Splitting/Averaging Utility)

The mssplit utility extracts a subset of a measurement set. The subset may be selected by channel, beam ID or scan ID. mssplit can also average channels together while doing so, or be used purely for channel averaging with no filtering or selection.

The intended use-cases of this tool are:

Split a large measurement set with many spectral channels into many smaller measurement sets, with perhaps a single spectral channel per file. This allows each of the MPI-based calibration and imaging programs to read a specific measurement set, so no selection needs to be done.
Average a full spectral resolution measurement set down to fewer/wider channels for continuum imaging.
Perform frequency frame conversion “on the fly” while splitting or rebinning the data. This feature is only supported when the width parameter is 1.

Additionally, mssplit can filter based on the following criteria:

Scan Number
Beam Number
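
When parsets are generated by a script, the beam and scan selections have to be written in the list syntax described under Configuration Parameters (for example [0], [0, 1, 2] or [0..8]). The following is a minimal Python sketch of that formatting. The helper name format_selection is hypothetical and not part of mssplit, and the sketch assumes the range form is inclusive of both ends, as the paired [0, 1, 2] and [0..2] examples suggest.

```python
def format_selection(ids):
    """Format integer IDs as a parset list value.

    Hypothetical helper, not part of mssplit. A contiguous run of three or
    more IDs uses the range form (e.g. [0..8]); anything else is written as
    an explicit list (e.g. [0, 2]).
    """
    unique = sorted(set(ids))
    if not unique:
        raise ValueError("at least one ID is required")
    is_contiguous = unique[-1] - unique[0] + 1 == len(unique)
    if len(unique) >= 3 and is_contiguous:
        return f"[{unique[0]}..{unique[-1]}]"
    return "[" + ", ".join(str(i) for i in unique) + "]"


print("beams = " + format_selection(range(9)))  # beams = [0..8]
print("scans = " + format_selection([0, 2]))     # scans = [0, 2]
```

The printed lines can be pasted into a parset as they are.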

Running the program

mssplit can be run with the following command, where “config.in” is a file containing the configuration parameters described in the next section.

$  mssplit -c config.in

The mssplit program is not parallel or distributed; it runs in a single process operating on a single input measurement set.
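
Because each run handles one input measurement set in one process, splitting a data set into per-channel files means running mssplit once per channel, each time with its own parset. The Python sketch below prepares the parset text and command line for each channel. It uses only the vis, outputvis, channel and width parameters and the -c option shown above; the function names and the file-naming scheme are assumptions made for illustration.

```python
def channel_parset(vis, chan):
    """Return parset text that splits out one channel (one-based), no averaging."""
    return (
        f"vis         = {vis}\n"
        f"outputvis   = chan_{chan}.ms\n"
        f"channel     = {chan}\n"
        "width       = 1\n"
    )


def build_jobs(vis, first_chan, last_chan):
    """Return (parset filename, parset text, command) per channel, inclusive."""
    jobs = []
    for chan in range(first_chan, last_chan + 1):
        filename = f"mssplit_chan_{chan}.in"  # hypothetical naming scheme
        command = ["mssplit", "-c", filename]
        jobs.append((filename, channel_parset(vis, chan), command))
    return jobs


# Show the commands that would be run for channels 1 to 3.
for filename, text, command in build_jobs("full.ms", 1, 3):
    print(" ".join(command))
```

Each parset text would be written to its filename before the matching command is run. Since the runs are independent, they can be started one after another or handed to a batch scheduler.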

Configuration Parameters

Parameter	Default	Example	Description
vis	None	2013-12-25_230000.ms	The input measurement set (uv-dataset). This file will not be modified.
outputvis	None	chan_1.ms	The output measurement set (uv-dataset). This file will be created, and the program will fail to execute in the case a file/directory with the same name already exists.
correlations	all	parallel	The correlations to pass along. The default, ‘all’, uses the same correlations as the input. ‘parallel’ passes along XX and YY (or RR and LL). ‘StokesI’ adds the parallel correlations together to produce Stokes I visibilities; this can be used to save space for spectral data if polarisation is not of interest. Currently only implemented when width is 1.
channel	None	1-300	The channel range to split out into its own measurement set. Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range is inclusive of both the start and end, and indexing is one-based.
width	1	54	Defines the number of input channels to average together to form one output channel. Because the averaged visibilities can have different noise levels due to flagging, an additional array column containing the noise sigmas for each spectral channel is written when width>1.
usemedian	false	true	When averaging (width>1), use the median instead of the mean. Note that this increases the expected noise by about 25%.
beams	None	[0] or [0, 1, 2] or [0..8]	Defines the beam numbers that will be exported to the output files. Rows are selected by matching their feed ID column with the provided beam number(s), so the numbers given here should be taken from the list of IDs in the FEED table of the measurement set. If this parameter is not set all beams are exported. The value may be a single integer (e.g. 0 or [0]), an array of integers such as [0,1,2] or a range such as [0..8].
scans	None	[0] or [0, 1, 2] or [0..2]	Defines the scan numbers that will be exported to the output files. Rows are selected by matching their scan_number column with the provided scan number(s). If this parameter is not set all scans are exported. The value may be a single integer (e.g. 0 or [0]), an array of integers such as [0,1,2] or a range such as [0..2].
fieldnames	None	[offset1] or [offset1,offset2] or [offset1..9]	Defines the field names that will be exported to the output files. If this parameter is not set all fields are exported. The value may be a single string (e.g. a0 or [a0]), an array of strings such as [a0,a1,a2] or a range such as [a0..2].
timebegin	None	1996/11/20/5:20 or 20Nov96-5h20m or 1996-11-20T5:20	Defines a time-based filter. Any rows with a time earlier than this parameter are excluded during splitting (i.e. they are not copied to the output measurement set). This parameter is optional; if it is not set, no start-time filter is applied.
timeend	None	1996/11/20/5:20 or 20Nov96-5h20m or 1996-11-20T5:20	Defines a time-based filter. Any rows with a time later than this parameter are excluded during splitting (i.e. they are not copied to the output measurement set). This parameter is optional; if it is not set, no end-time filter is applied.
dopplercorrection	false	true	To perform frequency frame correction, set this parameter to true. The freqframe parameter may also need to be set. Note: Doppler correction is only supported when the width parameter is 1.
doppler_direction	None	[19h39m25.026, -63.42.45.63, J2000]	Reference direction used to calculate the doppler correction.
interpolation	nearest	cubic	Type of interpolation to use when performing doppler correction. One of linear, cubic, or nearest.
freqframe	topo	lsrk	The output frequency frame of the Doppler correction. Supported options are bary and lsrk.
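
The 25% noise increase quoted for usemedian matches the standard large-sample result for Gaussian noise: the standard deviation of the median is sqrt(pi/2), about 1.25 times that of the mean. The Python sketch below checks this with a small simulation. It assumes real-valued, equally weighted, unflagged samples with Gaussian noise; it is not mssplit's implementation, and how mssplit forms the median of complex visibilities is not described here.

```python
import math
import random
import statistics


def noise_ratio(width, trials=4000, seed=1):
    """Estimate std(median) / std(mean) over `width` unit-variance Gaussian samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        samples = [rng.gauss(0.0, 1.0) for _ in range(width)]
        means.append(statistics.fmean(samples))
        medians.append(statistics.median(samples))
    return statistics.pstdev(medians) / statistics.pstdev(means)


# Large-sample limit for Gaussian noise: sqrt(pi / 2), roughly a 25% increase.
print(f"asymptotic ratio: {math.sqrt(math.pi / 2):.3f}")
print(f"simulated ratio for width=54: {noise_ratio(54):.3f}")
```

For a finite width the simulated ratio is expected to sit slightly below the asymptotic value, and it varies with the random seed and the number of trials.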

Additional advanced/optional parameters:

Parameter	Default	Example	Description
stman.bucketsize	65536		Set the bucket size (in bytes) of the CASA Table storage manager. This usually translates into the I/O size.
stman.tilencorr	4		Set the number of correlations per tile. This affects the way the data is written to and read from disk.
stman.tilenchan	1		Set the number of spectral channels per tile. This affects the way the data is written to and read from disk. If a given reader or writer process is expected to access only a single channel, the default value of 1 is fine. If the reader or writer is expected to access many, or even all, channels, a larger value is likely to perform better.
stman.files	string	separate	This option controls how storage managers are mapped to files. The separate option (default) means every storage manager works with its own file. The combined option forces the storage managers to work with just one file. This can have benefits for the lustre filesystem. The hdf5 option is similar to combined, but writes corresponding data in the hdf5 format, provided casacore has been compiled with hdf5 support. The default option uses the default casacore constructor which forces the code to take these options from the resource file and in the absence of it, defaults to separate.
stman.blocksize	unsigned int	4194304	Block size in bytes for single file operations (i.e. for the combined and hdf5 options, see stman.files). This is not the same as bufferMB, described below; it is important when data from separate storage managers are aggregated together.
stman.odirect	boolean	false	Use ODirect o/s option, if supported, for the combined mode of storage manager operation (see above). Note, its use is discouraged.
bufferMB	4000		Set the size of the memory buffer in MB used for I/O. Up to twice this can be needed depending on the tile shapes. Setting this below 250 will make mssplit run very slowly, and setting it larger than the default has little benefit.

Configuration Example

Example 1

The following example demonstrates splitting out a single spectral channel, with no averaging:

# Input measurement set
# Default: <no default>
vis         = full.ms

# Output measurement set
# Default: <no default>
outputvis   = chan1.ms

# The channel range to split out into its own measurement set
# Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range
# is inclusive of both the start and end, indexing is one-based.
# Default: <no default>
channel     = 1

# Defines the number of channels to average to form one output channel
# Default: 1
width       = 1

Example 2

The following example demonstrates both splitting and averaging. Here, the lowest numbered 54 channels are averaged together to form a single channel in the output measurement set.

# Input measurement set
# Default: <no default>
vis         = full-18_5kHz.ms

# Output measurement set
# Default: <no default>
outputvis   = averaged_1MHz_chan_1.ms

# The channel range to split out into its own measurement set
# Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range
# is inclusive of both the start and end, indexing is one-based.
# Default: <no default>
channel     = 1-54

# Defines the number of channels to average to form one output channel
# Default: 1
width       = 54

Example 3

Finally, the following example demonstrates averaging a single measurement set with 16416 spectral channels by a factor of 54, creating a single output measurement set, i.e. 16416 x 18.5 kHz channels are averaged to 304 x 1 MHz channels.

# Input measurement set
# Default: <no default>
vis         = full-18_5kHz.ms

# Output measurement set
# Default: <no default>
outputvis   = averaged_1MHz.ms

# The channel range to split out into its own measurement set
# Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range
# is inclusive of both the start and end, indexing is one-based.
# Default: <no default>
channel     = 1-16416

# Defines the number of channels to average to form one output channel
# Default: 1
width       = 54
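
The channel counts in the examples follow from the inclusive, one-based channel range and the width. The Python sketch below reproduces that arithmetic. The function names are hypothetical, and the sketch assumes the selected channel count is an exact multiple of width; how mssplit treats a remainder is not covered here.

```python
def parse_channel_range(spec):
    """Parse a one-based, inclusive channel spec such as "1" or "1-300"."""
    first, _, last = spec.partition("-")
    start = int(first)
    end = int(last) if last else start
    if start < 1 or end < start:
        raise ValueError(f"invalid channel range: {spec!r}")
    return start, end


def output_channels(spec, width):
    """Return the number of output channels for a channel spec and width.

    Assumption for this sketch: the selected channel count must be an exact
    multiple of width.
    """
    start, end = parse_channel_range(spec)
    selected = end - start + 1
    if selected % width != 0:
        raise ValueError(f"{selected} channels is not a multiple of width {width}")
    return selected // width


print(output_channels("1-16416", 54))  # 304
print(round(54 * 18.5, 1))             # 999.0 (kHz per output channel, about 1 MHz)
```

The same check applied to Example 2 (channel = 1-54, width = 54) gives a single output channel.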
