mssplit (Measurement Splitting/Averaging Utility)
=================================================

The *mssplit* utility extracts a subset of a measurement set. The subset may be
selected by channel, beam ID or scan ID. Channels can optionally be averaged
together during the split, and the utility can also be used purely for channel
averaging, with no filtering or selection.

The intended use-cases of this tool are:

- Split a large measurement set with many spectral channels into many smaller
  measurement sets, with perhaps a single spectral channel per file. This
  allows the MPI-based calibration and imaging programs to read a specific
  measurement set, so that no selection needs to be done.

- Average a full spectral resolution measurement set down to fewer/wider
  channels for continuum imaging.

- Perform frequency frame conversion "on the fly" while splitting or rebinning the data.
  This feature is only supported when the width parameter is 1.

Additionally, *mssplit* can filter based on the following criteria:

- Scan Number
- Beam Number
- Field Name
- Time Range
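
As an illustration of these filters, the following configuration fragment is a
minimal sketch that keeps only three beams from a single scan. It uses the
**beams** and **scans** parameters described under "Configuration Parameters"
below. The file names and the beam, scan and channel values are hypothetical
and should be replaced with values present in the input measurement set:

.. code-block:: bash

    # Hypothetical input and output measurement sets
    vis         = full.ms
    outputvis   = scan0_beams0-2.ms

    # Channel range to copy, with no averaging
    channel     = 1-300
    width       = 1

    # Keep only rows whose feed ID is 0, 1 or 2
    beams       = [0..2]

    # Keep only rows whose scan number is 0
    scans       = [0]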

Running the program
-------------------

It can be run with the following command, where "config.in" is a file containing
the configuration parameters described in the next section. ::

   $  mssplit -c config.in

The *mssplit* program is not parallel/distributed; it runs in a single process operating
on a single input measurement set.

Configuration Parameters
------------------------

+----------------------+------------+-----------------------+---------------------------------------------+
|**Parameter**         |**Default** |**Example**            |**Description**                              |
+======================+============+=======================+=============================================+
|vis                   |*None*      |2013-12-25_230000.ms   |The input measurement set (uv-dataset). This |
|                      |            |                       |file will not be modified.                   |
|                      |            |                       |                                             |
+----------------------+------------+-----------------------+---------------------------------------------+
|outputvis             |*None*      |chan_1.ms              |The output measurement set (uv-dataset). This|
|                      |            |                       |file will be created, and the program will   |
|                      |            |                       |fail to execute in the case a file/directory |
|                      |            |                       |with the same name already exists.           |
|                      |            |                       |                                             |
+----------------------+------------+-----------------------+---------------------------------------------+
|channel               |*None*      |1-300                  |The channel range to split out into its own  |
|                      |            |                       |measurement set. Can be either a single      |
|                      |            |                       |integer (e.g. 1) or a range (e.g. 1-300). The|
|                      |            |                       |range is inclusive of both the start and end,|
|                      |            |                       |and indexing is one-based.                   |
+----------------------+------------+-----------------------+---------------------------------------------+
|width                 |1           |54                     |Defines the number of input channels to      |
|                      |            |                       |average together to form one output channel. |
|                      |            |                       |As the averaged visibilities can have        |
|                      |            |                       |different noise levels due to flagging,      |
|                      |            |                       |an additional array column containing noise  |
|                      |            |                       |sigmas for each spectral channel will be     |
|                      |            |                       |written when width>1.                        |
+----------------------+------------+-----------------------+---------------------------------------------+
|usemedian             |false       |true                   |When averaging (width>1) use the median      |
|                      |            |                       |instead of the mean. Note that this will     |
|                      |            |                       |increase the expected noise by 25%.          |
+----------------------+------------+-----------------------+---------------------------------------------+
|beams                 |*None*      |[0]                    |Defines the beam numbers that will be        |
|                      |            |or                     |exported to the output files. Rows are       |
|                      |            |[0, 1, 2]              |selected by matching their feed ID column    |
|                      |            |or                     |with the provided beam number(s), so the     |
|                      |            |[0..8]                 |numbers given here should be taken from the  |
|                      |            |                       |list of IDs in the FEED table of the         |
|                      |            |                       |measurement set. If this parameter is not set|
|                      |            |                       |all beams are exported.  The value may be a  |
|                      |            |                       |single integer (e.g. 0 or [0]), an array of  |
|                      |            |                       |integers such as [0,1,2] or a range such as  |
|                      |            |                       |[0..8].                                      |
+----------------------+------------+-----------------------+---------------------------------------------+
|scans                 |*None*      |[0]                    |Defines the scan numbers that will be        |
|                      |            |or                     |exported to the output files. Rows are       |
|                      |            |[0, 1, 2]              |selected by matching their scan_number column|
|                      |            |or                     |with the provided scan number(s). If this    |
|                      |            |[0..2]                 |parameter is not set all scans are exported. |
|                      |            |                       |The value may be a single integer (e.g. 0 or |
|                      |            |                       |[0]), an array of integers such as [0,1,2] or|
|                      |            |                       |a range such as [0..2].                      |
+----------------------+------------+-----------------------+---------------------------------------------+
|fieldnames            |*None*      |[offset1]              |Defines the field names that will be         |
|                      |            |or                     |exported to the output files. If this        |
|                      |            |[offset1,offset2]      |parameter is not set all fields are exported.|
|                      |            |or                     |The value may be a single string (e.g. a0 or |
|                      |            |[offset1..9]           |[a0]), an array of strings such as [a0,a1,a2]|
|                      |            |                       |or a range such as [a0..2].                  |
+----------------------+------------+-----------------------+---------------------------------------------+
|timebegin             |*None*      |1996/11/20/5:20        |Defines a time-based filter. Any rows with   |
|                      |            |or                     |time *earlier than* this parameter will be   |
|                      |            |20Nov96-5h20m          |excluded during splitting (i.e. they will    |
|                      |            |or                     |not be copied to the output measurement set).|
|                      |            |1996-11-20T5:20        |This parameter is optional; if not present,  |
|                      |            |                       |no lower time limit is applied.              |
+----------------------+------------+-----------------------+---------------------------------------------+
|timeend               |*None*      |1996/11/20/5:20        |Defines a time-based filter. Any rows with   |
|                      |            |or                     |time *later than* this parameter will be     |
|                      |            |20Nov96-5h20m          |excluded during splitting (i.e. they will    |
|                      |            |or                     |not be copied to the output measurement set).|
|                      |            |1996-11-20T5:20        |This parameter is optional; if not present,  |
|                      |            |                       |no upper time limit is applied.              |
+----------------------+------------+-----------------------+---------------------------------------------+
|dopplercorrection     |false       |true                   |To perform frequency frame correction, set   |
|                      |            |                       |this parameter to true. The parameter        |
|                      |            |                       |freqframe may also need to be set.           |
|                      |            |                       |Note: doppler correction is only supported   |
|                      |            |                       |when the width parameter is 1.               |
+----------------------+------------+-----------------------+---------------------------------------------+
|doppler_direction     |*None*      |[19h39m25.026,         |Reference direction used to calculate the    |
|                      |            |-63.42.45.63, J2000]   |doppler correction.                          |
+----------------------+------------+-----------------------+---------------------------------------------+
|interpolation         |nearest     |cubic                  |Type of interpolation to use when performing |
|                      |            |                       |doppler correction. One of linear, cubic, or |
|                      |            |                       |nearest.                                     |
+----------------------+------------+-----------------------+---------------------------------------------+
|freqframe             |topo        |lsrk                   |The output frequency frame of the doppler    |
|                      |            |                       |correction. Supported options are [bary|lsrk]|
+----------------------+------------+-----------------------+---------------------------------------------+
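
The frequency frame conversion parameters are meant to be used together. The
fragment below is a sketch of how they might be combined, reusing the example
values from the table above. The file names and channel range are hypothetical,
and since it is not stated here whether **doppler_direction** is mandatory, it
is set explicitly:

.. code-block:: bash

    # Hypothetical input and output measurement sets
    vis                 = full.ms
    outputvis           = full_lsrk.ms
    channel             = 1-300

    # Doppler correction is only supported without averaging
    width               = 1

    # Convert to the LSRK frame using cubic interpolation
    dopplercorrection   = true
    freqframe           = lsrk
    doppler_direction   = [19h39m25.026, -63.42.45.63, J2000]
    interpolation       = cubic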

Additional advanced/optional parameters:
````````````````````````````````````````

+----------------------+------------+-----------------------+---------------------------------------------+
|**Parameter**         |**Default** |**Example**            |**Description**                              |
+======================+============+=======================+=============================================+
|stman.bucketsize      |65536       |                       |Set the bucket size (in bytes) of the CASA   |
|                      |            |                       |Table storage manager. This usually          |
|                      |            |                       |translates into the I/O size.                |
+----------------------+------------+-----------------------+---------------------------------------------+
|stman.tilencorr       |4           |                       |Set the number of correlations per tile. This|
|                      |            |                       |affects the way the data is written to and   |
|                      |            |                       |read from disk.                              |
+----------------------+------------+-----------------------+---------------------------------------------+
|stman.tilenchan       |1           |                       |Set the number of spectral channels per tile.|
|                      |            |                       |This affects the way the data is written to  |
|                      |            |                       |and read from disk. If it is expected that a |
|                      |            |                       |given reader or writer process will read only|
|                      |            |                       |a single channel then the default value of 1 |
|                      |            |                       |is fine. If the reader or writer is expected |
|                      |            |                       |to read many, or even all channels then a    |
|                      |            |                       |larger value would be more efficient.        |
+----------------------+------------+-----------------------+---------------------------------------------+
|stman.files           |string      |separate               |This option controls how storage managers are|
|                      |            |                       |mapped to files. The **separate** option     |
|                      |            |                       |(default) means every storage manager works  |
|                      |            |                       |with its own file. The **combined** option   |
|                      |            |                       |forces the storage managers to work with     |
|                      |            |                       |just one file. This can have benefits for the|
|                      |            |                       |lustre filesystem. The **hdf5** option is    |
|                      |            |                       |similar to **combined**, but writes          |
|                      |            |                       |corresponding data in the hdf5 format,       |
|                      |            |                       |provided casacore has been compiled with hdf5|
|                      |            |                       |support. The **default** option uses the     |
|                      |            |                       |default casacore constructor which forces the|
|                      |            |                       |code to take these options from the          |
|                      |            |                       |resource file and in the absence of it,      |
|                      |            |                       |defaults to **separate**.                    |
+----------------------+------------+-----------------------+---------------------------------------------+
|stman.blocksize       |unsigned int|4194304                |Block size in bytes for single file          |
|                      |            |                       |operations (i.e. for the **combined** and    |
|                      |            |                       |**hdf5** options, see **stman.files**). It is|
|                      |            |                       |not the same as **bufferMB** described below;|
|                      |            |                       |this parameter is important when data from   |
|                      |            |                       |separate storage managers are aggregated     |
|                      |            |                       |together.                                    |
+----------------------+------------+-----------------------+---------------------------------------------+
|stman.odirect         |boolean     |false                  |Use ODirect o/s option, if supported, for the|
|                      |            |                       |**combined** mode of storage manager         |
|                      |            |                       |operation (see above). Note: its use is      |
|                      |            |                       |discouraged.                                 |
+----------------------+------------+-----------------------+---------------------------------------------+
|bufferMB              |4000        |                       |Set the size of the memory buffer in MB used |
|                      |            |                       |for I/O. Up to twice this can be needed      |
|                      |            |                       |depending on the tile shapes.                |
|                      |            |                       |Setting this below 250 makes mssplit run very|
|                      |            |                       |slowly, and setting it above the default     |
|                      |            |                       |has little benefit.                          |
+----------------------+------------+-----------------------+---------------------------------------------+
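
As a sketch of how these options might be set, the fragment below assumes the
output will later be read by a process that accesses many channels at once, so
it raises **stman.tilenchan** above its default of 1 and writes all storage
managers to a single file. The specific values (54 channels per tile, a 2000 MB
buffer) are illustrative assumptions rather than recommendations:

.. code-block:: bash

    # More channels per tile, for readers that access many channels
    stman.tilenchan     = 54

    # Use a single file for all storage managers
    stman.files         = combined

    # Memory buffer used for I/O, in MB (default is 4000)
    bufferMB            = 2000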

Configuration Example
---------------------

**Example 1**

The following example demonstrates splitting out a single spectral channel,
with no averaging:

.. code-block:: bash

    # Input measurement set
    # Default: <no default>
    vis         = full.ms

    # Output measurement set
    # Default: <no default>
    outputvis   = chan1.ms

    # The channel range to split out into its own measurement set
    # Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range
    # is inclusive of both the start and end, indexing is one-based.
    # Default: <no default>
    channel     = 1

    # Defines the number of input channels to average to form one output channel
    # Default: 1
    width       = 1


**Example 2**

The following example demonstrates both splitting and averaging. Here, the 54
lowest-numbered channels are averaged together to form a single channel in the
output measurement set.

.. code-block:: bash

    # Input measurement set
    # Default: <no default>
    vis         = full-18_5kHz.ms

    # Output measurement set
    # Default: <no default>
    outputvis   = averaged_1MHz_chan_1.ms

    # The channel range to split out into its own measurement set
    # Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range
    # is inclusive of both the start and end, indexing is one-based.
    # Default: <no default>
    channel     = 1-54

    # Defines the number of input channels to average to form one output channel
    # Default: 1
    width       = 54


**Example 3**

Finally, the following example demonstrates averaging a single measurement set
with 16416 spectral channels by a factor of 54, creating a single output
measurement set, i.e. 16416 x 18.5 kHz channels are averaged to 304 x 1 MHz
channels.

.. code-block:: bash

    # Input measurement set
    # Default: <no default>
    vis         = full-18_5kHz.ms

    # Output measurement set
    # Default: <no default>
    outputvis   = averaged_1MHz.ms

    # The channel range to split out into its own measurement set
    # Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range
    # is inclusive of both the start and end, indexing is one-based.
    # Default: <no default>
    channel     = 1-16416

    # Defines the number of input channels to average to form one output channel
    # Default: 1
    width       = 54
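

The channel counts quoted for Example 3 follow from the **channel** range and
the **width** value. As a quick check, and assuming the 18.5 kHz input channel
width is a rounded figure:

.. math::

   N_{\mathrm{out}} = \frac{16416}{54} = 304, \qquad
   \Delta\nu_{\mathrm{out}} = 54 \times 18.5\,\mathrm{kHz} = 999\,\mathrm{kHz} \approx 1\,\mathrm{MHz}

In this example the length of the channel range is an exact multiple of the
averaging width, so every output channel is formed from the same number of
input channels.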