mssplit (Measurement Splitting/Averaging Utility)
=================================================

The *mssplit* utility is used to extract a subset of a measurement set. This subset
may be a channel, beam ID or scan ID selection. It can also average channels together
while doing so, or be used purely for channel averaging with no filtering or selection.

The intended use-cases of this tool are:

- Split a large measurement set with many spectral channels into many smaller
  measurement sets, with perhaps a single spectral channel per file. This allows the
  MPI-based calibration and imaging programs to read a specific measurement set so
  no selection need be done.

- Average a full spectral resolution measurement set down to fewer/wider channels
  for continuum imaging.

- Perform frequency frame conversion "on the fly" while splitting or rebinning
  the data. This feature is only supported when the width parameter is 1.

Additionally, *mssplit* can filter based on the following criteria:

- Scan Number
- Beam Number

Running the program
-------------------

It can be run with the following command, where "config.in" is a file containing
the configuration parameters described in the next section. ::

   $ mssplit -c config.in

The *mssplit* program is not parallel/distributed; it runs in a single process operating
on a single input measurement set.

Configuration Parameters
------------------------

+----------------------+------------+-----------------------+---------------------------------------------+
|**Parameter**         |**Default** |**Example**            |**Description**                              |
+======================+============+=======================+=============================================+
|vis                   |*None*      |2013-12-25_230000.ms   |The input measurement set (uv-dataset). This |
|                      |            |                       |file will not be modified.                   |
+----------------------+------------+-----------------------+---------------------------------------------+
|outputvis             |*None*      |chan_1.ms              |The output measurement set (uv-dataset). This|
|                      |            |                       |file will be created; the program fails if a |
|                      |            |                       |file or directory with the same name already |
|                      |            |                       |exists.                                      |
+----------------------+------------+-----------------------+---------------------------------------------+
|channel               |*None*      |1-300                  |The channel range to split out into its own  |
|                      |            |                       |measurement set. Can be either a single      |
|                      |            |                       |integer (e.g. 1) or a range (e.g. 1-300). The|
|                      |            |                       |range is inclusive of both the start and end,|
|                      |            |                       |and indexing is one-based.                   |
+----------------------+------------+-----------------------+---------------------------------------------+
|width                 |1           |54                     |Defines the number of input channels to      |
|                      |            |                       |average together to form one output channel. |
|                      |            |                       |As the averaged visibilities can have        |
|                      |            |                       |different noise levels due to flagging,      |
|                      |            |                       |an additional array column containing noise  |
|                      |            |                       |sigmas for each spectral channel will be     |
|                      |            |                       |written when width>1.                        |
+----------------------+------------+-----------------------+---------------------------------------------+
|usemedian             |false       |true                   |When averaging (width>1) use the median      |
|                      |            |                       |instead of the mean. Note that this will     |
|                      |            |                       |increase the expected noise by 25%.          |
+----------------------+------------+-----------------------+---------------------------------------------+
|beams                 |*None*      |[0]                    |Defines the beam numbers that will be        |
|                      |            |or                     |exported to the output files. Rows are       |
|                      |            |[0, 1, 2]              |selected by matching their feed ID column    |
|                      |            |or                     |with the provided beam number(s), so the     |
|                      |            |[0..8]                 |numbers given here should be taken from the  |
|                      |            |                       |list of IDs in the FEED table of the         |
|                      |            |                       |measurement set. If this parameter is not    |
|                      |            |                       |set, all beams are exported. The value may be|
|                      |            |                       |a single integer (e.g. 0 or [0]), an array of|
|                      |            |                       |integers such as [0,1,2] or a range such as  |
|                      |            |                       |[0..8].                                      |
+----------------------+------------+-----------------------+---------------------------------------------+
|scans                 |*None*      |[0]                    |Defines the scan numbers that will be        |
|                      |            |or                     |exported to the output files. Rows are       |
|                      |            |[0, 1, 2]              |selected by matching their scan_number column|
|                      |            |or                     |with the provided scan number(s). If this    |
|                      |            |[0..2]                 |parameter is not set, all scans are exported.|
|                      |            |                       |The value may be a single integer (e.g. 0 or |
|                      |            |                       |[0]), an array of integers such as [0,1,2] or|
|                      |            |                       |a range such as [0..2].                      |
+----------------------+------------+-----------------------+---------------------------------------------+
|fieldnames            |*None*      |[offset1]              |Defines the field names that will be         |
|                      |            |or                     |exported to the output files. If this        |
|                      |            |[offset1,offset2]      |parameter is not set, all fields are         |
|                      |            |or                     |exported. The value may be a single string   |
|                      |            |[offset1..9]           |(e.g. a0 or [a0]), an array of strings such  |
|                      |            |                       |as [a0,a1,a2] or a range such as [a0..2].    |
+----------------------+------------+-----------------------+---------------------------------------------+
|timebegin             |*None*      |1996/11/20/5:20        |Defines a time-based filter. Any rows with   |
|                      |            |or                     |time *earlier than* this parameter will be   |
|                      |            |20Nov96-5h20m          |excluded during splitting (i.e. they will    |
|                      |            |or                     |not be copied to the output measurement      |
|                      |            |1996-11-20T5:20        |set). This parameter is optional; if it is   |
|                      |            |                       |not present, no *earlier than* filter is     |
|                      |            |                       |applied.                                     |
+----------------------+------------+-----------------------+---------------------------------------------+
|timeend               |*None*      |1996/11/20/5:20        |Defines a time-based filter. Any rows with   |
|                      |            |or                     |time *later than* this parameter will be     |
|                      |            |20Nov96-5h20m          |excluded during splitting (i.e. they will    |
|                      |            |or                     |not be copied to the output measurement      |
|                      |            |1996-11-20T5:20        |set). This parameter is optional; if it is   |
|                      |            |                       |not present, no *later than* filter is       |
|                      |            |                       |applied.                                     |
+----------------------+------------+-----------------------+---------------------------------------------+
|dopplercorrection     |false       |true                   |To perform frequency frame correction, set   |
|                      |            |                       |this parameter to true. The parameter        |
|                      |            |                       |freqframe may also need to be set.           |
|                      |            |                       |Note: Doppler correction is only supported   |
|                      |            |                       |when the width parameter is 1.               |
+----------------------+------------+-----------------------+---------------------------------------------+
|doppler_direction     |*None*      |[19h39m25.026,         |Reference direction used to calculate the    |
|                      |            |-63.42.45.63, J2000]   |Doppler correction.                          |
+----------------------+------------+-----------------------+---------------------------------------------+
|interpolation         |nearest     |cubic                  |Type of interpolation to use when performing |
|                      |            |                       |Doppler correction. One of linear, cubic, or |
|                      |            |                       |nearest.                                     |
+----------------------+------------+-----------------------+---------------------------------------------+
|freqframe             |topo        |lsrk                   |The output frequency frame of the Doppler    |
|                      |            |                       |correction. Supported options are bary and   |
|                      |            |                       |lsrk.                                        |
+----------------------+------------+-----------------------+---------------------------------------------+

Additional advanced/optional parameters:
````````````````````````````````````````

+----------------------+------------+-----------------------+---------------------------------------------+
|**Parameter**         |**Default** |**Example**            |**Description**                              |
+======================+============+=======================+=============================================+
|stman.bucketsize      |65536       |                       |Set the bucket size (in bytes) of the CASA   |
|                      |            |                       |Table storage manager. This usually          |
|                      |            |                       |translates into the I/O size.                |
+----------------------+------------+-----------------------+---------------------------------------------+
|stman.tilencorr       |4           |                       |Set the number of correlations per tile. This|
|                      |            |                       |affects the way the data is written to and   |
|                      |            |                       |read from disk.                              |
+----------------------+------------+-----------------------+---------------------------------------------+
|stman.tilenchan       |1           |                       |Set the number of spectral channels per tile.|
|                      |            |                       |This affects the way the data is written to  |
|                      |            |                       |and read from disk. If it is expected that a |
|                      |            |                       |given reader or writer process will read only|
|                      |            |                       |a single channel then the default value of 1 |
|                      |            |                       |is fine. If the reader or writer is expected |
|                      |            |                       |to read many, or even all channels then a    |
|                      |            |                       |larger value would be more efficient.        |
+----------------------+------------+-----------------------+---------------------------------------------+
|stman.files           |string      |separate               |This option controls how storage managers are|
|                      |            |                       |mapped to files. The **separate** option     |
|                      |            |                       |(default) means every storage manager works  |
|                      |            |                       |with its own file. The **combined** option   |
|                      |            |                       |forces the storage managers to work with     |
|                      |            |                       |just one file. This can have benefits for the|
|                      |            |                       |Lustre filesystem. The **hdf5** option is    |
|                      |            |                       |similar to **combined**, but writes          |
|                      |            |                       |corresponding data in the HDF5 format,       |
|                      |            |                       |provided casacore has been compiled with HDF5|
|                      |            |                       |support. The **default** option uses the     |
|                      |            |                       |default casacore constructor which forces the|
|                      |            |                       |code to take these options from the          |
|                      |            |                       |resource file and, in its absence,           |
|                      |            |                       |defaults to **separate**.                    |
+----------------------+------------+-----------------------+---------------------------------------------+
|stman.blocksize       |unsigned int|4194304                |Block size in bytes for single-file          |
|                      |            |                       |operations (i.e. for the **combined** and    |
|                      |            |                       |**hdf5** options, see **stman.files**). It is|
|                      |            |                       |not the same as **bufferMB** described below;|
|                      |            |                       |this parameter is important when data from   |
|                      |            |                       |separate storage managers are aggregated     |
|                      |            |                       |together.                                    |
+----------------------+------------+-----------------------+---------------------------------------------+
|stman.odirect         |boolean     |false                  |Use ODirect o/s option, if supported, for the|
|                      |            |                       |**combined** mode of storage manager         |
|                      |            |                       |operation (see above). Note: its use is      |
|                      |            |                       |discouraged.                                 |
+----------------------+------------+-----------------------+---------------------------------------------+
|bufferMB              |4000        |                       |Set the size of the memory buffer in MB used |
|                      |            |                       |for I/O. Up to twice this can be needed      |
|                      |            |                       |depending on the tile shapes.                |
|                      |            |                       |Setting this below 250 will make mssplit run |
|                      |            |                       |very slowly, and setting it higher than the  |
|                      |            |                       |default has little benefit.                  |
+----------------------+------------+-----------------------+---------------------------------------------+

Configuration Example
---------------------

**Example 1**

The following example demonstrates splitting out a single spectral channel, with no
averaging:

.. code-block:: bash

    # Input measurement set
    # Default: none
    vis = full.ms

    # Output measurement set
    # Default: none
    outputvis = chan1.ms

    # The channel range to split out into its own measurement set
    # Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range
    # is inclusive of both the start and end, indexing is one-based.
    # Default: none
    channel = 1

    # Defines the number of channels to average to form one output channel
    # Default: 1
    width = 1

**Example 2**

The following example demonstrates both splitting and averaging. Here, the
lowest numbered 54 channels are averaged together to form a single channel in
the output measurement set.

.. code-block:: bash

    # Input measurement set
    # Default: none
    vis = full-18_5kHz.ms

    # Output measurement set
    # Default: none
    outputvis = averaged_1MHz_chan_1.ms

    # The channel range to split out into its own measurement set
    # Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range
    # is inclusive of both the start and end, indexing is one-based.
    # Default: none
    channel = 1-54

    # Defines the number of channels to average to form one output channel
    # Default: 1
    width = 54

**Example 3**

Finally, the following example demonstrates averaging a single measurement set
with 16416 spectral channels by a factor of 54, creating a single output
measurement set. That is, 16416 x 18.5 kHz channels are averaged to 304 x 1 MHz
channels.

.. code-block:: bash

    # Input measurement set
    # Default: none
    vis = full-18_5kHz.ms

    # Output measurement set
    # Default: none
    outputvis = averaged_1MHz.ms

    # The channel range to split out into its own measurement set
    # Can be either a single integer (e.g. 1) or a range (e.g. 1-300). The range
    # is inclusive of both the start and end, indexing is one-based.
    # Default: none
    channel = 1-16416

    # Defines the number of channels to average to form one output channel
    # Default: 1
    width = 54
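
The use-case of splitting one large measurement set into many smaller ones can
be scripted by generating one configuration per output file. The Python snippet
below is a minimal sketch of that idea, assuming the 16416-channel input and the
averaging factor of 54 used in the examples on this page. The helper names
(``channel_ranges``, ``make_parset``) and the ``chan_N.in`` and
``averaged_1MHz_chan_N.ms`` file names are illustrative assumptions, not part of
*mssplit*; the script only builds configuration text and does not run the tool.

.. code-block:: python

    def channel_ranges(first, last, width):
        """Yield inclusive, one-based (start, end) blocks of up to `width` channels."""
        for start in range(first, last + 1, width):
            yield start, min(start + width - 1, last)


    def make_parset(vis, outputvis, start, end, width):
        """Return configuration text using only the documented parameters."""
        # A single channel is written as "1", a range as "1-54" (inclusive).
        channel = str(start) if start == end else f"{start}-{end}"
        lines = [
            f"vis = {vis}",
            f"outputvis = {outputvis}",
            f"channel = {channel}",
            f"width = {width}",
        ]
        return "\n".join(lines) + "\n"


    # Hypothetical file names: one configuration per 54-channel block.
    parsets = {
        f"chan_{i}.in": make_parset(
            "full-18_5kHz.ms", f"averaged_1MHz_chan_{i}.ms", start, end, 54
        )
        for i, (start, end) in enumerate(channel_ranges(1, 16416, 54), start=1)
    }

    print(len(parsets))  # 304 configurations, one per averaged output channel
    print(parsets["chan_1.in"])

Each generated text could then be written to its own file and passed to a
separate ``mssplit -c`` invocation, since the program itself runs as a single
process on a single input measurement set.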