Selavy Source-Finding Tutorial ============================== This tutorial demonstrates the use of the Selavy sourcefinder to find sources in an image, and fit components to them. This will take the image produced from the :doc:`basiccontinuum` tutorial and demonstrate different approaches to building up a source catalogue. The various options of Selavy are described in detail in the :doc:`../analysis/index` section, and this tutorial will not attempt to demonstrate all possible modes. We will focus primarily on the needs of continuum imaging. Prerequisites ------------- This tutorial uses the outputs from the imaging tutorial, specifically the restored Taylor term images and, optionally, the weights images. However, the scripts can be used with essentially any image: Selavy is able to read both CASA and FITS format. To find the spectral index & curvature of sources, you will need Taylor term images - these are found automatically if the input image is named in the same way as the output from cimager, but you can directly specify them (see :doc:`../analysis/postprocessing` for details). A simple search --------------- The first demonstration will be a simple search to a given flux threshold, in this case 100mJy. Sources that lie above this threshold are "grown" to a secondary threshold of 50mJy, and then have Gaussian components fitted to them. Pixels that lie above the primary threshold are identified and grouped together into 'islands'. These are then grown out to include all neighbouring pixels above the secondary threshold - pixels with fluxes between these two thresholds are only included if they are attached to pixels above the primary threshold. When fitting is requested, Gaussian components are fitted to the islands and their parameters written to a separate catalogue. It is typically this catalogue that one would use in analysing continuum observations, or, for instance to construct a sky model. Here is the parset used for this search: .. code-block:: bash # The image to be searched Selavy.image = image.i.clean.sciencefield.linmos.taylor.0.restored # # We will search just a subsection of the image Selavy.flagSubsection = true # The subsection shows the pixel range for each axis, giving the # first & last pixel to be used (1-based). Selavy.subsection = [251:1800,251:1800,*,*] # # This is how we divide it up for distributed processing, with the # number of subdivisions in each direction, and the size of the # overlap region in pixels Selavy.nsubx = 6 Selavy.nsuby = 3 Selavy.overlapx = 50 Selavy.overlapy = 50 # # The search threshold, in the flux units of the image Selavy.threshold = 0.1 # Grow the detections to a secondary threshold Selavy.flagGrowth = true Selavy.growthThreshold = 0.05 # # Parameters to switch on and control the Gaussian fitting Selavy.Fitter.doFit = true Selavy.Fitter.fitTypes = [full] Selavy.Fitter.maxNumGauss = 2 # # Size criteria for the final list of detected islands Selavy.minPix = 3 Selavy.minVoxels = 3 Selavy.minChannels = 1 # # How the islands are sorted in the final catalogue - by # integrated flux in this case Selavy.sortingParam = iflux If one saves this to a parset file **selavy.in**, say, then it can be run using the following sbatch file: .. code-block:: bash #!/bin/bash -l #SBATCH --time=01:00:00 #SBATCH --ntasks=19 #SBATCH --ntasks-per-node=19 #SBATCH --job-name=selavy log=selavy-${SLURM_JOB_ID}.log aprun -N 19 -n 19 selavy -c selavy.in > $log This example demonstrates several elements of Selavy. First, it is run in distributed mode, where the image is split up so that each worker process only processes a small fraction of the image. The way it is split up is governed by the *nsubx* and *nsuby* parameters, being the number of subimages in the x- and y-directions, and the overlap between subimages (in pixels) is controlled by the *overlapx* and *overlapy* parameters. The overlap ensures we don't miss structure at the boundaries. The fitting is switched on by setting *Fitter.doFit = true* (it is off by default). There are many options possible in the fitting, which are detailed in :doc:`../analysis/postprocessing`. The setup in this example limits the fitting to either one or two Gaussian components - one is fit first and is accepted if the fit works and passes chi-squared & RMS tests, else a two component fit is tried. If both fail, the initial estimates are returned. Outputs ------- This first example used a relatively high threshold (it's actually about 70-sigma for the simulated image), and runs very quickly (ie. less that a second), finding 49 sources in the simulated image and fitting Gaussian components to them. The key output files are as follows: +--------------------------------+------------------------+--------------------------------------+ |Filename | Parameters | Description | +================================+========================+======================================+ |selavy-results.txt |resultsFile |The catalogue of detected islands and | | | |their parameters. | +--------------------------------+------------------------+--------------------------------------+ |selavy-results.xml |votFile |A VOTable version of the islands | | | |catalogue. | +--------------------------------+------------------------+--------------------------------------+ |selavy-results.ann |karmaFile / flagKarma |A karma annotation file showing the | | | |locations of the islands. | +--------------------------------+------------------------+--------------------------------------+ |selavy-results.reg |ds9File / flagDS9 |A DS9 region file showing the island | | | |locations | +--------------------------------+------------------------+--------------------------------------+ |selavy-results.crf |casaFile / flagCASA |A CASA region file showing the island | | | |locations | +--------------------------------+------------------------+--------------------------------------+ |selavy-fitResults.txt |fitResultsFile |The catalogue of fitted components. | | | | | +--------------------------------+------------------------+--------------------------------------+ |selavy-fitResults.xml | |A VOTable version of the fitted | | | |components catalogue. | +--------------------------------+------------------------+--------------------------------------+ |selavy-fitResults.{ann,reg,crf} |fitAnnotationFile |Annotation/region files showing the | | | |location of the fitted components. | +--------------------------------+------------------------+--------------------------------------+ |selavy-SubimageLocations.ann |subimageAnnotationFile |Karma annotation file showing the | | | |borders of the subimages. | +--------------------------------+------------------------+--------------------------------------+ There are two catalogues written: *selavy-results.txt* is the Duchamp-style catalogue showing the detected islands and their parameters; and *selavy-fitResults.txt* is the catalogue of fitted components. The ID of the fitted components has a number and one or more letters - the number refers to the ID of the island in *selavy-results.txt*, while the letters give the order of the component (ordered by flux) within that island. The first 26 will have a single letter a-z, then aa-zz, then aaa-zzz and so forth. The *.txt* files are simple formatted ASCII tables. They have XML/VOTable equivalents written as well - these are standard VOTable format that can be used in applications such as TOPCAT and Aladin. There are annotation and region files produced that allow the detected sources to be displayed in either kvis (the *.ann* files), DS9 (the *.reg* files) or casaviewer (either *.reg* or *.crf* are possible). These will show the outline of the detected island, or ellipses indicating the fitted components. There is also a karma annotation file showing the locations of the worker subimages. The catalogue of islands is the same as the results file produced by Duchamp, and has the following format. It starts with a listing of the parameters (this is what is produced by the Duchamp code, so Selavy-specific parameters are not typically reproduced here) and a summary of the thresholds and number of detections. All these lines are started with a **#** character. Then there is the actual catalogue, in ASCII format: .. code-block:: bash # Results of the Duchamp source finder v.1.6.1: Wed Jun 4 09:53:19 2014 # # ---- Parameters ---- # Image to be analysed.............................[imageFile] = image.i.clean.sciencefield.linmos.taylor.0.restored[251:1800,251:1800,1:1,1:1] # Intermediate Logfile...............................[logFile] = selavy-Logfile.Master.txt # Final Results file.................................[outFile] = selavy-results.txt # VOTable file.......................................[votFile] = selavy-results.xml # Karma annotation file............................[karmaFile] = selavy-results.ann # DS9 annotation file................................[ds9File] = selavy-results.reg # CASA annotation file..............................[casaFile] = selavy-results.crf # Saving mask cube?...........................[flagOutputMask] = false # Saving 0th moment to FITS file?........[flagOutputMomentMap] = false # Saving 0th moment mask to FITS file?..[flagOutputMomentMask] = false # Saving baseline values to FITS file?....[flagOutputBaseline] = false # ------ # Type of searching performed.....................[searchType] = spatial # Trimming Blank Pixels?............................[flagTrim] = false # Searching for Negative features?..............[flagNegative] = false # Area of Beam (pixels)....................................... = 29.2281 (beam: 5.86614 x 4.39728 pixels) # Removing baselines before search?.............[flagBaseline] = false # Smoothing data prior to searching?..............[flagSmooth] = false # Using A Trous reconstruction?...................[flagATrous] = false # Using Robust statistics?...................[flagRobustStats] = true # Using FDR analysis?................................[flagFDR] = false # Detection Threshold..............................[threshold] = 0.1 # Minimum # Pixels in a detection.....................[minPix] = 3 # Minimum # Channels in a detection..............[minChannels] = 1 # Minimum # Voxels in a detection..................[minVoxels] = 3 # Growing objects after detection?................[flagGrowth] = true # Threshold for growth.......................[growthThreshold] = 0.05 # Using Adjacent-pixel criterion?...............[flagAdjacent] = true # Max. velocity separation for merging........[threshVelocity] = 7 # Reject objects before merging?.......[flagRejectBeforeMerge] = false # Merge objects in two stages?...........[flagTwoStageMerging] = true # Method of spectral plotting.................[spectralMethod] = peak # Type of object centre used in results..........[pixelCentre] = centroid # -------------------- # # -------------------- # Summary of statistics: # Detection threshold = 0.1 Jy/beam # Detections grown down to threshold of 0.05 Jy/beam # # Not calculating full stats since threshold was provided directly. # -------------------- # Total number of detections = 49 # -------------------- # --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- # ObjID Name X Y Z RA DEC RA DEC FREQ MAJ MIN PA w_RA w_DEC w_50 w_20 w_FREQ F_int F_tot F_peak X1 X2 Y1 Y2 Z1 Z2 Nvoxel Nchan Nspatpix Flag X_av Y_av Z_av X_cent Y_cent Z_cent X_peak Y_peak Z_peak # [deg] [deg] [Hz] [arcmin] [arcmin] [deg] [arcmin] [arcmin] [Hz] [Hz] [Hz] [Jy] [Jy/beam] [Jy/beam] # --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 J121503-455738 1957.9 656.6 0.0 12:15:03.8 -45:57:38 183.765738 -45.960668 900000000.000 1.01 0.84 11.78 1.42 1.73 0.000 0.000 0.000 0.332 9.714 0.277 1954 1962 652 661 0 0 73 1 73 - 1957.8 656.6 0.0 1957.9 656.6 0.0 1958 657 0 2 J121529-464245 1919.0 387.5 0.0 12:15:29.2 -46:42:45 183.871519 -46.712521 900000000.000 0.99 0.82 175.89 0.94 1.38 0.000 0.000 0.000 0.106 3.102 0.129 1917 1922 384 391 0 0 37 1 37 - 1919.1 387.5 0.0 1919.0 387.5 0.0 1919 388 0 The components catalogue *selavy-fitResults.txt*, looks like this: .. code-block:: bash # ID Name RA DEC X Y F_int F_peak F_int(fit) F_pk(fit) Maj(fit) Min(fit) P.A.(fit) Maj(fit_deconv.) Min(fit_deconv.) P.A.(fit_deconv.) Alpha Beta Chisq(fit) RMS(image) RMS(fit) Nfree(fit) NDoF(fit) NPix(fit) NPix(obj) Guess? # -- -- [deg] [deg] [pix] [pix] [Jy] [Jy/beam] [Jy] [Jy/beam] [arcsec] [arcsec] [deg] [arcsec] [arcsec] [deg] -- -- -- [Jy/beam] [Jy/beam] -- -- -- -- -- 1a J121503-455738 183.765701 -45.960550 1957.9 656.7 0.332 0.277 0.407 0.275 67.246 56.746 22.81 48.391 5.05654 344.13 -6.721 0.000 30.265 0.00664 0.644 6 66 73 73 0 2a J121529-464245 183.871722 -46.712437 1919.0 387.6 0.106 0.129 0.175 0.129 68.654 50.859 169.99 41.442 14.41590 321.44 -4.951 0.000 1.704 0.00516 0.215 6 30 37 37 0 Note the simpler structure with just the column names and units at the top. This file should be able to be easily read in python using the astropy.table module. Here is some example code that allows you to interact with it: .. code-block:: python #!/usr/bin/env python import astropy.table as table # read the catalogue cat = table.Table.read('selavy-fitResults.txt',format='ascii') # show the column names cat.colnames # access the integrated fluxes of the components directly, storing as a numpy array fint = cat.columns['F_int(fit)'].data Simple Signal-to-noise thresholding ----------------------------------- A simple change to the above parset can make your search be in signal-to-noise terms. Change .. code-block:: bash # The search thresholds, in the flux units of the image Selavy.threshold = 0.1 Selavy.flagGrowth = true Selavy.growthThreshold = 0.05 to .. code-block:: bash # The search thresholds, in units of sigma above the mean Selavy.snrCut = 10 Selavy.flagGrowth = true Selavy.growthCut = 4 and you will do a search for sources above 10-sigma, that are then grown to the 4-sigma level. The default approach uses the median and median absolute deviation from the median (MADFM), although the mean and standard deviation can be used by setting *Selavy.flagRobustStats=false*. The robust statistics take a little longer, due to the partial sorting that is involved, but they are probably more reliable, particularly when there is a lot of signal present. This approach uses the Duchamp method of applying a single threshold to the entire image. When run in distributed mode, as this will be, each worker finds the mean (or median), sends it to the master process which averages them to find the global mean estimator. This is then distributed to the workers who then find their local standard deviation or MADFM. These are again averaged by the master to provide a global estimator of the standard deviation, and hence the threshold. A more complete description of this process can be found in `Whiting & Humphreys (2012), PASA 29, 371`_. .. _Whiting & Humphreys (2012), PASA 29, 371: http://www.publish.csiro.au/paper/AS12028.htm This will take slightly longer than the first example, due to the additional statistics calculation & communication, and because it finds more sources due to the lower threshold. It will still be quick - my example took about 20 seconds to run, finding 560 sources. Variable signal-to-noise thresholding ------------------------------------- Instead of only using a single global threshold, Selavy allows the use of signal-to-noise thresholds that depend only on the noise local to a given pixel. This allows the actual flux threshold to vary according to the image noise, rising where the sensitivity or background due to sidelobes is high, and falling where the sensitivity is very good. To make use of this, keep the signal-to-noise threshold parameters as shown above, and add the following: .. code-block:: bash # Turn on the variable threshold option Selavy.VariableThreshold = true # Pixel width of box in which to calculate statistics. Selavy.VariableThreshold.boxSize = 50 # Name of image of detection threshold to be created Selavy.VariableThreshold.ThresholdImageName=detThresh.img # Name of image of noise to be created Selavy.VariableThreshold.NoiseImageName=noiseMap.img # Name of image of mean pixel level to be created Selavy.VariableThreshold.AverageImageName=meanMap.img # Name of image of signal-to-noise ratio to be created Selavy.VariableThreshold.SNRimageName=snrMap.img The key parameters are *VariableThreshold* and *VariableThreshold.boxSize*, which turn on the feature and set the size of the box in which the statistics and threshold are calculated. A square box of this width is slid over the image, and each pixel has a threshold calculated from the noise statistics of the pixels within the box centred on it. Images showing the various statistics can be written - only those given in the parset will be written. All images will be the same size as the input image (regardless of what search subsection you have used). This method also allows you to more confidently drop the detection threshold from those levels used with the global threshold case above, as you won't be swamped by spurious signal in high-noise areas. Again, this works with the distributed processing. The overlaps between subimages will be at least twice the box width, as pixels within a box width of the edge of a worker's image will not be processed (note this if you have things near the edge of your image that you care about). The execution time for this mode is a bit longer, as the sliding-box calculations require quite a bit more processing (particularly if the robust statistics are used). Even so, my tests completed in about 1.5 minutes, finding about 800 sources. Alternative modes ----------------- This section will detail a few alternative ways of running Selavy Finding initial component estimates ................................... A key part of the Gaussian fitting is determining an initial estimate of the parameters. Selavy's original approach (and currently the default) was to apply a number of "sub-thresholds" in between the detection threshold and the peak, and track the number of separate maxima within the island. There are a range of parameters, detailed on :doc:`../analysis/postprocessing`, that control the details of this algorithm. While this approach works well, I've recently incorporated the algorithm used in Aegean, which uses a curvature map to locate local maxima. See `Hancock et al. (2012), MNRAS 422, 1812`_ for details on the algorithm. I'm currently doing a detailed comparison of the two approaches, but this mode has been made available. .. _Hancock et al. (2012), MNRAS 422, 1812: http://adsabs.harvard.edu/abs/2012MNRAS.422.1812H To make use of this option, add the following to your parset: .. code-block:: bash # Parameters to switch on curvature map analysis and output the # curvature image Selavy.Fitter.useCurvature = true Selavy.Fitter.curvatureImage = curvature.img Forcing a PSF fit ................. It may be that you want to force the Gaussian fitting to fit only PSF-sized components, rather than allowing the size of the components to expand. To do this, you can choose an alternative mode for the fitting by putting this in your parset: .. code-block:: bash # This forces the fits to just use PSF-sized components Selavy.Fitter.fitTypes = [psf] This will then just fit to the location and flux of the component, keeping the size fixed to that of the beam (as specified in the image header). You can specify more than one fitting mode, by doing something like: .. code-block:: bash # This does both a full-parameter fit and a PSF-sized fit Selavy.Fitter.fitTypes = [full,psf] This way, point sources will typically be fit by a PSF-sized component, but extended sources will still be fit by a larger Gaussian. The best fit of the two is chosen on the basis of the reduced chi-squared value from the fit. [Note the discussion in :doc:`../analysis/postprocessing` about the suitability of the chi-squared value in the case of correlated pixels.] Dealing with bright sidelobes ............................. It may be that you want to extract a component fitting a source with bright sidelobes, for instance from a poorly-cleaned BETA image. Here’s a good procedure for Selavy: .. code-block:: bash # Fitting parameters, as before Selavy.Fitter.doFit = true Selavy.Fitter.fitTypes = [full] # Limit the number of Gaussians to 1 Selavy.Fitter.maxNumGauss = 1 # Do not use the number of initial estimates to determine how many # Gaussians to fit Selavy.Fitter.numGaussFromGuess = false # The fit may be a bit poor, so increase the reduced-chisq threshold Selavy.Fitter.maxReducedChisq = 10. # # Allow islands that are slightly separated to be considered a # single 'source' Selavy.flagAdjacent = false # The separation in pixels for islands to be considered 'joined' Selavy.threshSpatial = 7 This forces a fit of a single Gaussian only (ignoring the full set of initial estimates), and (as of change set 6455, or release CP-0.3) returns the first estimate if the fit fails. Note that the max reduced chisq has been upped to 10 to capture poor fits (often a good idea for bright targets, as the PSF is slightly different to a Gaussian). Here’s an example, where all the red contours are a single Duchamp source and the blue ellipse represents the fitted component. Some tweaking of the *threshSpatial* parameter might be needed, depending on the typical separation of the sidelobes .. image:: figures/sidelobeSource.png :align: center