Cross-match Run2 x cosmoDC2

S.Plaszczynski 24 oct.19

Updates

  • increase nside=262144 (!): changes completeness
  • dx=cos(dec)*Delta(RA), dy=Delta(DEC)
  • flux cuts on golden sample

1.  selections

1.1  source = Run2.1i (dr1-b)

  • mag_i_cModel<25.3
  • SNR>1

Result: ~50M objects

1.2  target= CosmoDC2

  • mag_i<25.3 : ~80M (inc. ultrafaint)

2.  healpixel grid

 (nside, resol (arcsec))
 (1, 211076.28514206142),
 (2, 105538.14257103071),
 (4, 52769.071285515354),
 (8, 26384.535642757677),
 (16, 13192.267821378839),
 (32, 6596.133910689419),
 (64, 3298.0669553447096),
 (128, 1649.0334776723548),
 (256, 824.5167388361774),
 (512, 412.2583694180887),
 (1024, 206.12918470904435),
 (2048, 103.06459235452218),
 (4096, 51.53229617726109),
 (8192, 25.766148088630544),
 (16384, 12.883074044315272),
 (32768, 6.441537022157636),
 (65536, 3.220768511078818),
 (131072, 1.610384255539409),
 (262144, 0.8051921277697045),
 (524288, 0.40259606388485225)
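The numbers in this list are consistent with the usual HEALPix convention, where the resolution is the square root of the pixel area for 12*nside^2 equal-area pixels. A minimal sketch reproducing them with the standard library only (the function name `pixel_resolution_arcsec` is illustrative, not from the original analysis):

```python
import math

ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi

def pixel_resolution_arcsec(nside):
    """Square root of the pixel area, in arcsec, for 12*nside**2 equal-area pixels."""
    npix = 12 * nside * nside
    pixel_area_sr = 4.0 * math.pi / npix
    return math.sqrt(pixel_area_sr) * ARCSEC_PER_RADIAN

# Rebuild the (nside, resolution) pairs for nside = 1, 2, 4, ..., 524288
grid = [(2 ** k, pixel_resolution_arcsec(2 ** k)) for k in range(20)]
```

At nside=262144 this gives a pixel size of about 0.8 arcsec, i.e. comparable to the matching radius discussed later.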

stats within 1 pixel:

  • run2:
+-----+--------+
|count|    freq|
+-----+--------+
|    1|46150111|
|    2|  141961|
|    3|     103|
+-----+--------+
  • cosmoDC
+-----+--------+
|count|    freq|
+-----+--------+
|    1|76819578|
|    2| 1739137|
|    3|   33127|
|    4|     586|
|    5|      10|
|   58|       2|
+-----+--------+
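These occupancy tables amount to a double count: objects per pixel, then pixels per occupancy value. A small sketch in plain Python, assuming that "freq" is the number of pixels holding "count" objects (the function name and the toy pixel indices are made up for illustration):

```python
from collections import Counter

def occupancy_histogram(pixel_ids):
    """Map 'objects in a pixel' -> 'number of pixels with that occupancy'."""
    per_pixel = Counter(pixel_ids)                # like groupBy(ipix).count()
    occupancy = Counter(per_pixel.values())       # like groupBy(count).count()
    return dict(sorted(occupancy.items()))

# Toy input: made-up pixel indices, not catalogue data
toy_ipix = [10, 11, 11, 12, 13, 13, 13, 14]
hist = occupancy_histogram(toy_ipix)
```

On the toy input, three pixels hold one object, one pixel holds two and one pixel holds three.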

3.  Full cosmoDC2xRun2 cross-match

In essence this is a join on ipix, but pixel boundaries are a problem: a point close to a pixel border may have its true counterpart in the neighbouring pixel.

  1. Solution: duplicate each source into all neighbouring pixels. This is fast in the NEST scheme but increases the source sample size by a factor ~8 (500M!)
  2. then join on ipix
  3. groupBy source_id (better: "reduceBy"), choosing a reduction algorithm: as a first choice, min(dist)
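The three steps can be illustrated with a toy version in plain Python. This is only a sketch under loud assumptions: a flat square grid stands in for the HEALPix pixelisation, dictionaries stand in for the Spark join and reduce, and all names (`cell_of`, `cross_match`) and coordinates are invented for the example.

```python
import math
from collections import defaultdict

def cell_of(x, y, size):
    """Index of the square cell containing (x, y); stand-in for ipix."""
    return (math.floor(x / size), math.floor(y / size))

def cross_match(sources, targets, size):
    """sources/targets: dicts id -> (x, y). Returns source id -> (target id, dist)."""
    # Step 1: register each source in its own cell and in the 8 neighbouring cells
    sources_by_cell = defaultdict(list)
    for sid, (x, y) in sources.items():
        cx, cy = cell_of(x, y, size)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                sources_by_cell[(cx + dx, cy + dy)].append(sid)
    # Step 2: join on the cell index; step 3: reduce by source id with min(dist)
    best = {}
    for tid, (tx, ty) in targets.items():
        for sid in sources_by_cell.get(cell_of(tx, ty, size), []):
            sx, sy = sources[sid]
            dist = math.hypot(sx - tx, sy - ty)
            if sid not in best or dist < best[sid][1]:
                best[sid] = (tid, dist)
    return best

# "s1" sits just left of a cell border; its closest target "t1" is just right of it
sources = {"s1": (0.99, 0.5), "s2": (5.5, 5.5)}
targets = {"t1": (1.01, 0.5), "t2": (0.2, 0.5), "t3": (9.0, 9.0)}
matches = cross_match(sources, targets, size=1.0)
```

Without the duplication in step 1, "s1" would be paired with "t2" (same cell, but much farther away); with it, the closer "t1" across the border is found. "s2" has no target in its neighbourhood and stays unmatched.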

Spark @cori (10 nodes) interactively:

  1. reading + x8 duplicates: ~30s
  2. join: ~80s
  3. reduceBy: ~30s

TOTAL: ~3 min for a 500M x 80M cross-match!
+----+--------+--------------------+                                            
|nass|   count|                frac|
+----+--------+--------------------+
|   1|34891784|  0.7304920953658602|
|   2|10656958|  0.2231133718942536|
|   3| 1914284| 0.04007732394208735|
|   4|  265944|  0.0055677860957175|
|   5|   31903|6.679191100821053E-4|
|   6|    3462|7.248020434141769E-5|
|   7|     386|8.081270616922943E-6|
|   8|      38|7.955655011478544E-7|
|   9|       5|1.046796712036650...|
|  58|       2|4.187186848146602...|
+----+--------+--------------------+

Change of definition: "matched" now means r<1 arcmin

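+------------------+----------+-------------+----------------+
|  cut (cumulative)|source (M)|  matched (M)|single-match (M)|
+------------------+----------+-------------+----------------+
|            i<25.3|     52.87|43.04 (81.4%)|   41.24 (95.8%)|
|clean+extendedness|     43.47|37.20 (85.6%)|   35.44 (95.3%)|
|             SNR>5|     38.02|34.39 (90.4%)|   32.65 (94.9%)|
|            SNR>10|     22.71|21.87 (96.3%)|   20.38 (93.2%)|
+------------------+----------+-------------+----------------+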

4.  DC2 validation

4.1  sample purity

Angles: one can study the astrometric errors by comparing the local x/y offsets (dx=cos(DEC)*Delta(RA) and dy=Delta(DEC)) between the matched points. Here is the 2D histogram (log scaled):

  • One sees the healpixel shape and some (non-isotropic) extra blurring due to the fact that we are adding neighbouring pixels
  • focusing on r<1 arcsec, one sees a plateau for r>0.6 arcsec
  • assuming a ~flat background (expected from random associations), the contamination is around 0.5% (r<0.6 arcsec)
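For reference, the local offsets used here (dx=cos(DEC)*Delta(RA), dy=Delta(DEC)) can be computed as in the sketch below. Two details are assumptions of this example and not stated in the notes: cos(DEC) is evaluated at the mean declination of the pair, and Delta(RA) is wrapped around 360 degrees. The function name and the test coordinates are illustrative.

```python
import math

def local_offsets_arcsec(ra1, dec1, ra2, dec2):
    """Offsets (dx, dy) in arcsec between two points given in degrees."""
    # Wrap Delta(RA) into [-180, 180) so pairs straddling RA=0 are handled
    dra = (ra2 - ra1 + 180.0) % 360.0 - 180.0
    # Assumption: cos(DEC) is taken at the mean declination of the pair
    mean_dec = math.radians(0.5 * (dec1 + dec2))
    dx = math.cos(mean_dec) * dra * 3600.0
    dy = (dec2 - dec1) * 3600.0
    return dx, dy

# Made-up pair of positions separated by 1e-4 deg in both RA and DEC
dx, dy = local_offsets_arcsec(60.0, -30.0, 60.0001, -30.0001)
r = math.hypot(dx, dy)   # angular distance in the small-angle limit
```

The distance r obtained this way is the quantity on which the r<0.6 arcsec cut is applied.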

what's the associated photometry?

With a cut on delta(flux):

  • blue: no cut
  • orange: loose |dflux_i| <500
  • green: tight -250< dflux_i< 200
  • not much difference (a flux cut is not worth using)
  • by eye (there are 1000 bins, vertical axis normalized to 1000), background < 1% (r<0.6 arcsec)

4.2  completeness:

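+---------------------+--------+--------+
|            selection|size (M)|frac (%)|
+---------------------+--------+--------+
|mag_i<25.3+clean+ext.|    38.8|     100|
|               #ass=1|    34.5|      89|
|         r<0.6 arcsec|    34.0|      88|
|       |dflux_i| <500|    28.4|      73|
+---------------------+--------+--------+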

4.3  Testing the PSF

According to Lupton, the astrometric errors are related to the PSF width \sigma (in the Gaussian case) by Var(x)=2\sigma^2/SNR^2

so that we can test the PSF by looking at psf_x= dx*SNR/\sqrt{2}; psf_y= dy*SNR/\sqrt{2}
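A quick numerical sanity check of this rescaling, as a sketch on simulated offsets only (no catalogue data): if dx is drawn with Var(dx)=2\sigma^2/SNR^2, then dx*SNR/\sqrt{2} should have spread \sigma whatever the SNR distribution. The value of `sigma_psf`, the SNR range and the function name are arbitrary choices for the toy.

```python
import math
import random
import statistics

def psf_scaled(offset, snr):
    """Rescale an astrometric offset by SNR/sqrt(2), as in psf_x = dx*SNR/sqrt(2)."""
    return offset * snr / math.sqrt(2.0)

random.seed(0)
sigma_psf = 0.38   # arcsec, arbitrary toy value
samples = []
for _ in range(20000):
    snr = random.uniform(5.0, 50.0)                            # toy SNR distribution
    dx = random.gauss(0.0, math.sqrt(2.0) * sigma_psf / snr)   # Var(dx) = 2 sigma^2 / SNR^2
    samples.append(psf_scaled(dx, snr))

recovered_sigma = statistics.pstdev(samples)   # should be close to sigma_psf
```

In the real analysis the distribution of psf_x and psf_y is of course not guaranteed to be Gaussian, which is exactly what the radial profile is meant to test.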

in radial coordinates:

  • <fwhm> ~ 0.45*2= 0.9 arcsec
  • ~Moffat with beta~2

<fwhm> ~30% larger than what appears in run2

Another interesting way to look at this is to plot the distribution of r*SNR (r = distance between the 2 associated points). This is NOT the same as above because of the Jacobian of the transform, and it should follow 2\pi r*Moffat(r)

The mode (position of the maximum) is at ~0.40. For a Moffat distribution it lies at r_{max}=\frac{a}{\sqrt{2\beta-1}}=0.44 fwhm (for beta=2), so we again find <fwhm>~0.9 arcsec.
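The relation between the mode and the FWHM can be checked with a few lines. The sketch assumes the standard Moffat profile (1+r^2/a^2)^{-\beta}, whose FWHM is 2a\sqrt{2^{1/\beta}-1}; setting the derivative of r*(1+r^2/a^2)^{-\beta} to zero gives r_{max}=a/\sqrt{2\beta-1}. Function names are illustrative.

```python
import math

def moffat_fwhm(a, beta):
    """FWHM of the Moffat profile (1 + r^2/a^2)^(-beta)."""
    return 2.0 * a * math.sqrt(2.0 ** (1.0 / beta) - 1.0)

def radial_mode(a, beta):
    """Position of the maximum of r * (1 + r^2/a^2)^(-beta)."""
    return a / math.sqrt(2.0 * beta - 1.0)

beta = 2.0
ratio = radial_mode(1.0, beta) / moffat_fwhm(1.0, beta)   # independent of a
fwhm_from_mode = 0.40 / ratio                               # mode of r*SNR read off as ~0.40
```

For beta=2 the ratio comes out close to 0.45, in line with the 0.44 quoted above, and a mode at ~0.40 then corresponds to a FWHM of about 0.9 arcsec.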

Then if we histogram psf_r/psf_fwhm_i it should peak at 0.44:

It is measured slightly higher (0.52), but given that the sqrt(2) factor comes from a Gaussian model and that Moffat(beta=2) is not a very good approximation, the agreement looks quite fair.

4.4  Photometry

SNR

Magnitudes

Fluxes

( m=-2.5 log_{10}(f_\nu)+31.4 )
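A minimal sketch of this flux/magnitude conversion and its inverse. The 31.4 zero point corresponds to AB magnitudes with fluxes in nanojansky; that the catalogue fluxes are in nJy is an assumption here, as the notes do not state the unit. Function names are illustrative.

```python
import math

def flux_to_mag(flux_njy):
    """m = -2.5*log10(f_nu) + 31.4, with f_nu assumed to be in nanojansky."""
    return -2.5 * math.log10(flux_njy) + 31.4

def mag_to_flux(mag):
    """Inverse relation: flux (assumed nJy) for a given magnitude."""
    return 10.0 ** ((31.4 - mag) / 2.5)

# Flux corresponding to the mag_i < 25.3 selection used in the cross-match
flux_at_cut = mag_to_flux(25.3)
```

Under that unit assumption the mag_i<25.3 cut corresponds to a flux of roughly 275, which gives a scale for the delta(flux) cuts quoted earlier.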

  • Testing the errors

4.5  stars


with PSF flux:


4.6  colors


5.  Getting a probability for the match

  • Theorem: if x \sim f then CDF(x) \sim U[0,1], where CDF is the Cumulative Distribution Function of f
  • from the golden sample we have distributions for distance/flux: construct binned CDFs ("cumsum")
  • for each candidate, p_1=CDF(distance), p_2=CDF(flux): each is U[0,1] for the signal and peaked towards 0 for noise
  • combine both: P=p_1\cdot p_2 is reasonable. This is not (yet) a probability (i.e. U[0,1] distributed for signal).
  • compute (homework) CDF(P)= p_1 p_2(1- \ln[p_1 p_2])

This way we compute a probability (U[0,1] for the signal) for each candidate and take the max. One can later cut on that value to increase purity.
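The recipe can be sketched in plain Python as follows. Everything here is illustrative: the golden-sample values are invented numbers, an empirical (sorted-sample) CDF replaces the binned one, and whether one should feed the CDF or its complement for a given variable is left to the caller, since the notes do not spell it out. The only formula taken from the notes is CDF(P)=P(1-\ln P) for P=p_1 p_2.

```python
import bisect
import math

def empirical_cdf(reference):
    """Return a function x -> fraction of reference values <= x."""
    ordered = sorted(reference)
    n = len(ordered)
    return lambda x: bisect.bisect_right(ordered, x) / n

def combine(p1, p2):
    """CDF of the product of two independent U[0,1] variables: P*(1 - ln P)."""
    product = p1 * p2
    if product <= 0.0:
        return 0.0          # limit of P*(1 - ln P) when P -> 0
    return product * (1.0 - math.log(product))

# Hypothetical golden-sample values (illustrative numbers only)
cdf_distance = empirical_cdf([0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60])
cdf_flux = empirical_cdf([-200.0, -100.0, -50.0, 0.0, 50.0, 100.0, 200.0, 400.0])

# Candidates for one source: (target id, distance, delta flux); keep the highest score
candidates = [("t1", 0.20, 50.0), ("t2", 0.05, -200.0)]
scores = {tid: combine(cdf_distance(d), cdf_flux(df)) for tid, d, df in candidates}
best_id = max(scores, key=scores.get)
```

A later purity cut is then simply a threshold on `scores[best_id]`.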

6.  Conclusions

  • cosmoDC2xRun2 match in 3 min with Spark (including a solution for pixel boundaries): can provide an ObjectId<->galaticId parquet file (is anyone interested?)
  • unbiased astrometry/fluxes distributions
  • astrometric error FWHM compatible with PSF FWHM/SNR at the 40% level. If it is taken as an equivalent Gaussian sigma (=FWHM/2.355), it is wrong by a factor 3.3
  • not clear what cModelFluxErr_i really means (or how to cut on that), but nice unskewed distribution
  • match probability