Cross-matchRun2XcosmoDC2

S.Plaszczynski 24 oct.19


1. selections
   1.1 source = Run2.1i (dr1-b)
   1.2 target= CosmoDC2
2. healpixel grid
3. Full cosmoDC2xRun2 cross-match
4. DC2 validation
   4.1 sample purity
   4.2 completeness
   4.3 Testing the PSF
   4.4 Photometry
   4.5 stars
   4.6 colors
5. Getting a probability for the match
6. Conclusions

Updates

increased nside to 262144 (!): this changes the completeness
dx=cos(dec)*Delta(RA), dy=Delta(DEC)
flux cuts on golden sample

1. selections

1.1 source = Run2.1i (dr1-b)

mag_i_cModel<25.3
SNR>1

total: ~50M objects

1.2 target= CosmoDC2

mag_i<25.3 : ~80M objects (incl. ultra-faint)

2. healpixel grid

 (nside, resol (arcsec))
[(1, 211076.28514206142),
 (2, 105538.14257103071),
 (4, 52769.071285515354),
 (8, 26384.535642757677),
 (16, 13192.267821378839),
 (32, 6596.133910689419),
 (64, 3298.0669553447096),
 (128, 1649.0334776723548),
 (256, 824.5167388361774),
 (512, 412.2583694180887),
 (1024, 206.12918470904435),
 (2048, 103.06459235452218),
 (4096, 51.53229617726109),
 (8192, 25.766148088630544),
 (16384, 12.883074044315272),
 (32768, 6.441537022157636),
 (65536, 3.220768511078818),
 (131072, 1.610384255539409),
 (262144, 0.8051921277697045),
 (524288, 0.40259606388485225)]
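The values above can be reproduced from the pixel area alone. A minimal sketch in plain Python, assuming "resol" means the square root of the pixel area (the sphere being divided into 12*nside^2 equal-area pixels); this is, as far as we know, also what healpy's `nside2resol` returns. The function name `healpix_resol_arcsec` is ours.

```python
import math

ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def healpix_resol_arcsec(nside):
    """Square root of the HEALPix pixel area, in arcsec."""
    npix = 12 * nside ** 2                     # number of equal-area pixels
    pixel_area_sr = 4.0 * math.pi / npix       # area of one pixel in steradians
    return math.sqrt(pixel_area_sr) * ARCSEC_PER_RAD


# Same (nside, resol) pairs as in the list above
table = [(2 ** order, healpix_resol_arcsec(2 ** order)) for order in range(20)]
for row in table:
    print(row)
```

The resolution halves each time nside doubles, so nside=262144 is the first grid with pixels smaller than 1 arcsec.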

stats within 1 pixel (count = number of objects sharing a pixel, freq = number of such pixels):

run2:

+-----+--------+
|count|    freq|
+-----+--------+
|    1|46150111|
|    2|  141961|
|    3|     103|
+-----+--------+

cosmoDC2:

+-----+--------+
|count|    freq|
+-----+--------+
|    1|76819578|
|    2| 1739137|
|    3|   33127|
|    4|     586|
|    5|      10|
|   58|       2|
+-----+--------+

3. Full cosmoDC2xRun2 cross-match

Basically a join on the pixel index ipix, but there is a problem at pixel boundaries: a point close to a pixel border may actually have its counterpart in the neighbouring pixel.

Solution: duplicate each source on all neighbouring pixels. This is fast in the NEST scheme but increases the source table size by a factor of 8 (500M!)
then join on ipix
groupBy source_id (better: "reduceBy") and choose a reduction algorithm: first try min(dist)
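The duplicate/join/reduce steps above can be illustrated on a toy case. This is only a sketch in plain Python, not the Spark job: a flat square grid stands in for the HEALPix pixels, and all names (`CELL`, `cell_of`, `cross_match`) are hypothetical.

```python
import math
from collections import defaultdict

CELL = 1.0  # hypothetical cell size, stand-in for the HEALPix pixel size


def cell_of(x, y):
    """Index of the grid cell containing (x, y); plays the role of ipix."""
    return (math.floor(x / CELL), math.floor(y / CELL))


def cell_and_neighbours(cell):
    """The cell itself plus its 8 neighbours."""
    cx, cy = cell
    return [(cx + i, cy + j) for i in (-1, 0, 1) for j in (-1, 0, 1)]


def cross_match(sources, targets):
    """sources, targets: dict id -> (x, y).
    Returns dict source_id -> (best_target_id, distance, n_candidates)."""
    # index the targets by cell (the "right side" of the join)
    by_cell = defaultdict(list)
    for tid, (tx, ty) in targets.items():
        by_cell[cell_of(tx, ty)].append((tid, tx, ty))

    result = {}
    for sid, (sx, sy) in sources.items():
        # duplicate the source on neighbouring cells, then join on the cell index
        candidates = []
        for cell in cell_and_neighbours(cell_of(sx, sy)):
            for tid, tx, ty in by_cell.get(cell, ()):
                candidates.append((math.hypot(tx - sx, ty - sy), tid))
        # reduce by source id: keep the closest candidate
        if candidates:
            dist, tid = min(candidates)
            result[sid] = (tid, dist, len(candidates))
    return result


sources = {"s1": (0.95, 0.5), "s2": (5.5, 5.5)}
targets = {"t1": (1.05, 0.5), "t2": (0.2, 0.5)}
matches = cross_match(sources, targets)
print(matches)
```

Here s1 and its true counterpart t1 fall in different cells, so a join on the cell index alone would associate s1 with t2; with the neighbour duplication both candidates are seen and the min(dist) reduction picks t1.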

Spark @cori (10 nodes) interactively:

reading + x8 duplicates: ~30s
join: ~80s
reduceBy: ~30s

TOTAL ~ 3 mins for a 500Mx80M cross-match !

nass = number of cosmoDC2 candidates associated with each Run2 source:

+----+--------+--------------------+
|nass|   count|                frac|
+----+--------+--------------------+
|   1|34891784|  0.7304920953658602|
|   2|10656958|  0.2231133718942536|
|   3| 1914284| 0.04007732394208735|
|   4|  265944|  0.0055677860957175|
|   5|   31903|6.679191100821053E-4|
|   6|    3462|7.248020434141769E-5|
|   7|     386|8.081270616922943E-6|
|   8|      38|7.955655011478544E-7|
|   9|       5|1.046796712036650...|
|  58|       2|4.187186848146602...|
+----+--------+--------------------+

Change of definition: "matched" now means r<1 arcmin

cut (cumulative)	source (M)	matched (M, % of source)	single-match (M, % of matched)
i<25.3	52.87	43.04 (81.4%)	41.24 (95.8%)
clean+extendedness	43.47	37.20 (85.6%)	35.44 (95.3%)
SNR>5	38.02	34.39 (90.4%)	32.65 (94.9%)
SNR>10	22.71	21.87 (96.3%)	20.38 (93.2%)

4. DC2 validation

4.1 sample purity

angles: one can study the astrometric errors by comparing the local x/y offsets (dx=cos(DEC)*Delta(RA) and dy=Delta(DEC)) between the matched points. Here is the 2D histogram (log scaled):

One sees the healpixel shape and some (non-isotropic) extra blurring due to the fact that we are adding neighbouring pixels
focusing on r<1 arcsec, one sees a plateau for r>0.6 arcsec
assuming a ~flat background (expected from random associations), the contamination is around 0.5% (r<0.6 arcsec)
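For reference, the local offsets dx, dy and the separation r used in these plots can be computed as below. A minimal sketch, assuming coordinates in degrees and taking the cos(DEC) factor at the source declination (the page does not say which of the two declinations is used); the function name is ours.

```python
import math


def local_offsets_arcsec(ra_src, dec_src, ra_tgt, dec_tgt):
    """Offsets of the target relative to the source, in arcsec.
    Inputs in degrees. Small-angle (flat-sky) approximation."""
    # wrap Delta(RA) into [-180, 180) so that pairs across RA=0 are handled
    dra = (ra_tgt - ra_src + 180.0) % 360.0 - 180.0
    dx = math.cos(math.radians(dec_src)) * dra * 3600.0   # dx = cos(DEC) * Delta(RA)
    dy = (dec_tgt - dec_src) * 3600.0                     # dy = Delta(DEC)
    return dx, dy, math.hypot(dx, dy)


# 1 arcsec in RA at DEC=60 deg is only 0.5 arcsec on the sky
dx, dy, r = local_offsets_arcsec(10.0, 60.0, 10.0 + 1.0 / 3600.0, 60.0)
print(dx, dy, r)

# a pair straddling RA=0
dx_wrap, _, _ = local_offsets_arcsec(359.9999, 0.0, 0.0001, 0.0)
print(dx_wrap)
```

The flat-sky approximation is adequate here since the separations of interest are of the order of 1 arcsec.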

what's the associated photometry?

With a cut on delta(flux):

blue: no cut
orange: loose |dflux_i| <500
green: tight -250< dflux_i< 200

not much difference (not worth using a flux cut)
by eye (there are 1000 bins, vertical axis normalized to 1000), background < 1% (r<0.6 arcsec)

4.2 completeness:

selection	size (M)	frac(%)
mag_i<25.3+clean+ext.	38.8	100
#ass=1	34.5	89
r<0.6 arcsec	34.0	88
|dflux_i| <500	28.4	73

4.3 Testing the PSF

According to Lupton, the astrometric errors are related to the PSF width \sigma (in the Gaussian case) by Var(x)=2 \sigma^2/SNR^2

so that \sigma= \sigma_x*SNR/\sqrt{2}, and we can test the PSF by looking at psf_x= dx*SNR/\sqrt{2}; psf_y= dy*SNR/\sqrt{2}
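As a sanity check of this rescaling, one can simulate offsets with the quoted variance and verify that the rescaled variable recovers the PSF width whatever the SNR. A toy sketch with made-up numbers (sigma_psf=0.4 arcsec, uniform SNR between 5 and 50); nothing here comes from the DC2 data.

```python
import math
import random


def rescaled_offset(d, snr):
    """psf_x = dx * SNR / sqrt(2), as defined above."""
    return d * snr / math.sqrt(2.0)


def recovered_sigma(sigma_psf=0.4, n=20000, seed=1):
    """Draw offsets with Var(x) = 2 sigma^2 / SNR^2 and return the
    standard deviation of the rescaled offsets."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        snr = rng.uniform(5.0, 50.0)                      # arbitrary SNR range
        dx = rng.gauss(0.0, math.sqrt(2.0) * sigma_psf / snr)
        values.append(rescaled_offset(dx, snr))
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


sigma_out = recovered_sigma()
print(sigma_out)   # close to the input 0.4
```

The point of the rescaling is that objects of very different SNR can be stacked in a single histogram.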

in radial coordinates:

<fwhm> ~ 0.45*2= 0.9 arcsec
~Moffat with beta~2

<fwhm> ~30% larger than what appears in run2

Another interesting way to look at this is to plot the distribution of r*SNR (r= distance between the 2 associated points). This is NOT the same as above because of the Jacobian of the transform: it should follow 2\pi*r*Moffat(r)

The mode (position of the maximum) is at ~0.40. For a Moffat distribution it lies at r_{max}=\frac{a}{\sqrt{2\beta-1}}=0.45 fwhm (for \beta=2): we then find again <fwhm>~0.9 arcsec.

Then if we histogram psf_r/psf_fwhm_i it should peak at 0.45:

It is measured slightly higher (0.52), but given that the \sqrt{2} comes from a Gaussian model and that the Moffat (\beta=2) is not a very good approximation, the agreement looks quite fair.
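The position of the mode can be checked numerically. A short sketch, assuming the usual Moffat profile (1+r^2/a^2)^{-\beta}, for which fwhm=2a\sqrt{2^{1/\beta}-1}; function names are ours.

```python
import math


def moffat_fwhm(a, beta):
    """FWHM of the Moffat profile (1 + r^2/a^2)^(-beta)."""
    return 2.0 * a * math.sqrt(2.0 ** (1.0 / beta) - 1.0)


def radial_mode_analytic(a, beta):
    """Maximum of r * Moffat(r): r_max = a / sqrt(2*beta - 1)."""
    return a / math.sqrt(2.0 * beta - 1.0)


def radial_mode_numeric(a, beta, step=1e-4, rmax=3.0):
    """Brute-force scan of r * Moffat(r) on a fine grid."""
    best_r, best_val = 0.0, 0.0
    for i in range(1, int(rmax / step)):
        r = i * step
        val = r * (1.0 + (r / a) ** 2) ** (-beta)
        if val > best_val:
            best_r, best_val = r, val
    return best_r


a, beta = 1.0, 2.0
ratio = radial_mode_analytic(a, beta) / moffat_fwhm(a, beta)
r_scan = radial_mode_numeric(a, beta)
print(ratio, r_scan)
```

For \beta=2 the ratio r_max/fwhm comes out at about 0.449, so a mode at ~0.40 indeed corresponds to fwhm~0.9.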

4.4 Photometry

SNR

Magnitudes

Fluxes

( m=-2.5 \log_{10}(f_\nu)+31.4, with f_\nu in nJy )
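A small helper for this conversion, assuming f_\nu is expressed in nJy (the unit for which the AB zero point is 31.4); function names are ours.

```python
import math

ZP_NJY = 31.4  # AB zero point for fluxes in nJy (assumed unit)


def flux_to_mag(flux_njy):
    """m = -2.5 log10(f_nu) + 31.4"""
    return -2.5 * math.log10(flux_njy) + ZP_NJY


def mag_to_flux(mag):
    """Inverse relation: f_nu = 10^((31.4 - m) / 2.5)"""
    return 10.0 ** ((ZP_NJY - mag) / 2.5)


flux_at_limit = mag_to_flux(25.3)   # flux at the mag_i<25.3 selection limit
print(flux_at_limit)
print(flux_to_mag(flux_at_limit))
```

With this convention the mag_i<25.3 selection limit corresponds to about 275 nJy, which gives a scale against which to read the |dflux_i| cuts.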

Testing the errors

4.5 stars

with PSF flux:

4.6 colors

5. Getting a probability for the match

Theorem: if x \sim f then CDF(x)\sim u[0,1], where CDF is the Cumulative Distribution Function of f
from the golden sample we have distributions for distance/flux: construct binned CDFs ("cumsum")
for each candidate compute p_1=CDF(distance), p_2=CDF(flux): each is u[0,1] for the signal and peaked at 0 for noise
combining both as P=p_1\cdot p_2 is reasonable, but this is not (yet) a probability (i.e. u[0,1] distributed for signal).
compute (homework) CDF(P)= p_1 p_2(1- \ln[p_1 p_2])

This way we compute a probability (u[0,1] for the signal) for each candidate and take the max. You can later cut on that value to increase purity.
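The recipe can be sketched as follows. This is a toy illustration, not the code used for the plots: the binning (100 bins), the helper names and the uniform test sample are our own choices, and the check assumes p_1 and p_2 are independent.

```python
import math
import random


def binned_cdf(reference, nbins=100):
    """Build x -> CDF(x) from a reference sample: histogram, then cumulative sum.
    Assumes the sample is not constant."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for v in reference:
        counts[min(int((v - lo) / width), nbins - 1)] += 1
    cumsum, total = [], 0
    for c in counts:
        total += c
        cumsum.append(total / len(reference))

    def cdf(x):
        if x < lo:
            return 0.0
        return cumsum[min(int((x - lo) / width), nbins - 1)]

    return cdf


def match_probability(p1, p2):
    """CDF of the product of two independent u[0,1] variables: z (1 - ln z)."""
    z = p1 * p2
    return z * (1.0 - math.log(z)) if z > 0.0 else 0.0


# binned CDF of a uniform reference sample: cdf(0.5) should be ~0.5
cdf = binned_cdf([i / 1000.0 for i in range(1000)])
cdf_mid = cdf(0.5)

# Monte Carlo check: for "signal" (p1, p2 uniform) the result is uniform again
rng = random.Random(42)
probs = [match_probability(rng.random(), rng.random()) for _ in range(100000)]
mean_prob = sum(probs) / len(probs)
frac_below_01 = sum(p < 0.1 for p in probs) / len(probs)
print(cdf_mid, mean_prob, frac_below_01)
```

For a uniform variable the mean is 0.5 and a fraction 0.1 of the values lies below 0.1, which is what the Monte Carlo gives; the raw product p_1 p_2 would instead pile up towards 0.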

6. Conclusions

cosmoDC2xRun2 match in 3 min with Spark (with a solution for the pixel-boundary problem): can provide an ObjectId<->galaticId parquet file (is anyone interested?)
unbiased astrometry/flux distributions
astrometric error FWHM compatible with PSF FWHM/SNR at the 40% level; if considered as an equivalent Gaussian sigma (=FWHM/2.355) it is wrong by a factor 3.3
not clear what cModelFluxErr_i really means (or how to cut on it), but nice unskewed distribution
match probability

LSST@LAL