Galaxy cluster weak gravitational lensing analysis¶
For the Rubin Science Platform at data.lsst.cloud.
Data Release: DP02
Container Size: medium
LSST Science Pipelines version: Weekly 2025_09
Last verified to run: 2025-03-15
Contact author(s): Shenming Fu
Description: This notebook demonstrates how to query Data Preview 0 (DP0) data for galaxy cluster weak gravitational lensing analysis.
Skills: Select background galaxies by color and make a mean shape profile.
LSST Data Products: Images (deepCoadd
) and catalogs (Object
table).
Packages: lsst.daf.butler, lsst.rsp
Credit: This tutorial was originally developed by Shenming Fu, reviewed by Andrés A. Plazas Malagón, and in collaboration with the Rubin Community Science Team.
Get Support: Find DP0-related documentation and resources at dp0.lsst.io. Questions are welcome as new topics in the Support - Data Preview 0 Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
Galaxy clusters are the largest gravitionally bound objects in the Universe. According to the General Relatively, massive objects warp the spacetime, and light follows the curvature of spacetime. Thus, massive objects act like lenses that bend the path of the light emitted from distance sources. This effect is called "gravitional lensing".
The large mass of a galaxy cluster coherently distorts the images of backgroud galaxies, and this distortion occurs over a large sky area around the cluster. The lensing effect on the shape of a single background galaxy far from the cluster center is small, but the lensing signal can be detected by averaging the shape of many background galaxies. This is because the original shapes of the galaxies are randomly oriented. This effect is called weak gravitational lensing (WL), which shows up in the statistics of a large sample of galaxy shapes. More detailed introduction to cluster WL can be found in review papers (e.g., Bartelmann & Schneider 2001; Umetsu 2020).
Note that WL also happens between galaxies and when the light passes through the large-scale structure of the Universe (cosmic shear). Compared to galaxy-galaxy lensing and cosmic shear, the cluster WL signal is generally about 1 magnitude higher (at the level of 10 times).
This notebook gives an example of detecting lensing signal of clusters using DP0 data.
1.1. Import packages¶
Import general packages, the LSST Science Pipelines packages for bulter and display, and Rubin Science Platform Table Access Protocol (TAP) package.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binned_statistic, bootstrap
from astropy.table import Table, vstack
from astropy.coordinates import SkyCoord
# LSST Science Pipelines (Stack) packages
import lsst.daf.butler as dafButler
import lsst.afw.display as afwDisplay
from lsst.rsp import get_tap_service
plt.style.use('tableau-colorblind10')
afwDisplay.setDefaultBackend('matplotlib')
1.2. Define functions¶
Define a function to compute resampling.
def get_stat(data, statistic=np.mean, confidence_level=0.682):
"""
Given the data, statistic, confidence_level,
compute the value and scatter of the statistic using bootstrapping.
Parameters
----------
data: float
Numpy array.
statistic: float
Statistic such as np.mean, np.median.
confidence_level: float
Size of confidence interval.
Returns
-------
stat: float
Statistic value.
lerr: float
Lower error.
herr: float
Higher error.
"""
rng = np.random.default_rng()
res = bootstrap((data,),
statistic=statistic,
confidence_level=confidence_level,
random_state=rng)
stat = np.mean(data)
lerr = stat - res.confidence_interval[0]
herr = res.confidence_interval[1] - stat
return stat, lerr, herr
2. Get the coadd object catalog around a cluster¶
There is a known cluster at (55.749deg, -32.273deg) for Right Ascension (RA), DEClination (DEC). Make a cone search for primary objects with radius 15 arcmin using Table Access Protocol (TAP). More details about the TAP service can be found in the tutorial Notebook for TAP (DP02_02a Introduction to TAP).
Select galaxies by extendedness, which is the difference between the Point Spread Function (PSF) photometry and the Composite-Model (CModel) photometry (Abazajian et al. 2004, Bosch et al. 2018). CModel measures extended objects and is used for galaxy photometry generally.
Use the g
and r
band color information to remove foreground galaxies, applying relevant flag cuts. Additionally, use the r
band for shape analysis. Using the r
band shapes is common in weak lensing studies. The main reason is the balance between seeing and sky brightness for shape measurement (Fu et al. 2022). Another reason is that Differential Chromatic Refraction (DCR) is weaker in redder bands, which reduces the PSF elongation along the zenith and makes the galaxy's pre-PSF shape easier to measure (Plazas et al. 2012).
Also, apply quality cuts to the shape measurements of galaxies for WL analysis (Mandelbaumet al. 2018).
For simplicity, skip the use of photometric redshift (photo-z) in this notebook. While the photo-z information can help select background galaxies by redshift, background galaxies can also be selected directly by color.
service = get_tap_service("tap")
assert service is not None
center_ra = 55.749
center_dec = -32.273
radius = 15/60.
str_center_coords = str(center_ra) + ", " + str(center_dec)
str_radius = str(radius)
query = "SELECT r_ra, r_decl, "\
"r_cModelFlux, r_cModelFluxErr, g_cModelFlux, g_cModelFluxErr, "\
"r_hsmShapeRegauss_e1, r_hsmShapeRegauss_e2, x, y "\
"FROM dp02_dc2_catalogs.Object "\
"WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), "\
"CIRCLE('ICRS', " + str_center_coords + ", " + str_radius + ")) = 1 "\
"AND detect_isPrimary = 1 "\
"AND r_extendedness = 1 "\
"AND r_extendedness_flag = 0 "\
"AND r_cModel_flag = 0 "\
"AND g_cModel_flag = 0 "\
"AND r_psfFlux_flag = 0 "\
"AND r_centroid_flag = 0 "\
"AND r_hsmShapeRegauss_sigma < 0.4 "\
"AND r_blendedness < 0.42 "
print(query)
SELECT r_ra, r_decl, r_cModelFlux, r_cModelFluxErr, g_cModelFlux, g_cModelFluxErr, r_hsmShapeRegauss_e1, r_hsmShapeRegauss_e2, x, y FROM dp02_dc2_catalogs.Object WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec), CIRCLE('ICRS', 55.749, -32.273, 0.25)) = 1 AND detect_isPrimary = 1 AND r_extendedness = 1 AND r_extendedness_flag = 0 AND r_cModel_flag = 0 AND g_cModel_flag = 0 AND r_psfFlux_flag = 0 AND r_centroid_flag = 0 AND r_hsmShapeRegauss_sigma < 0.4 AND r_blendedness < 0.42
Run the job.
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
assert job.phase == 'COMPLETED'
Job phase is COMPLETED
Finally, save the results into an astropy table.
data = job.fetch_result().to_table()
job.delete()
Delete the variable.
del query
Apply some further cuts on the g
and r
band photometry to ensure the quality. First require the flux error to be positive to remove unphysical measurements.
data = data[data['r_cModelFluxErr'] > 0]
data = data[data['g_cModelFluxErr'] > 0]
Make selection based on signal-to-noise ratio (SNR). Make a typical 5-sigma cut on SNR to select well-detected objects.
r_SNR = data['r_cModelFlux'] / data['r_cModelFluxErr']
g_SNR = data['g_cModelFlux'] / data['g_cModelFluxErr']
sel = r_SNR > 5.
sel &= g_SNR > 5.
WL distorts galaxy shapes, which are described by ellipticity. The ellipticity is constructed from the second moments of the object's 2D flux distribution, corrected for PSF effects. The HSM shape measurement is available in the catalog. The details of HSM are described in the work of Hirata and Seljak (2003) and Mandelbaum et al. (2005).
Require the measured ellipticity to be within 2. This removes galaxies with very large measured ellipticities that are unphysical, but allows galaxies with measured ellipticities slightly larger than 1 caused by noise (Mandelbaum et al. 2018).
For HSM shapes, the mean ellipticity divided by 2 approximates the lensing shear. Between the measured shear and the true shear there is a small bias, which is caused by galaxy shape dispersion, measurement noise, pixelization, and other effects. Usually, a shear calibration corrects for this bias. However, accurately determining shear calibration parameters requires further image simulation, which is beyond the scope of this notebook. Thus, for demonstration, skip shear calibration but focus on studying the mean galaxy shape in this notebook.
e1 = data['r_hsmShapeRegauss_e1']
e2 = data['r_hsmShapeRegauss_e2']
sel &= (e1**2 + e2**2)**0.5 < 2.
print("Fraction of large ellipticity: ", np.sum(~sel)/len(data))
Fraction of large ellipticity: 0.04576001108493834
Make the selection.
data_s = data[sel]
3. Select galaxies by color¶
Create a g-r
vs r
color-magnitude diagram (CMD) of galaxies. Clusters generally show a red sequence (RS) in the CMD due to evolution -- those red galaxies are the oldest and reddest in the cluster (Kodama and Arimoto 1997; Gladders and Yee 2000). Galaxies that are redder than the RS are background galaxies. Thus, use colors to select a sample of background galaxies, and a sample of bright cluster member galaxies -- they show different colors in the CMD. Here is also an example tutorial of using the Rubin Science Platform (RSP) Portal to study the RS (Exploring Extended Object Populations with Histograms).
First, convert flux into magnitude and compute the magnitude uncertainty, based on the tutorial notebook DP02_01 (Introduction to Jupyter Notebooks for Data Preview 0.2). Note, the derivation of the magnitude uncertainty comes from the difference of the magnitude equation. Because $m = -2.5 * log(F) + m0$, where $m0$ is the magnitude zero, $m$ is the magnitude, and $F$ is the flux. Then $dm = -2.5/ln(10) * dF / F$, and the SNR is $F / dF$.
r_cModel = -2.5*np.log10(data_s['r_cModelFlux']) + 31.4
g_cModel = -2.5*np.log10(data_s['g_cModelFlux']) + 31.4
fac = 2.5 / np.log(10)
r_cModel_err = fac / r_SNR[sel]
Make the CMD.
There is a concentration of galaxies at $g-r\approx1.1$, when the galaxies have different brightness (r
band magnitude). That concentration represents the RS. Select galaxies redder than that to get background galaxies, and allow some gap from the the RS because of magnitude measurement uncertainties.
plt.figure()
plt.scatter(r_cModel,
g_cModel - r_cModel,
s=0.03, c='k', marker='.')
sel_rs = g_cModel - r_cModel > 1.0
sel_rs &= g_cModel - r_cModel < 1.2
sel_rs &= r_cModel < 23.
plt.scatter(r_cModel[sel_rs],
(g_cModel - r_cModel)[sel_rs],
s=10, c='r', alpha=0.3,
label='red sequence', marker='^')
sel_bg = g_cModel - r_cModel > 1.35
plt.scatter(r_cModel[sel_bg],
(g_cModel - r_cModel)[sel_bg],
s=5, c='b', alpha=0.3,
label='background', marker='s')
_ = plt.legend()
plt.xlim(15, 27)
plt.ylim(-1, 3.)
plt.xlabel('r [mag]')
plt.ylabel('g - r [mag]')
plt.minorticks_on()
Figure 1: Color-magnitude of galaxies with markers showing the selected background galaxies and the bright red sequence galaxies.
Check the relation between the r
band magnitude and uncertainty, and compare that with SNR. In the figure, the uncertainty grows as the magnitude increases (i.e., the objects are fainter), which is expected.
plt.figure()
plt.scatter(r_cModel,
r_cModel_err,
s=0.1, c='k')
ls_list = ['-', '--', '-.']
for i, SNR in enumerate([5, 10, 20]):
plt.axhline(fac / SNR, c='C%d'%i, label='SNR=%d'%SNR, ls=ls_list[i])
plt.xlim(16, 28)
plt.ylim(0, 0.25)
plt.axvline(26, c='y', ls=':', label='mag=26')
plt.xlabel('r [mag]')
plt.ylabel('r_err [mag]')
_ = plt.legend()
Figure 2: r-band magnitude and uncertainty of galaxies.
Now, check the distribution of galaxies on the sky. Clearly, the RS galaxies have a concentration near the cluster center, but the background ones are uniformally distributed.
ra = data_s['r_ra']
dec = data_s['r_decl']
plt.figure(figsize=(5, 5))
plt.scatter(ra[sel_rs], dec[sel_rs], s=10, c='r',
label="red sequence", marker='^', alpha=0.5)
plt.scatter(ra[sel_bg], dec[sel_bg], s=5, c='b',
label="background", marker='s', alpha=0.5)
plt.xlabel('RA [deg]')
plt.ylabel('DEC [deg]')
plt.gca().invert_xaxis()
_ = plt.legend()
Figure 3: Sky distribution of the selected background and red sequence galaxies.
Also check the coadd image and see how the RS galaxies look like -- those bright large cluster galaxies do get selected. The image information comes from the tutorial notebook DP02_04a (Introduction to the Butler).
config = 'dp02'
collections = '2.2i/runs/DP0.2'
butler = dafButler.Butler(config, collections=collections)
datasetType = 'deepCoadd'
dataId = {'tract': 4431, 'patch': 17, 'band': 'r'}
coadd = butler.get(datasetType, dataId=dataId)
x = data_s['x']
y = data_s['y']
fig = plt.figure()
display = afwDisplay.Display(frame=fig)
display.scale('asinh', 'zscale')
display.mtv(coadd.image)
plt.scatter(x[sel_rs], y[sel_rs], ec='r', fc='none')
plt.xlim(12000, 16000)
plt.ylim(8000, 12000)
plt.xlabel('x [pix]')
plt.ylabel('y [pix]')
Text(69.30722222222222, 0.5, 'y [pix]')
Figure 4: Coadd r-band image with the selected RS galaxies.
del coadd
4. Lensing analysis¶
Compute the tangential and cross ellipticities of background galaxies with respect to the cluster center.
In the following formulae, $e_\textrm{T}$ is the tangential ellipticity, $e_\textrm{X}$ is the cross ellipticity, $\varphi$ is the position angle towards the cluster center (top North and left East), $e_1$ and $e_2$ are the original ellipticity components. Note, in the formulae the position angle starts from the negative RA direction (West) and increases counterclockwise, but in Astropy the position angle starts from North. Thus, add $\pi/2$ to the position angle computed by Astropy.
$e_\textrm{T}= - e_1 \cos(2 \varphi)- e_2 \sin(2 \varphi)$
$e_\textrm{X}= e_1 \sin(2 \varphi) - e_2 \cos(2 \varphi)$
Here, the tangential direction is perpendicular to the line connecting the galaxy and the cluster center, while the cross direction is 45 degrees counterclockwise from the tangential direction.
Also, compute the radial distance in arcminutes.
e1 = data_s['r_hsmShapeRegauss_e1']
e2 = data_s['r_hsmShapeRegauss_e2']
coord0 = SkyCoord(center_ra, center_dec, frame='icrs', unit='deg')
coord1 = SkyCoord(ra, dec, frame='icrs', unit='deg')
position_angle = coord0.position_angle(coord1).rad + np.pi/2.
e_t = - e1 * np.cos(2.*position_angle) - e2 * np.sin(2.*position_angle)
e_x = + e1 * np.sin(2.*position_angle) - e2 * np.cos(2.*position_angle)
r = coord0.separation(coord1).arcmin
Examine the distributions of the tangential and cross ellipticities. They both show Gaussian-like distributions, but the tangential ellipticity distribution has a mean slightly above zero, while the cross ellipticity distribution has a mean close to zero (Kaiser 1995; Umetsu 2020).
bins = np.linspace(-1, 1, 30)
mid = 0.5 * (bins[1:] + bins[:-1])
plt.figure()
n_e_t, _, im = plt.hist(e_t[sel_bg], bins=bins, histtype='step',
density=True, label='e_t')
n_e_x, _, im = plt.hist(e_x[sel_bg], bins=bins, histtype='step',
density=True, ls='--', label='e_x')
plt.axvline(np.mean(e_t[sel_bg]), c='C0', label='mean[e_t]')
plt.axvline(np.mean(e_x[sel_bg]), c='C1', ls='--', label='mean[e_x]')
plt.xlabel('ellipticity')
plt.ylabel('count')
_ = plt.legend()
Figure 5: Distributions of the tangential ellipticities and the cross ellipticities of the selected background galaxies.
The difference between two distributions shows a dipole feature as expected, because this is the difference between two Gaussian-like functions that only have a small offset of the peak. See also an example in the work of Dell'Antonio et al. (2020).
plt.figure()
plt.step(mid, n_e_t - n_e_x, where='mid')
plt.axvline(0, c='k', ls=':')
plt.xlabel('ellipticity')
plt.ylabel('count difference')
Text(0, 0.5, 'count difference')
Figure 6: Difference between the distributions of the tangential ellipticities and the cross ellipticities.
Bin the data by radial distance to get the mean tangential and cross ellipticities. Note, these are ellipticity profiles rather than shear profiles.
Estimate the error bar by the standard deviation divided by the square root of the number of data points ($std/\sqrt{N}$). Here, the shape noise (which can be as large as the ellipticity itself) is usually about 10 times bigger than other sources of error (e.g., measurement error, large-scale structure). Therefore, using the statistical error is sufficient in this case, assuming each galaxy's shape is an independent measurement. See also the consistency with the resampling result below.
When plotting the cross component result, shift the data point slightly for better visualization.
This cluster has sufficient mass and an appropriate redshift to produce a significant lensing signal (mean tangential ellipticity). Additionally, the mean cross ellipticity is consistent with zero, indicating no significant systematics.
r_max = np.max(r)
print(r_max)
bins = np.linspace(0, r_max, 7)
mid = 0.5 * (bins[1:] + bins[:-1])
e_t_mean, _, _ = binned_statistic(r[sel_bg], e_t[sel_bg], bins=bins)
e_x_mean, _, _ = binned_statistic(r[sel_bg], e_x[sel_bg], bins=bins)
e_t_std, _, _ = binned_statistic(r[sel_bg], e_t[sel_bg], bins=bins,
statistic='std')
e_x_std, _, _ = binned_statistic(r[sel_bg], e_x[sel_bg], bins=bins,
statistic='std')
count, _, _ = binned_statistic(r[sel_bg], e_t[sel_bg], bins=bins,
statistic='count')
e_t_err = e_t_std / count ** 0.5
e_x_err = e_x_std / count ** 0.5
plt.figure()
plt.errorbar(mid, e_t_mean, e_t_err, fmt='o', label='e_t')
plt.errorbar(mid+r_max*0.03, e_x_mean, e_x_err, fmt='x', label='e_x')
plt.axhline(0, c='k', ls=':')
plt.xlabel('r [arcmin]')
plt.ylabel('<e_i>')
_ = plt.legend()
14.999815490020579
Figure 7: Mean shape profile with error bars estimated by statistical uncertainty.
As mentioned earlier, there are other sources of error in addition to shape noise. For example, the background large-scale structure can introduce correlated noise between radial bins. To fully catch all kinds of error, consider using resampling to estimate the intrinsic scatter of the measurement.
Thus, also use bootstrapping to estimate the error bars. Bootstrapping is a resampling method, which randomly picks sources in the bin with replacement to create a new sample that has the same size as the original. The mean ellipticity of the new sample is recorded, and this process is repeated many times. The scatter of these resampled means is then used as the error bar.
The result is very similar to the previous one, because other types of error are much smaller than the shape noise.
inds = np.digitize(r[sel_bg], bins)
e_t_mean_list = []
e_t_lerr_list = []
e_t_herr_list = []
e_x_mean_list = []
e_x_lerr_list = []
e_x_herr_list = []
for i in range(1, len(bins)):
sel = inds == i
mean, lerr, herr = get_stat(np.ma.getdata(e_t[sel_bg][sel]))
e_t_mean_list.append(mean)
e_t_lerr_list.append(lerr)
e_t_herr_list.append(herr)
mean, lerr, herr = get_stat(np.ma.getdata(e_x[sel_bg][sel]))
e_x_mean_list.append(mean)
e_x_lerr_list.append(lerr)
e_x_herr_list.append(herr)
plt.figure()
plt.errorbar(mid, e_t_mean_list, [e_t_lerr_list, e_t_herr_list],
fmt='o', label='e_t')
plt.errorbar(mid+r_max*0.03, e_x_mean_list, [e_x_lerr_list, e_x_herr_list],
fmt='x', label='e_x')
plt.axhline(0, c='k', ls=':')
plt.xlabel('r [arcmin]')
plt.ylabel('<e_i>')
_ = plt.legend()
Figure 8: Mean shape profile with error bars estimated by bootstrapping.
5. Exercises for the learner¶
- Use a large cutout (see the notebook DP02_03a, Image Display and Manipulation) to obtain a big coadd image, instead of using the Butler in this case.
- Similar to the analysis of background sources here, test the shape distribution of cluster member galaxies, using the RS selection described above. Both the mean tangential and cross ellipticities are expected to be zero, because there is no lensing effect on cluster galaxies. The member galaxies may exist a small alightment due to gravity (instrinsic alignment), but this effect is much smaller than lensing.
- Test other clusters in the DP02 data.
Note: It is encouraged that researchers join the LSST Dark Energy Science Collaboration (DESC) to carry out cosmological studies using the Rubin data.