Comparing Object and Truth TablesΒΆ
Contact author: Jeff Carlin
Last verified to run: 2024-08-19
LSST Science Pipelines version: Weekly 2024_32
Container size: medium
Targeted learning level: beginner
Description: An introduction to using the truth data for the Dark Energy Science Collaboration's DC2 data set, which formed the basis for the DP0 data products.
Skills: Use the TAP service with table joins to retrieve truth data matched to the Object catalog.
LSST Data Products: TAP dp02_dc2_catalogs.Object, .MatchesTruth, and .TruthSummary tables.
Packages: lsst.rsp.get_tap_service, lsst.rsp.retrieve_query
Credit: Originally developed by Jeff Carlin and the Rubin Community Science Team in the context of the Rubin DP0, with some help from Melissa Graham in the update from DP0.1 to DP0.2.
Get Support: Find DP0-related documentation and resources at dp0.lsst.io. Questions are welcome as new topics in the Support - Data Preview 0 Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1.0. IntroductionΒΆ
This tutorial demonstrates how to use the TAP service to query and retrieve data from the two truth tables and the Object table for DP0.2. Joining these three tables enables users to compare the recovered (measured) properties (e.g., fluxes, positions, magnitudes, etc. from the Object table, which is comprised of SNR>5 detections in the deepCoadds) to the simulated values that were assigned to each object when creating the DC2 simulations.
Thanks to the DESC! The DC2 simulations which make up DP0 were generated by the Dark Energy Science Collaboration for their Data Challenge 2 (DC2). A full description of the simulated truth data can be found in the ApJS paper The LSST DESC DC2 Simulated Sky Survey, with more information in the DESC's DC2 Data Release Note.
1.1. Package importsΒΆ
The matplotlib
, numpy
, pandas
, and astropy
libraries are widely used Python libraries for plotting, scientific computing, and astronomical data analysis. We will use these packages below, including the matplotlib.pyplot
plotting sublibrary.
We also use the warnings
package to minimize standard output within the notebook, and the lsst.rsp
package to access the TAP service and query the DP0 catalogs.
import matplotlib.pyplot as plt
import numpy as np
import pandas
from lsst.rsp import get_tap_service, retrieve_query
1.2. Define functions and parametersΒΆ
Set the pandas parameter for the maximum number of rows to display to 200.
pandas.set_option('display.max_rows', 200)
Set matplotlib
to show plots inline, within the notebook.
%matplotlib inline
Set up colors and plot symbols corresponding to the ugrizy bands. These colors are the same as those used for ugrizy bands in Dark Energy Survey (DES) publications, and are defined in this github repository.
plot_filter_labels = ['u', 'g', 'r', 'i', 'z', 'y']
plot_filter_colors = {'u': '#0c71ff', 'g': '#49be61', 'r': '#c61c00',
'i': '#ffc200', 'z': '#f341a2', 'y': '#5d0000'}
plot_filter_symbols = {'u': 'o', 'g': '^', 'r': 'v', 'i': 's', 'z': '*', 'y': 'p'}
To access tables, we will use the TAP service in a similar manner to what we showed in the DP0.2 introductory tutorial notebook, and explored further in the DP0.2 tutorial notebook 02 catalog queries with TAP. See those notebooks for more details.
service = get_tap_service("tap")
2.0. Discover truth dataΒΆ
The DP0.2 Documentation contains a list of all DP0.2 catalogs, and also a link to the DP0.2 Schema Browser where users can read about the available tables and their contents.
Alternatively, the Portal Aspect of the Rubin Science Platform can be used to browse catalog data.
Below, we show how to browse catalog data from a Notebook using the TAP service.
2.1. Print the names of all available tablesΒΆ
results = service.search("SELECT description, table_name FROM TAP_SCHEMA.tables")
results_tab = results.to_table()
for tablename in results_tab['table_name']:
print(tablename)
dp01_dc2_catalogs.forced_photometry dp01_dc2_catalogs.object dp01_dc2_catalogs.position dp01_dc2_catalogs.reference dp01_dc2_catalogs.truth_match dp02_dc2_catalogs.CcdVisit dp02_dc2_catalogs.CoaddPatches dp02_dc2_catalogs.DiaObject dp02_dc2_catalogs.DiaSource dp02_dc2_catalogs.ForcedSource dp02_dc2_catalogs.ForcedSourceOnDiaObject dp02_dc2_catalogs.MatchesTruth dp02_dc2_catalogs.Object dp02_dc2_catalogs.Source dp02_dc2_catalogs.TruthSummary dp02_dc2_catalogs.Visit ivoa.ObsCore tap_schema.columns tap_schema.key_columns tap_schema.keys tap_schema.schemas tap_schema.tables uws.Job
del results, results_tab
2.2. Print the table schema for MatchesTruthΒΆ
Use the .to_pandas()
method, and not just .to_table()
(astropy table), so that all rows of the second cell display.
results = service.search("SELECT column_name, datatype, description,\
unit from TAP_SCHEMA.columns\
WHERE table_name = 'dp02_dc2_catalogs.MatchesTruth'")
results.to_table().to_pandas()
column_name | datatype | description | unit | |
---|---|---|---|---|
0 | id | char | id for TruthSummary source. Potentially non-un... | |
1 | id_truth_type | char | Combination of TruthSummary id and truth_type ... | |
2 | match_candidate | boolean | True for sources that were selected for matching | |
3 | match_chisq | double | The chi-squared value of the (best) match | |
4 | match_count | int | Number of candidate object matches within matc... | |
5 | match_n_chisq_finite | int | The number of finite columns used to compute t... | |
6 | match_objectId | long | objectId of matched entry in the Object table,... | |
7 | truth_type | int | Type of TruthSummary source; 1 for galaxies, 2... |
The above is fine if the full description is not needed, but there are some important details that are being hidden by the line truncation above.
Try this instead.
for c, columnname in enumerate(results['column_name']):
print('%-25s %-200s' % (columnname, results['description'][c]))
id id for TruthSummary source. Potentially non-unique; use id_truth_type for JOINs. id_truth_type Combination of TruthSummary id and truth_type fields, used for JOINs. match_candidate True for sources that were selected for matching match_chisq The chi-squared value of the (best) match match_count Number of candidate object matches within match radius match_n_chisq_finite The number of finite columns used to compute the match chisq match_objectId objectId of matched entry in the Object table, if any truth_type Type of TruthSummary source; 1 for galaxies, 2 for stars, and 3 for SNe
del results
2.2. Print the table schema for TruthSummaryΒΆ
results = service.search("SELECT column_name, datatype, description,\
unit from TAP_SCHEMA.columns\
WHERE table_name = 'dp02_dc2_catalogs.TruthSummary'")
results.to_table().to_pandas()
column_name | datatype | description | unit | |
---|---|---|---|---|
0 | cosmodc2_hp | long | Healpix ID in cosmoDC2 (for galaxies only; -1 ... | |
1 | cosmodc2_id | long | Galaxy ID in cosmoDC2 (for galaxies only; -1 f... | |
2 | dec | double | Declination | deg |
3 | flux_g | float | Static flux value in g | nJy |
4 | flux_g_noMW | float | Static flux value in g, without Milky Way exti... | nJy |
5 | flux_i | float | Static flux value in i | nJy |
6 | flux_i_noMW | float | Static flux value in i, without Milky Way exti... | nJy |
7 | flux_r | float | Static flux value in r | nJy |
8 | flux_r_noMW | float | Static flux value in r, without Milky Way exti... | nJy |
9 | flux_u | float | Static flux value in u | nJy |
10 | flux_u_noMW | float | Static flux value in u, without Milky Way exti... | nJy |
11 | flux_y | float | Static flux value in y | nJy |
12 | flux_y_noMW | float | Static flux value in y, without Milky Way exti... | nJy |
13 | flux_z | float | Static flux value in z | nJy |
14 | flux_z_noMW | float | Static flux value in z, without Milky Way exti... | nJy |
15 | host_galaxy | long | ID of the host galaxy for a SN/AGN entry (-1 f... | |
16 | id | char | Unique object ID | |
17 | id_truth_type | char | Combination of id and truth_type fields, used ... | |
18 | is_pointsource | int | 1 for a point source | |
19 | is_variable | int | 1 for a variable source | |
20 | mag_r | float | Magnitude in r | mag |
21 | ra | double | Right Ascension | deg |
22 | redshift | float | Redshift | |
23 | truth_type | long | 1 for galaxies, 2 for stars, and 3 for SNe |
for c, columnname in enumerate(results['column_name']):
print('%-25s %-200s' % (columnname, results['description'][c]))
cosmodc2_hp Healpix ID in cosmoDC2 (for galaxies only; -1 for stars and SNe) cosmodc2_id Galaxy ID in cosmoDC2 (for galaxies only; -1 for stars and SNe) dec Declination flux_g Static flux value in g flux_g_noMW Static flux value in g, without Milky Way extinction (i.e., dereddened) flux_i Static flux value in i flux_i_noMW Static flux value in i, without Milky Way extinction (i.e., dereddened) flux_r Static flux value in r flux_r_noMW Static flux value in r, without Milky Way extinction (i.e., dereddened) flux_u Static flux value in u flux_u_noMW Static flux value in u, without Milky Way extinction (i.e., dereddened) flux_y Static flux value in y flux_y_noMW Static flux value in y, without Milky Way extinction (i.e., dereddened) flux_z Static flux value in z flux_z_noMW Static flux value in z, without Milky Way extinction (i.e., dereddened) host_galaxy ID of the host galaxy for a SN/AGN entry (-1 for other truth types) id Unique object ID id_truth_type Combination of id and truth_type fields, used for JOINs with MatchesTruth. is_pointsource 1 for a point source is_variable 1 for a variable source mag_r Magnitude in r ra Right Ascension redshift Redshift truth_type 1 for galaxies, 2 for stars, and 3 for SNe
del results
query = "SELECT mt.id_truth_type, mt.match_objectId, ts.ra, ts.dec, ts.truth_type "\
"FROM dp02_dc2_catalogs.MatchesTruth AS mt "\
"JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type = ts.id_truth_type "\
"WHERE CONTAINS(POINT('ICRS', ts.ra, ts.dec), CIRCLE('ICRS', 62.0, -37.0, 0.10)) = 1 "
print(query)
SELECT mt.id_truth_type, mt.match_objectId, ts.ra, ts.dec, ts.truth_type FROM dp02_dc2_catalogs.MatchesTruth AS mt JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type = ts.id_truth_type WHERE CONTAINS(POINT('ICRS', ts.ra, ts.dec), CIRCLE('ICRS', 62.0, -37.0, 0.10)) = 1
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
Job phase is COMPLETED
results = job.fetch_result().to_table()
Notice that not all objects from the truth table have matches in the dp02_dc2_catalogs.Object table of detections in the deep coadded images (i.e., their "match_objectId" is blank, meaning there was no match). Print the fraction of retrieved truth objects that are matched to the dp02_dc2_catalogs.Object table.
tx = np.where(results['match_objectId'] > 1)[0]
print('Number: ', len(tx))
print('Fraction: ', np.round(len(tx)/len(results),2))
Number: 14850 Fraction: 0.23
del results
3.2. Triple-join MatchesTruth, TruthSummary, and ObjectsΒΆ
The MatchesTruth
table provides identifying information to pick out objects that have matches to truth objects. In order to compare measured quantities (e.g., fluxes) from the Object
table to the simulated truth values, we also need data from the TruthSummary
table. This requires joining all three tables.
Notice: When restricting a JOIN query to just a portion of the sky (e.g., a cone search), restricting on coordinates in
Object
,DiaObject
, orSource
where possible can result in significantly improved performance.
Note that the table length is now equal to the number of matched objects printed above (14850).
query = "SELECT mt.id_truth_type, mt.match_objectId, ts.ra, ts.dec, ts.truth_type, "\
"obj.coord_ra, obj.coord_dec "\
"FROM dp02_dc2_catalogs.MatchesTruth AS mt "\
"JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type = ts.id_truth_type "\
"JOIN dp02_dc2_catalogs.Object AS obj ON mt.match_objectId = obj.objectId "\
"WHERE CONTAINS(POINT('ICRS', obj.coord_ra, obj.coord_dec), CIRCLE('ICRS', 62.0, -37.0, 0.10)) = 1 "
print(query)
SELECT mt.id_truth_type, mt.match_objectId, ts.ra, ts.dec, ts.truth_type, obj.coord_ra, obj.coord_dec FROM dp02_dc2_catalogs.MatchesTruth AS mt JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type = ts.id_truth_type JOIN dp02_dc2_catalogs.Object AS obj ON mt.match_objectId = obj.objectId WHERE CONTAINS(POINT('ICRS', obj.coord_ra, obj.coord_dec), CIRCLE('ICRS', 62.0, -37.0, 0.10)) = 1
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
Job phase is COMPLETED
results = job.fetch_result().to_table()
print(len(results))
14850
del results
3.3. Efficiently return truth matched data for a single ObjectΒΆ
It may be a common query to check whether truth matched data exists for a given Object that was detected in a deepCoadd.
Warning: Please note that the restriction for the given Object is written in the query below specifically as
WHERE obj.objectId=1486698050427598336
. If we were to writeWHERE mt.match_objectId=1486698050427598336
instead, the query could take orders of magnitude longer to execute.
This potential stumbling block stems from how the DP0 tables are stored in Qserv (the distributed database): the TruthSummary
and Object
tables are stored in Qserv as what are known as "director" tables, while the MatchesTruth
table used to join between them is stored as a somewhat more restricted "ref match" table. Qserv has special mechanics to optimize queries with WHERE
restrictions expressed in terms of director tables, and can often dispatch these queries to just a few involved data shards. These same mechanics, however, cannot be applied in general to ref match tables, so the seemingly same restriction, if expressed in terms of the ref match table, would necessitate a full scan of the entire catalog which could be quite time-consuming.
While we would like Qserv to be able to notice and rewrite such queries on its own so our users would not have to be made aware of such details, this is the situation as it exists today, for DP0. In general:
Advisory: If ever a simple query that seems it should take seconds takes an hour (or even ten minutes!) instead of seconds, please submit a GitHub Issue or post in the DP0 RSP Service Issues category of the Rubin Community Forum. Staff there with specific familiarity with Qserv are happy to engage to investigate and help tweak queries for optimal execution.
query = "SELECT mt.id_truth_type AS mt_id_truth_type, "\
"mt.match_objectId AS mt_match_objectId, "\
"obj.objectId AS obj_objectId, "\
"ts.redshift AS ts_redshift "\
"FROM dp02_dc2_catalogs.MatchesTruth AS mt "\
"JOIN dp02_dc2_catalogs.TruthSummary AS ts "\
"ON mt.id_truth_type=ts.id_truth_type "\
"JOIN dp02_dc2_catalogs.Object AS obj "\
"ON mt.match_objectId=obj.objectId "\
"WHERE obj.objectId=1486698050427598336 "\
"AND ts.truth_type=1 "\
"AND obj.detect_isPrimary=1 "\
"ORDER BY obj_objectId DESC"
print(query)
SELECT mt.id_truth_type AS mt_id_truth_type, mt.match_objectId AS mt_match_objectId, obj.objectId AS obj_objectId, ts.redshift AS ts_redshift FROM dp02_dc2_catalogs.MatchesTruth AS mt JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type=ts.id_truth_type JOIN dp02_dc2_catalogs.Object AS obj ON mt.match_objectId=obj.objectId WHERE obj.objectId=1486698050427598336 AND ts.truth_type=1 AND obj.detect_isPrimary=1 ORDER BY obj_objectId DESC
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
Job phase is COMPLETED
results = job.fetch_result().to_table()
results
mt_id_truth_type | mt_match_objectId | obj_objectId | ts_redshift |
---|---|---|---|
str18 | int64 | int64 | float32 |
9752536104_1 | 1486698050427598336 | 1486698050427598336 | 1.05513 |
del results
3.4. Retrieve additional data for true galaxies that are matched to detected objectsΒΆ
With the query below we retrieve much more data for true galaxies, such as their true and measured fluxes and extendedness. Since we are only retrieving measurement data from the Object table for true galaxies, we retrieve the cModelFlux
instead of the psfFlux
, the latter being appropriate for point-sources (e.g., stars).
Notice: Above, the column names in the retrieved results have no provenance: once the data are in the results table, it is unclear from which table it originated (i.e., whether it is a true coordinate or a measured coordinate). Below, we use the
AS
statement to rename columns to start with their origin table, in order to keep track of what is from a truth table (mt_ and ts_) and what is from the object table (obj_).
Notice: Below, we use
truth_type = 1
to only retrieve truth and measurement data for "true galaxies."
The following query will retrieve 14501 results, 349 fewer than the query above, due to the specification of truth_type = 1
.
query = "SELECT mt.id_truth_type AS mt_id_truth_type, "\
"mt.match_objectId AS mt_match_objectId, "\
"ts.ra AS ts_ra, "\
"ts.dec AS ts_dec, "\
"ts.truth_type AS ts_truth_type, "\
"ts.mag_r AS ts_mag_r, "\
"ts.is_pointsource AS ts_is_pointsource, "\
"ts.redshift AS ts_redshift, "\
"ts.flux_u AS ts_flux_u, "\
"ts.flux_g AS ts_flux_g, "\
"ts.flux_r AS ts_flux_r, "\
"ts.flux_i AS ts_flux_i, "\
"ts.flux_z AS ts_flux_z, "\
"ts.flux_y AS ts_flux_y, "\
"obj.coord_ra AS obj_coord_ra, "\
"obj.coord_dec AS obj_coord_dec, "\
"obj.refExtendedness AS obj_refExtendedness, "\
"scisql_nanojanskyToAbMag(obj.r_cModelFlux) AS obj_cModelMag_r, "\
"obj.u_cModelFlux AS obj_u_cModelFlux, "\
"obj.g_cModelFlux AS obj_g_cModelFlux, "\
"obj.r_cModelFlux AS obj_r_cModelFlux, "\
"obj.i_cModelFlux AS obj_i_cModelFlux, "\
"obj.z_cModelFlux AS obj_z_cModelFlux, "\
"obj.y_cModelFlux AS obj_y_cModelFlux "\
"FROM dp02_dc2_catalogs.MatchesTruth AS mt "\
"JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type = ts.id_truth_type "\
"JOIN dp02_dc2_catalogs.Object AS obj ON mt.match_objectId = obj.objectId "\
"WHERE CONTAINS(POINT('ICRS', obj.coord_ra, obj.coord_dec), CIRCLE('ICRS', 62.0, -37.0, 0.10)) = 1 "\
"AND ts.truth_type = 1 "\
"AND obj.detect_isPrimary = 1"
print(query)
SELECT mt.id_truth_type AS mt_id_truth_type, mt.match_objectId AS mt_match_objectId, ts.ra AS ts_ra, ts.dec AS ts_dec, ts.truth_type AS ts_truth_type, ts.mag_r AS ts_mag_r, ts.is_pointsource AS ts_is_pointsource, ts.redshift AS ts_redshift, ts.flux_u AS ts_flux_u, ts.flux_g AS ts_flux_g, ts.flux_r AS ts_flux_r, ts.flux_i AS ts_flux_i, ts.flux_z AS ts_flux_z, ts.flux_y AS ts_flux_y, obj.coord_ra AS obj_coord_ra, obj.coord_dec AS obj_coord_dec, obj.refExtendedness AS obj_refExtendedness, scisql_nanojanskyToAbMag(obj.r_cModelFlux) AS obj_cModelMag_r, obj.u_cModelFlux AS obj_u_cModelFlux, obj.g_cModelFlux AS obj_g_cModelFlux, obj.r_cModelFlux AS obj_r_cModelFlux, obj.i_cModelFlux AS obj_i_cModelFlux, obj.z_cModelFlux AS obj_z_cModelFlux, obj.y_cModelFlux AS obj_y_cModelFlux FROM dp02_dc2_catalogs.MatchesTruth AS mt JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type = ts.id_truth_type JOIN dp02_dc2_catalogs.Object AS obj ON mt.match_objectId = obj.objectId WHERE CONTAINS(POINT('ICRS', obj.coord_ra, obj.coord_dec), CIRCLE('ICRS', 62.0, -37.0, 0.10)) = 1 AND ts.truth_type = 1 AND obj.detect_isPrimary = 1
This query might take a couple of minutes to execute.
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
Job phase is COMPLETED
results = job.fetch_result().to_table()
Notice that there is no del results
statement here.
Keep these results and use them below, in Section 4, to explore the retrieved data for true galaxies.
4.0. Compare true and measured properties for true galaxiesΒΆ
4.1. Plot coordinate offsets for true galaxiesΒΆ
Below, plot the difference between the true and measured declination versus the difference between the true and measured right ascension.
fig = plt.figure(figsize=(4, 4))
plt.plot(3600*(results['ts_ra']-results['obj_coord_ra']), \
3600*(results['ts_dec']-results['obj_coord_dec']), \
'o', ms=2, alpha=0.2, mew=0)
plt.xlabel('Right Ascension (true-measured; ["])', fontsize=12)
plt.ylabel('Declination (true-measured; ["])', fontsize=12)
plt.show()
Figure 1: A scatter plot of the difference between the true and measured coordinates of right ascension (RA) versus declination (Dec), in arcseconds.
We see that the scatter is less than about 0.5 arcseconds. For the (DC2-simulated) LSST Science Camera's platescale of 0.2 arcsec per pixel, that's a measurement accuracy of 2.5 pixels. Note also that most galaxies' positions are measured to sub-pixel accuracy.
4.2. How many true galaxies are measured as point sources?ΒΆ
The number of true galaxies that are measured as point sources, or in other words, have a measured refExtendedness
equal to zero (see the schema for the DP0.2 Object table for more info about the refExtendedness
column).
In this example, the following cell will report that 19% of true galaxies appear as point sources.
x = np.where(results['obj_refExtendedness'] == 0)[0]
print('Number: ', len(x))
print('Fraction: ', np.round(len(x)/len(results['ts_is_pointsource']),2))
del x
Number: 2819 Fraction: 0.19
The fact that 19% of the true galaxies retrieved from the catalog appear point-like does not necessarily indicate an error in the measurement pipelines. For example, very small or very distant galaxies can appear point-like, even if they were simulated as extended objects. It is left as an exercise for the learner to explore what types of galaxies are measured to be point-like.
4.3. Compare true and measured r-band magnitudes for true galaxiesΒΆ
fig = plt.figure(figsize=(4, 4))
plt.plot([18,32], [18,32], ls='solid', color='black', alpha=0.5)
x = np.where(results['obj_refExtendedness'] == 1)[0]
plt.plot(results['ts_mag_r'][x], results['obj_cModelMag_r'][x], \
'o', ms=4, alpha=0.2, mew=0, color=plot_filter_colors['r'],\
label='measured as extended')
del x
x = np.where(results['obj_refExtendedness'] == 0)[0]
plt.plot(results['ts_mag_r'][x], results['obj_cModelMag_r'][x], \
'o', ms=2, alpha=0.5, mew=0, color='black',\
label='measured as point-like')
del x
plt.xlabel('true r-band magnitude', fontsize=12)
plt.ylabel('measured cModel r-band magnitude', fontsize=12)
plt.legend(loc='lower right')
plt.xlim([18,30])
plt.ylim([18,30])
plt.show()
Figure 2: The true $r$-band magnitude versus the measured
cModel
$r$-band magnitude for extended (dark red) and point-like (black) objects.
4.4. Compare true and measured fluxes in all filters for true galaxiesΒΆ
fig, ax = plt.subplots(2, 3, figsize=(10, 7))
i=0
j=0
for f,filt in enumerate(plot_filter_labels):
ax[i,j].plot([0.1,1e6], [0.1,1e6], ls='solid', color='black', alpha=0.5)
ax[i,j].plot(results['ts_flux_'+filt], results['obj_'+filt+'_cModelFlux'], \
plot_filter_symbols[filt], color=plot_filter_colors[filt], \
alpha=0.1, mew=0, label=filt)
ax[i,j].loglog()
ax[i,j].text(0.1, 0.9, filt, horizontalalignment='center', verticalalignment='center',
transform = ax[i,j].transAxes, color=plot_filter_colors[filt], fontsize=14)
ax[i,j].set_xlim([0.1,1e6])
ax[i,j].set_ylim([0.1,1e6])
j += 1
if j == 3:
i += 1
j = 0
ax[0,0].set_ylabel('measured cModelFlux', fontsize=12)
ax[1,0].set_ylabel('measured cModelFlux', fontsize=12)
ax[1,0].set_xlabel('true flux', fontsize=12)
ax[1,1].set_xlabel('true flux', fontsize=12)
ax[1,2].set_xlabel('true flux', fontsize=12)
plt.tight_layout()
plt.show()
Figure 3: The true flux versus the measured
cModel
flux for extended objects (galaxies) by filter.
4.5. Compare color-magnitude diagrams (CMDs) for true and measured properties of true galaxiesΒΆ
The following cells plot the true CMD at left in black, and the measured CMD at right in grey.
The first pair of plots uses the r-band for magnitude, and the g-r color. The second pair of plots uses the r-band for magnitude, and i-z for color.
In the first set of plots, the effects of measurement uncertainties are correlated between the x and y axes because the r-band data are included in both axes. In the second set of plots, the i-band and the z-band are instead used for color. Notice how the effect of measurement uncertainties changes.
Recall that these plots do not contain data for stars, as only true galaxies were retrieved from the truth tables.
Notice: Pink "RuntimeWarning" errors will appear due to a few of the measured fluxes in the denominator being zero. It is OK to ignore these warnings for the context of this tutorial, which focuses on retrieving truth data, but for scientific analyses users should follow up and understand such warnings (e.g., use flags to reject poor flux measurements from their samples).
fig, ax = plt.subplots(1, 2, figsize=(8, 4))
ax[0].plot(-2.5*np.log10(results['ts_flux_g']/results['ts_flux_r']), results['ts_mag_r'], \
'o', ms=2, alpha=0.2, mew=0, color='black')
ax[1].plot(-2.5*np.log10(results['obj_g_cModelFlux']/results['obj_r_cModelFlux']), results['obj_cModelMag_r'], \
'o', ms=2, alpha=0.2, mew=0, color='grey')
ax[0].set_xlabel('true color (g-r)', fontsize=12)
ax[0].set_ylabel('true magnitude (r-band)', fontsize=12)
ax[0].set_xlim([-2, 4])
ax[0].set_ylim([30, 18])
ax[1].set_xlabel('measured color (g-r)', fontsize=12)
ax[1].set_ylabel('measured magnitude (r-band)', fontsize=12)
ax[1].set_xlim([-2, 4])
ax[1].set_ylim([30, 18])
plt.tight_layout()
plt.show()
/scratch/melissagraham/tmp/ipykernel_7116/2898140823.py:5: RuntimeWarning: divide by zero encountered in log10 ax[1].plot(-2.5*np.log10(results['obj_g_cModelFlux']/results['obj_r_cModelFlux']), results['obj_cModelMag_r'], \ /scratch/melissagraham/tmp/ipykernel_7116/2898140823.py:5: RuntimeWarning: invalid value encountered in log10 ax[1].plot(-2.5*np.log10(results['obj_g_cModelFlux']/results['obj_r_cModelFlux']), results['obj_cModelMag_r'], \
Figure 4: At left, the true $g-r$ color versus true $r$-band magnitude. At right, the measured $g-r$ color versus measured $r$-band magnitude. This plot demonstrates the impact of measurement errors on galaxy colors.
fig, ax = plt.subplots(1, 2, figsize=(8, 4))
ax[0].plot(-2.5*np.log10(results['ts_flux_i']/results['ts_flux_z']), results['ts_mag_r'], \
'o', ms=2, alpha=0.2, mew=0, color='black')
ax[1].plot(-2.5*np.log10(results['obj_i_cModelFlux']/results['obj_z_cModelFlux']), results['obj_cModelMag_r'], \
'o', ms=2, alpha=0.2, mew=0, color='grey')
ax[0].set_xlabel('true color (i-z)', fontsize=12)
ax[0].set_ylabel('true magnitude (r-band)', fontsize=12)
ax[0].set_xlim([-2, 4])
ax[0].set_ylim([30, 18])
ax[1].set_xlabel('measured color (i-z)', fontsize=12)
ax[1].set_ylabel('measured magnitude (r-band)', fontsize=12)
ax[1].set_xlim([-2, 4])
ax[1].set_ylim([30, 18])
plt.tight_layout()
plt.show()
/scratch/melissagraham/tmp/ipykernel_7116/3986373313.py:5: RuntimeWarning: divide by zero encountered in log10 ax[1].plot(-2.5*np.log10(results['obj_i_cModelFlux']/results['obj_z_cModelFlux']), results['obj_cModelMag_r'], \ /scratch/melissagraham/tmp/ipykernel_7116/3986373313.py:5: RuntimeWarning: invalid value encountered in log10 ax[1].plot(-2.5*np.log10(results['obj_i_cModelFlux']/results['obj_z_cModelFlux']), results['obj_cModelMag_r'], \
Figure 5: Similar to Figure 4, but with $i-z$ color instead of $g-r$.
5.0 Exercises for the learnerΒΆ
Repeat the query in Section 3.3, but instead of only retrieving true galaxies (
ts.truth_type = 1
), include stars, which have atruth_type
of 2. Since stars are point sources, instead of only retrievingcModelFlux
measurements, also retrievepsfFlux
from the Object catalog, because PSF-fit fluxes are more appropriate for point sources.As mentioned in Section 4.2, it is left as an exercise for the learner to explore what types of galaxies are measured to be point-like.
Explore the truth data for Type Ia supernovae (
truth_type = 3
).