Comparing Object and Truth TablesΒΆ

No description has been provided for this image
Contact author: Jeff Carlin
Last verified to run: 2024-08-19
LSST Science Pipelines version: Weekly 2024_32
Container size: medium
Targeted learning level: beginner

Description: An introduction to using the truth data for the Dark Energy Science Collaboration's DC2 data set, which formed the basis for the DP0 data products.

Skills: Use the TAP service with table joins to retrieve truth data matched to the Object catalog.

LSST Data Products: TAP dp02_dc2_catalogs.Object, .MatchesTruth, and .TruthSummary tables.

Packages: lsst.rsp.get_tap_service, lsst.rsp.retrieve_query

Credit: Originally developed by Jeff Carlin and the Rubin Community Science Team in the context of the Rubin DP0, with some help from Melissa Graham in the update from DP0.1 to DP0.2.

Get Support: Find DP0-related documentation and resources at dp0.lsst.io. Questions are welcome as new topics in the Support - Data Preview 0 Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

1.0. IntroductionΒΆ

This tutorial demonstrates how to use the TAP service to query and retrieve data from the two truth tables and the Object table for DP0.2. Joining these three tables enables users to compare the recovered (measured) properties (e.g., fluxes, positions, magnitudes, etc. from the Object table, which is comprised of SNR>5 detections in the deepCoadds) to the simulated values that were assigned to each object when creating the DC2 simulations.

Thanks to the DESC! The DC2 simulations which make up DP0 were generated by the Dark Energy Science Collaboration for their Data Challenge 2 (DC2). A full description of the simulated truth data can be found in the ApJS paper The LSST DESC DC2 Simulated Sky Survey, with more information in the DESC's DC2 Data Release Note.

1.1. Package importsΒΆ

The matplotlib, numpy, pandas, and astropy libraries are widely used Python libraries for plotting, scientific computing, and astronomical data analysis. We will use these packages below, including the matplotlib.pyplot plotting sublibrary.

We also use the warnings package to minimize standard output within the notebook, and the lsst.rsp package to access the TAP service and query the DP0 catalogs.

InΒ [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas
from lsst.rsp import get_tap_service, retrieve_query

1.2. Define functions and parametersΒΆ

Set the pandas parameter for the maximum number of rows to display to 200.

InΒ [2]:
pandas.set_option('display.max_rows', 200)

Set matplotlib to show plots inline, within the notebook.

InΒ [3]:
%matplotlib inline

Set up colors and plot symbols corresponding to the ugrizy bands. These colors are the same as those used for ugrizy bands in Dark Energy Survey (DES) publications, and are defined in this github repository.

InΒ [4]:
plot_filter_labels = ['u', 'g', 'r', 'i', 'z', 'y']
plot_filter_colors = {'u': '#0c71ff', 'g': '#49be61', 'r': '#c61c00',
                      'i': '#ffc200', 'z': '#f341a2', 'y': '#5d0000'}
plot_filter_symbols = {'u': 'o', 'g': '^', 'r': 'v', 'i': 's', 'z': '*', 'y': 'p'}

To access tables, we will use the TAP service in a similar manner to what we showed in the DP0.2 introductory tutorial notebook, and explored further in the DP0.2 tutorial notebook 02 catalog queries with TAP. See those notebooks for more details.

InΒ [5]:
service = get_tap_service("tap")

2.0. Discover truth dataΒΆ

The DP0.2 Documentation contains a list of all DP0.2 catalogs, and also a link to the DP0.2 Schema Browser where users can read about the available tables and their contents.

Alternatively, the Portal Aspect of the Rubin Science Platform can be used to browse catalog data.

Below, we show how to browse catalog data from a Notebook using the TAP service.

2.1. Print the names of all available tablesΒΆ

InΒ [6]:
results = service.search("SELECT description, table_name FROM TAP_SCHEMA.tables")
results_tab = results.to_table()

for tablename in results_tab['table_name']:
    print(tablename)
dp01_dc2_catalogs.forced_photometry
dp01_dc2_catalogs.object
dp01_dc2_catalogs.position
dp01_dc2_catalogs.reference
dp01_dc2_catalogs.truth_match
dp02_dc2_catalogs.CcdVisit
dp02_dc2_catalogs.CoaddPatches
dp02_dc2_catalogs.DiaObject
dp02_dc2_catalogs.DiaSource
dp02_dc2_catalogs.ForcedSource
dp02_dc2_catalogs.ForcedSourceOnDiaObject
dp02_dc2_catalogs.MatchesTruth
dp02_dc2_catalogs.Object
dp02_dc2_catalogs.Source
dp02_dc2_catalogs.TruthSummary
dp02_dc2_catalogs.Visit
ivoa.ObsCore
tap_schema.columns
tap_schema.key_columns
tap_schema.keys
tap_schema.schemas
tap_schema.tables
uws.Job
InΒ [7]:
del results, results_tab

2.2. Print the table schema for MatchesTruthΒΆ

Use the .to_pandas() method, and not just .to_table() (astropy table), so that all rows of the second cell display.

InΒ [8]:
results = service.search("SELECT column_name, datatype, description,\
                          unit from TAP_SCHEMA.columns\
                          WHERE table_name = 'dp02_dc2_catalogs.MatchesTruth'")
InΒ [9]:
results.to_table().to_pandas()
Out[9]:
column_name datatype description unit
0 id char id for TruthSummary source. Potentially non-un...
1 id_truth_type char Combination of TruthSummary id and truth_type ...
2 match_candidate boolean True for sources that were selected for matching
3 match_chisq double The chi-squared value of the (best) match
4 match_count int Number of candidate object matches within matc...
5 match_n_chisq_finite int The number of finite columns used to compute t...
6 match_objectId long objectId of matched entry in the Object table,...
7 truth_type int Type of TruthSummary source; 1 for galaxies, 2...

The above is fine if the full description is not needed, but there are some important details that are being hidden by the line truncation above.

Try this instead.

InΒ [10]:
for c, columnname in enumerate(results['column_name']):
    print('%-25s %-200s' % (columnname, results['description'][c]))
id                        id for TruthSummary source. Potentially non-unique; use id_truth_type for JOINs.                                                                                                                        
id_truth_type             Combination of TruthSummary id and truth_type fields, used for JOINs.                                                                                                                                   
match_candidate           True for sources that were selected for matching                                                                                                                                                        
match_chisq               The chi-squared value of the (best) match                                                                                                                                                               
match_count               Number of candidate object matches within match radius                                                                                                                                                  
match_n_chisq_finite      The number of finite columns used to compute the match chisq                                                                                                                                            
match_objectId            objectId of matched entry in the Object table, if any                                                                                                                                                   
truth_type                Type of TruthSummary source; 1 for galaxies, 2 for stars, and 3 for SNe                                                                                                                                 
InΒ [11]:
del results

2.2. Print the table schema for TruthSummaryΒΆ

InΒ [12]:
results = service.search("SELECT column_name, datatype, description,\
                          unit from TAP_SCHEMA.columns\
                          WHERE table_name = 'dp02_dc2_catalogs.TruthSummary'")
InΒ [13]:
results.to_table().to_pandas()
Out[13]:
column_name datatype description unit
0 cosmodc2_hp long Healpix ID in cosmoDC2 (for galaxies only; -1 ...
1 cosmodc2_id long Galaxy ID in cosmoDC2 (for galaxies only; -1 f...
2 dec double Declination deg
3 flux_g float Static flux value in g nJy
4 flux_g_noMW float Static flux value in g, without Milky Way exti... nJy
5 flux_i float Static flux value in i nJy
6 flux_i_noMW float Static flux value in i, without Milky Way exti... nJy
7 flux_r float Static flux value in r nJy
8 flux_r_noMW float Static flux value in r, without Milky Way exti... nJy
9 flux_u float Static flux value in u nJy
10 flux_u_noMW float Static flux value in u, without Milky Way exti... nJy
11 flux_y float Static flux value in y nJy
12 flux_y_noMW float Static flux value in y, without Milky Way exti... nJy
13 flux_z float Static flux value in z nJy
14 flux_z_noMW float Static flux value in z, without Milky Way exti... nJy
15 host_galaxy long ID of the host galaxy for a SN/AGN entry (-1 f...
16 id char Unique object ID
17 id_truth_type char Combination of id and truth_type fields, used ...
18 is_pointsource int 1 for a point source
19 is_variable int 1 for a variable source
20 mag_r float Magnitude in r mag
21 ra double Right Ascension deg
22 redshift float Redshift
23 truth_type long 1 for galaxies, 2 for stars, and 3 for SNe
InΒ [14]:
for c, columnname in enumerate(results['column_name']):
    print('%-25s %-200s' % (columnname, results['description'][c]))
cosmodc2_hp               Healpix ID in cosmoDC2 (for galaxies only; -1 for stars and SNe)                                                                                                                                        
cosmodc2_id               Galaxy ID in cosmoDC2 (for galaxies only; -1 for stars and SNe)                                                                                                                                         
dec                       Declination                                                                                                                                                                                             
flux_g                    Static flux value in g                                                                                                                                                                                  
flux_g_noMW               Static flux value in g, without Milky Way extinction (i.e., dereddened)                                                                                                                                 
flux_i                    Static flux value in i                                                                                                                                                                                  
flux_i_noMW               Static flux value in i, without Milky Way extinction (i.e., dereddened)                                                                                                                                 
flux_r                    Static flux value in r                                                                                                                                                                                  
flux_r_noMW               Static flux value in r, without Milky Way extinction (i.e., dereddened)                                                                                                                                 
flux_u                    Static flux value in u                                                                                                                                                                                  
flux_u_noMW               Static flux value in u, without Milky Way extinction (i.e., dereddened)                                                                                                                                 
flux_y                    Static flux value in y                                                                                                                                                                                  
flux_y_noMW               Static flux value in y, without Milky Way extinction (i.e., dereddened)                                                                                                                                 
flux_z                    Static flux value in z                                                                                                                                                                                  
flux_z_noMW               Static flux value in z, without Milky Way extinction (i.e., dereddened)                                                                                                                                 
host_galaxy               ID of the host galaxy for a SN/AGN entry (-1 for other truth types)                                                                                                                                     
id                        Unique object ID                                                                                                                                                                                        
id_truth_type             Combination of id and truth_type fields, used for JOINs with MatchesTruth.                                                                                                                              
is_pointsource            1 for a point source                                                                                                                                                                                    
is_variable               1 for a variable source                                                                                                                                                                                 
mag_r                     Magnitude in r                                                                                                                                                                                          
ra                        Right Ascension                                                                                                                                                                                         
redshift                  Redshift                                                                                                                                                                                                
truth_type                1 for galaxies, 2 for stars, and 3 for SNe                                                                                                                                                              
InΒ [15]:
del results

3.0. Retrieve truth dataΒΆ

3.1. Join MatchesTruth and TruthSummaryΒΆ

As described in the column description, the column id_truth_type should be used to join the MatchesTruth and TruthSummary tables.

InΒ [16]:
query = "SELECT mt.id_truth_type, mt.match_objectId, ts.ra, ts.dec, ts.truth_type "\
        "FROM dp02_dc2_catalogs.MatchesTruth AS mt "\
        "JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type = ts.id_truth_type "\
        "WHERE CONTAINS(POINT('ICRS', ts.ra, ts.dec), CIRCLE('ICRS', 62.0, -37.0, 0.10)) = 1 "
print(query)
SELECT mt.id_truth_type, mt.match_objectId, ts.ra, ts.dec, ts.truth_type FROM dp02_dc2_catalogs.MatchesTruth AS mt JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type = ts.id_truth_type WHERE CONTAINS(POINT('ICRS', ts.ra, ts.dec), CIRCLE('ICRS', 62.0, -37.0, 0.10)) = 1 
InΒ [17]:
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
Job phase is COMPLETED
InΒ [18]:
results = job.fetch_result().to_table()

Notice that not all objects from the truth table have matches in the dp02_dc2_catalogs.Object table of detections in the deep coadded images (i.e., their "match_objectId" is blank, meaning there was no match). Print the fraction of retrieved truth objects that are matched to the dp02_dc2_catalogs.Object table.

InΒ [19]:
tx = np.where(results['match_objectId'] > 1)[0]
print('Number: ', len(tx))
print('Fraction: ', np.round(len(tx)/len(results),2))
Number:  14850
Fraction:  0.23
InΒ [20]:
del results

3.2. Triple-join MatchesTruth, TruthSummary, and ObjectsΒΆ

The MatchesTruth table provides identifying information to pick out objects that have matches to truth objects. In order to compare measured quantities (e.g., fluxes) from the Object table to the simulated truth values, we also need data from the TruthSummary table. This requires joining all three tables.

Notice: When restricting a JOIN query to just a portion of the sky (e.g., a cone search), restricting on coordinates in Object, DiaObject, or Source where possible can result in significantly improved performance.

Note that the table length is now equal to the number of matched objects printed above (14850).

InΒ [21]:
query = "SELECT mt.id_truth_type, mt.match_objectId, ts.ra, ts.dec, ts.truth_type, "\
        "obj.coord_ra, obj.coord_dec "\
        "FROM dp02_dc2_catalogs.MatchesTruth AS mt "\
        "JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type = ts.id_truth_type "\
        "JOIN dp02_dc2_catalogs.Object AS obj ON mt.match_objectId = obj.objectId "\
        "WHERE CONTAINS(POINT('ICRS', obj.coord_ra, obj.coord_dec), CIRCLE('ICRS', 62.0, -37.0, 0.10)) = 1 "
print(query)
SELECT mt.id_truth_type, mt.match_objectId, ts.ra, ts.dec, ts.truth_type, obj.coord_ra, obj.coord_dec FROM dp02_dc2_catalogs.MatchesTruth AS mt JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type = ts.id_truth_type JOIN dp02_dc2_catalogs.Object AS obj ON mt.match_objectId = obj.objectId WHERE CONTAINS(POINT('ICRS', obj.coord_ra, obj.coord_dec), CIRCLE('ICRS', 62.0, -37.0, 0.10)) = 1 
InΒ [22]:
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
Job phase is COMPLETED
InΒ [23]:
results = job.fetch_result().to_table()
InΒ [24]:
print(len(results))
14850
InΒ [25]:
del results

3.3. Efficiently return truth matched data for a single ObjectΒΆ

It may be a common query to check whether truth matched data exists for a given Object that was detected in a deepCoadd.

Warning: Please note that the restriction for the given Object is written in the query below specifically as WHERE obj.objectId=1486698050427598336. If we were to write WHERE mt.match_objectId=1486698050427598336 instead, the query could take orders of magnitude longer to execute.

This potential stumbling block stems from how the DP0 tables are stored in Qserv (the distributed database): the TruthSummary and Object tables are stored in Qserv as what are known as "director" tables, while the MatchesTruth table used to join between them is stored as a somewhat more restricted "ref match" table. Qserv has special mechanics to optimize queries with WHERE restrictions expressed in terms of director tables, and can often dispatch these queries to just a few involved data shards. These same mechanics, however, cannot be applied in general to ref match tables, so the seemingly same restriction, if expressed in terms of the ref match table, would necessitate a full scan of the entire catalog which could be quite time-consuming.

While we would like Qserv to be able to notice and rewrite such queries on its own so our users would not have to be made aware of such details, this is the situation as it exists today, for DP0. In general:

Advisory: If ever a simple query that seems it should take seconds takes an hour (or even ten minutes!) instead of seconds, please submit a GitHub Issue or post in the DP0 RSP Service Issues category of the Rubin Community Forum. Staff there with specific familiarity with Qserv are happy to engage to investigate and help tweak queries for optimal execution.

InΒ [26]:
query = "SELECT mt.id_truth_type AS mt_id_truth_type, "\
        "mt.match_objectId AS mt_match_objectId, "\
        "obj.objectId AS obj_objectId, "\
        "ts.redshift AS ts_redshift "\
        "FROM dp02_dc2_catalogs.MatchesTruth AS mt "\
        "JOIN dp02_dc2_catalogs.TruthSummary AS ts "\
        "ON mt.id_truth_type=ts.id_truth_type "\
        "JOIN dp02_dc2_catalogs.Object AS obj "\
        "ON mt.match_objectId=obj.objectId "\
        "WHERE obj.objectId=1486698050427598336 "\
        "AND ts.truth_type=1 "\
        "AND obj.detect_isPrimary=1 "\
        "ORDER BY obj_objectId DESC"
print(query)
SELECT mt.id_truth_type AS mt_id_truth_type, mt.match_objectId AS mt_match_objectId, obj.objectId AS obj_objectId, ts.redshift AS ts_redshift FROM dp02_dc2_catalogs.MatchesTruth AS mt JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type=ts.id_truth_type JOIN dp02_dc2_catalogs.Object AS obj ON mt.match_objectId=obj.objectId WHERE obj.objectId=1486698050427598336 AND ts.truth_type=1 AND obj.detect_isPrimary=1 ORDER BY obj_objectId DESC
InΒ [27]:
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
Job phase is COMPLETED
InΒ [28]:
results = job.fetch_result().to_table()
InΒ [29]:
results
Out[29]:
Table length=1
mt_id_truth_typemt_match_objectIdobj_objectIdts_redshift
str18int64int64float32
9752536104_1148669805042759833614866980504275983361.05513
InΒ [30]:
del results

3.4. Retrieve additional data for true galaxies that are matched to detected objectsΒΆ

With the query below we retrieve much more data for true galaxies, such as their true and measured fluxes and extendedness. Since we are only retrieving measurement data from the Object table for true galaxies, we retrieve the cModelFlux instead of the psfFlux, the latter being appropriate for point-sources (e.g., stars).

Notice: Above, the column names in the retrieved results have no provenance: once the data are in the results table, it is unclear from which table it originated (i.e., whether it is a true coordinate or a measured coordinate). Below, we use the AS statement to rename columns to start with their origin table, in order to keep track of what is from a truth table (mt_ and ts_) and what is from the object table (obj_).

Notice: Below, we use truth_type = 1 to only retrieve truth and measurement data for "true galaxies."

The following query will retrieve 14501 results, 349 fewer than the query above, due to the specification of truth_type = 1.

InΒ [31]:
query = "SELECT mt.id_truth_type AS mt_id_truth_type, "\
        "mt.match_objectId AS mt_match_objectId, "\
        "ts.ra AS ts_ra, "\
        "ts.dec AS ts_dec, "\
        "ts.truth_type AS ts_truth_type, "\
        "ts.mag_r AS ts_mag_r, "\
        "ts.is_pointsource AS ts_is_pointsource, "\
        "ts.redshift AS ts_redshift, "\
        "ts.flux_u AS ts_flux_u, "\
        "ts.flux_g AS ts_flux_g, "\
        "ts.flux_r AS ts_flux_r, "\
        "ts.flux_i AS ts_flux_i, "\
        "ts.flux_z AS ts_flux_z, "\
        "ts.flux_y AS ts_flux_y, "\
        "obj.coord_ra AS obj_coord_ra, "\
        "obj.coord_dec AS obj_coord_dec, "\
        "obj.refExtendedness AS obj_refExtendedness, "\
        "scisql_nanojanskyToAbMag(obj.r_cModelFlux) AS obj_cModelMag_r, "\
        "obj.u_cModelFlux AS obj_u_cModelFlux, "\
        "obj.g_cModelFlux AS obj_g_cModelFlux, "\
        "obj.r_cModelFlux AS obj_r_cModelFlux, "\
        "obj.i_cModelFlux AS obj_i_cModelFlux, "\
        "obj.z_cModelFlux AS obj_z_cModelFlux, "\
        "obj.y_cModelFlux AS obj_y_cModelFlux "\
        "FROM dp02_dc2_catalogs.MatchesTruth AS mt "\
        "JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type = ts.id_truth_type "\
        "JOIN dp02_dc2_catalogs.Object AS obj ON mt.match_objectId = obj.objectId "\
        "WHERE CONTAINS(POINT('ICRS', obj.coord_ra, obj.coord_dec), CIRCLE('ICRS', 62.0, -37.0, 0.10)) = 1 "\
        "AND ts.truth_type = 1 "\
        "AND obj.detect_isPrimary = 1"
print(query)
SELECT mt.id_truth_type AS mt_id_truth_type, mt.match_objectId AS mt_match_objectId, ts.ra AS ts_ra, ts.dec AS ts_dec, ts.truth_type AS ts_truth_type, ts.mag_r AS ts_mag_r, ts.is_pointsource AS ts_is_pointsource, ts.redshift AS ts_redshift, ts.flux_u AS ts_flux_u, ts.flux_g AS ts_flux_g, ts.flux_r AS ts_flux_r, ts.flux_i AS ts_flux_i, ts.flux_z AS ts_flux_z, ts.flux_y AS ts_flux_y, obj.coord_ra AS obj_coord_ra, obj.coord_dec AS obj_coord_dec, obj.refExtendedness AS obj_refExtendedness, scisql_nanojanskyToAbMag(obj.r_cModelFlux) AS obj_cModelMag_r, obj.u_cModelFlux AS obj_u_cModelFlux, obj.g_cModelFlux AS obj_g_cModelFlux, obj.r_cModelFlux AS obj_r_cModelFlux, obj.i_cModelFlux AS obj_i_cModelFlux, obj.z_cModelFlux AS obj_z_cModelFlux, obj.y_cModelFlux AS obj_y_cModelFlux FROM dp02_dc2_catalogs.MatchesTruth AS mt JOIN dp02_dc2_catalogs.TruthSummary AS ts ON mt.id_truth_type = ts.id_truth_type JOIN dp02_dc2_catalogs.Object AS obj ON mt.match_objectId = obj.objectId WHERE CONTAINS(POINT('ICRS', obj.coord_ra, obj.coord_dec), CIRCLE('ICRS', 62.0, -37.0, 0.10)) = 1 AND ts.truth_type = 1 AND obj.detect_isPrimary = 1

This query might take a couple of minutes to execute.

InΒ [32]:
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
Job phase is COMPLETED
InΒ [33]:
results = job.fetch_result().to_table()

Notice that there is no del results statement here.

Keep these results and use them below, in Section 4, to explore the retrieved data for true galaxies.

4.0. Compare true and measured properties for true galaxiesΒΆ

4.1. Plot coordinate offsets for true galaxiesΒΆ

Below, plot the difference between the true and measured declination versus the difference between the true and measured right ascension.

InΒ [34]:
fig = plt.figure(figsize=(4, 4))
plt.plot(3600*(results['ts_ra']-results['obj_coord_ra']), \
         3600*(results['ts_dec']-results['obj_coord_dec']), \
         'o', ms=2, alpha=0.2, mew=0)
plt.xlabel('Right Ascension (true-measured; ["])', fontsize=12)
plt.ylabel('Declination (true-measured; ["])', fontsize=12)
plt.show()
No description has been provided for this image

Figure 1: A scatter plot of the difference between the true and measured coordinates of right ascension (RA) versus declination (Dec), in arcseconds.

We see that the scatter is less than about 0.5 arcseconds. For the (DC2-simulated) LSST Science Camera's platescale of 0.2 arcsec per pixel, that's a measurement accuracy of 2.5 pixels. Note also that most galaxies' positions are measured to sub-pixel accuracy.

4.2. How many true galaxies are measured as point sources?ΒΆ

The number of true galaxies that are measured as point sources, or in other words, have a measured refExtendedness equal to zero (see the schema for the DP0.2 Object table for more info about the refExtendedness column).

In this example, the following cell will report that 19% of true galaxies appear as point sources.

InΒ [35]:
x = np.where(results['obj_refExtendedness'] == 0)[0]
print('Number: ', len(x))
print('Fraction: ', np.round(len(x)/len(results['ts_is_pointsource']),2))
del x
Number:  2819
Fraction:  0.19

The fact that 19% of the true galaxies retrieved from the catalog appear point-like does not necessarily indicate an error in the measurement pipelines. For example, very small or very distant galaxies can appear point-like, even if they were simulated as extended objects. It is left as an exercise for the learner to explore what types of galaxies are measured to be point-like.

4.3. Compare true and measured r-band magnitudes for true galaxiesΒΆ

InΒ [36]:
fig = plt.figure(figsize=(4, 4))
plt.plot([18,32], [18,32], ls='solid', color='black', alpha=0.5)
x = np.where(results['obj_refExtendedness'] == 1)[0]
plt.plot(results['ts_mag_r'][x], results['obj_cModelMag_r'][x], \
         'o', ms=4, alpha=0.2, mew=0, color=plot_filter_colors['r'],\
         label='measured as extended')
del x
x = np.where(results['obj_refExtendedness'] == 0)[0]
plt.plot(results['ts_mag_r'][x], results['obj_cModelMag_r'][x], \
         'o', ms=2, alpha=0.5, mew=0, color='black',\
         label='measured as point-like')
del x
plt.xlabel('true r-band magnitude', fontsize=12)
plt.ylabel('measured cModel r-band magnitude', fontsize=12)
plt.legend(loc='lower right')
plt.xlim([18,30])
plt.ylim([18,30])
plt.show()
No description has been provided for this image

Figure 2: The true $r$-band magnitude versus the measured cModel $r$-band magnitude for extended (dark red) and point-like (black) objects.

4.4. Compare true and measured fluxes in all filters for true galaxiesΒΆ

InΒ [37]:
fig, ax = plt.subplots(2, 3, figsize=(10, 7))
i=0
j=0
for f,filt in enumerate(plot_filter_labels):
    ax[i,j].plot([0.1,1e6], [0.1,1e6], ls='solid', color='black', alpha=0.5)
    ax[i,j].plot(results['ts_flux_'+filt], results['obj_'+filt+'_cModelFlux'], \
                 plot_filter_symbols[filt], color=plot_filter_colors[filt], \
                 alpha=0.1, mew=0, label=filt)
    ax[i,j].loglog()
    ax[i,j].text(0.1, 0.9, filt, horizontalalignment='center', verticalalignment='center',
                 transform = ax[i,j].transAxes, color=plot_filter_colors[filt], fontsize=14)
    ax[i,j].set_xlim([0.1,1e6])
    ax[i,j].set_ylim([0.1,1e6])
    j += 1
    if j == 3:
        i += 1
        j = 0
ax[0,0].set_ylabel('measured cModelFlux', fontsize=12)
ax[1,0].set_ylabel('measured cModelFlux', fontsize=12)
ax[1,0].set_xlabel('true flux', fontsize=12)
ax[1,1].set_xlabel('true flux', fontsize=12)
ax[1,2].set_xlabel('true flux', fontsize=12)
plt.tight_layout()
plt.show()
No description has been provided for this image

Figure 3: The true flux versus the measured cModel flux for extended objects (galaxies) by filter.

4.5. Compare color-magnitude diagrams (CMDs) for true and measured properties of true galaxiesΒΆ

The following cells plot the true CMD at left in black, and the measured CMD at right in grey.

The first pair of plots uses the r-band for magnitude, and the g-r color. The second pair of plots uses the r-band for magnitude, and i-z for color.

In the first set of plots, the effects of measurement uncertainties are correlated between the x and y axes because the r-band data are included in both axes. In the second set of plots, the i-band and the z-band are instead used for color. Notice how the effect of measurement uncertainties changes.

Recall that these plots do not contain data for stars, as only true galaxies were retrieved from the truth tables.

Notice: Pink "RuntimeWarning" errors will appear due to a few of the measured fluxes in the denominator being zero. It is OK to ignore these warnings for the context of this tutorial, which focuses on retrieving truth data, but for scientific analyses users should follow up and understand such warnings (e.g., use flags to reject poor flux measurements from their samples).

InΒ [38]:
fig, ax = plt.subplots(1, 2, figsize=(8, 4))
ax[0].plot(-2.5*np.log10(results['ts_flux_g']/results['ts_flux_r']), results['ts_mag_r'], \
           'o', ms=2, alpha=0.2, mew=0, color='black')

ax[1].plot(-2.5*np.log10(results['obj_g_cModelFlux']/results['obj_r_cModelFlux']), results['obj_cModelMag_r'], \
           'o', ms=2, alpha=0.2, mew=0, color='grey')
ax[0].set_xlabel('true color (g-r)', fontsize=12)
ax[0].set_ylabel('true magnitude (r-band)', fontsize=12)
ax[0].set_xlim([-2, 4])
ax[0].set_ylim([30, 18])
ax[1].set_xlabel('measured color (g-r)', fontsize=12)
ax[1].set_ylabel('measured magnitude (r-band)', fontsize=12)
ax[1].set_xlim([-2, 4])
ax[1].set_ylim([30, 18])
plt.tight_layout()
plt.show()
/scratch/melissagraham/tmp/ipykernel_7116/2898140823.py:5: RuntimeWarning: divide by zero encountered in log10
  ax[1].plot(-2.5*np.log10(results['obj_g_cModelFlux']/results['obj_r_cModelFlux']), results['obj_cModelMag_r'], \
/scratch/melissagraham/tmp/ipykernel_7116/2898140823.py:5: RuntimeWarning: invalid value encountered in log10
  ax[1].plot(-2.5*np.log10(results['obj_g_cModelFlux']/results['obj_r_cModelFlux']), results['obj_cModelMag_r'], \
No description has been provided for this image

Figure 4: At left, the true $g-r$ color versus true $r$-band magnitude. At right, the measured $g-r$ color versus measured $r$-band magnitude. This plot demonstrates the impact of measurement errors on galaxy colors.

InΒ [39]:
fig, ax = plt.subplots(1, 2, figsize=(8, 4))
ax[0].plot(-2.5*np.log10(results['ts_flux_i']/results['ts_flux_z']), results['ts_mag_r'], \
           'o', ms=2, alpha=0.2, mew=0, color='black')

ax[1].plot(-2.5*np.log10(results['obj_i_cModelFlux']/results['obj_z_cModelFlux']), results['obj_cModelMag_r'], \
           'o', ms=2, alpha=0.2, mew=0, color='grey')
ax[0].set_xlabel('true color (i-z)', fontsize=12)
ax[0].set_ylabel('true magnitude (r-band)', fontsize=12)
ax[0].set_xlim([-2, 4])
ax[0].set_ylim([30, 18])
ax[1].set_xlabel('measured color (i-z)', fontsize=12)
ax[1].set_ylabel('measured magnitude (r-band)', fontsize=12)
ax[1].set_xlim([-2, 4])
ax[1].set_ylim([30, 18])
plt.tight_layout()
plt.show()
/scratch/melissagraham/tmp/ipykernel_7116/3986373313.py:5: RuntimeWarning: divide by zero encountered in log10
  ax[1].plot(-2.5*np.log10(results['obj_i_cModelFlux']/results['obj_z_cModelFlux']), results['obj_cModelMag_r'], \
/scratch/melissagraham/tmp/ipykernel_7116/3986373313.py:5: RuntimeWarning: invalid value encountered in log10
  ax[1].plot(-2.5*np.log10(results['obj_i_cModelFlux']/results['obj_z_cModelFlux']), results['obj_cModelMag_r'], \
No description has been provided for this image

Figure 5: Similar to Figure 4, but with $i-z$ color instead of $g-r$.

5.0 Exercises for the learnerΒΆ

  1. Repeat the query in Section 3.3, but instead of only retrieving true galaxies (ts.truth_type = 1), include stars, which have a truth_type of 2. Since stars are point sources, instead of only retrieving cModelFlux measurements, also retrieve psfFlux from the Object catalog, because PSF-fit fluxes are more appropriate for point sources.

  2. As mentioned in Section 4.2, it is left as an exercise for the learner to explore what types of galaxies are measured to be point-like.

  3. Explore the truth data for Type Ia supernovae (truth_type = 3).