ADQL Recipes

ADQL is the Astronomical Data Query Language. The language is used by the IVOA to represent astronomy queries posted to Virtual Observatory (VO) services, such as the Rubin LSST Table Access Protocol (TAP) service. ADQL is based on the Structured Query Language (SQL).

ADQL can be used in both the Notebook and Portal aspects:

Learn more about the TAP-accessible DP0.2 catalogs which are used in the examples below.

Important

If a query takes longer than you expect, please submit a GitHub Issue or post in the “DP0 RSP Service Issues” category of the Rubin Community Forum. Rubin staff are happy to investigate and to help tweak queries for optimal execution.

General Advice

LSST Query Services (Qserv) provides access to the LSST Database Catalogs. Users can query the catalogs using standard SQL query language with a few restrictions.

Use spatial constraints on RA and Dec. It is recommended to always start with spatial constraints for a small radius and then expand the search area. Qserv stores catalog data sharded by coordinate (RA, Dec). ADQL query statements that include constraints by coordinate do not requre a whole-catalog search, and are typically faster (and can be much faster) than ADQL query statements which only include constraints for other columns. It is recommended to use either an ADQL Cone Search or a Polygon Search, and to not use a WHERE ... BETWEEN statement to set boundaries on RA and Dec.

Use dectect_isPrimary = True. It is recommended to include detect_isPrimary = True in queries for the Object, Source, and ForcedSource catalogs. This parameter is True if a source has no children, is in the inner region of a coadd patch, is in the inner region of a coadd tract, and is not detected in a pseudo-filter. Including this constraint will remove any duplicates: it will not include the parent and its deblended children (only deblended children), and it will not include detections in the overlapping patch edge regions (only the non-overlapping inner regions).

Additional external resources for learning about SQL, ADQL, and Qserv include:

Exploring tables

When learning about the contents of a table, it can be handy to simply retrieve all columns for “a bunch” (hundreds to thousands) of rows and take a look at the results. For this use-case, it is recommended to use the SELECT TOP statement, like in the example below that just retrieves the first 100 rows of the Object table.

SELECT TOP 100 * FROM dp02_dc2_catalogs.Object

Convert fluxes to magnitudes

As above, retrieve the coord_dec and coord_ra columns from the Object table for objects within a 0.05 degree radius of RA = 62, Dec = -37, and also retrieve the g-band AB magnitude and magnitude error. The scisql functions used below can be applied to any flux column.

SELECT coord_dec, coord_ra,
scisql_nanojanskyToAbMag(g_calibFlux) AS g_calibMag,
scisql_nanojanskyToAbMagSigma(g_calibFlux, g_calibFluxErr) as g_calibMagErr
FROM dp02_dc2_catalogs.Object
WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec),
CIRCLE('ICRS', 62, -37, 0.05)) = 1

Table joins

Below, the Source and CcdVisit table are joined in order to obtain the date and seeing from the CcdVisit table. Any two tables can be joined so long as they have an index in common.

This query also renames (nicknames) columns and tables using AS, and applies a spatial constraint, a temporal constraint (using obsStartMJD), and constraints on the band, extendedness, and flux value.

Additional external resources on SQL table joins:
SELECT src.ccdVisitId AS src_ccdVisitId,
src.extendedness AS src_extendedness,
src.band AS src_band,
scisql_nanojanskyToAbMag(src.psfFlux) AS src_psfAbMag,
cv.obsStartMJD AS cv_obsStartMJD,
cv.seeing AS cv_seeing
FROM dp02_dc2_catalogs.Source AS src
JOIN dp02_dc2_catalogs.CcdVisit AS cv
ON src.ccdVisitId = cv.ccdVisitId
WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec),
CIRCLE('ICRS', 62.0, -37, 1)) = 1
AND src.band = 'i'
AND src.extendedness = 0
AND src.psfFlux > 10000
AND cv.obsStartMJD > 60925
AND cv.obsStartMJD < 60955

TruthSummary and MatchesTruth table joins

The query below demonstrates how to retrieve the truth table identifier (id_truth_type from the MatchesTruth table) and true redshift (from the TruthSummary table) for a particular detected object with ObjectId = 1486698050427598336 (from the Object table) using a triple table join.

Director vs. ref match tables: Note that the restriction for the given Object is written in the query below specifically as WHERE obj.objectId=1486698050427598336. If we were to write WHERE mt.match_objectId=1486698050427598336 instead, the query could take orders of magnitude longer to execute. This subtle difference exists because the TruthSummary and Object tables are stored in Qserv as what are known as director tables, while the MatchesTruth table used to join them is stored as a somewhat more restricted “ref match” table. Qserv has special mechanics to optimize queries with WHERE restrictions expressed in terms of director tables, and can often dispatch these queries to just a few involved data shards. These same mechanics, however, cannot be applied in general to “ref match” tables so the seemingly same restriction, if expressed in terms of the “ref match” table, would necessitate a full scan of the entire catalog which could be quite time-consuming.

SELECT mt.id_truth_type AS mt_id_truth_type,
mt.match_objectId AS mt_match_objectId,
obj.objectId AS obj_objectId,
ts.redshift AS ts_redshift
FROM dp02_dc2_catalogs.MatchesTruth AS mt
JOIN dp02_dc2_catalogs.TruthSummary AS ts
ON mt.id_truth_type=ts.id_truth_type
JOIN dp02_dc2_catalogs.Object AS obj
ON mt.match_objectId=obj.objectId
WHERE obj.objectId=1486698050427598336
AND ts.truth_type=1
AND obj.detect_isPrimary=1
ORDER BY obj_objectId DESC

Individual objects

In the above example, a single object was desired, and a statement like WHERE objectId=1486 was used. However, if more than a few single objects are desired and their objectId are known, then you can use WHERE objectId IN (1487, 1488, 1489), for example, to return results for all of the objects in a single query.

Below, a list of 12 objectId values is put in a string called my_list. This list could contain many more objects and be generated programmatically (e.g., from a different query, or by user analysis), and then be included in the ADQL query statement and the TAP service would treat it the same way. The number of results returned will equal the number of matched objectIds.

For this example, the 12 were selected to be bright stars with similar g-r and i-z colors, so the query retrieves the g, r, i, and z band fluxes, but users should modify this to their own needs.

from lsst.rsp import get_tap_service, retrieve_query
service = get_tap_service()

my_list = "(1249537790362809267, 1252528461990360512, 1248772530269893180, "\
          "1251728017525343554, 1251710425339299404, 1250030371572068167, "\
          "1253443255664678173, 1251807182362538413, 1252607626827575504, "\
          "1249784080967440401, 1253065023664713612, 1325835101237446771)"

query = "SELECT objectId, g_calibFlux, r_calibFlux, i_calibFlux, z_calibFlux "\
        "FROM dp02_dc2_catalogs.Object "\
        "WHERE objectId IN "+my_list

results = service.search(query)
results.to_table()