DP0.3 Table Access and Queries#

The best way to learn about accessing and querying the DP0.3 tables is to work through the set of DP0.3 tutorials.

The DP0.3 tables are available via the Table Access Protocol (TAP) service in the Rubin Science Platform. TAP provides standardized access to the catalog data for discovery, search, and retrieval. Full documentation for TAP is provided by the International Virtual Observatory Alliance (IVOA).

The TAP service uses a query language similar to SQL (Structured Query Language) called the Astronomical Data Query Language (ADQL). The language is used by the IVOA to represent astronomy queries posted to Virtual Observatory (VO) services (such as TAP). The documentation for ADQL includes more information about syntax and keywords.

Note that not all ADQL functionality is supported by the RSP for Data Preview 0. TAP and ADQL can be used in both the Notebook and Portal aspects.

Important

If a query takes longer than you expect, please submit a GitHub Issue or post in the “Support - Data Preview 0” category of the Rubin Community Forum. Rubin staff are happy to investigate and to help tweak queries for optimal execution.

Table sizes#

Within a given simulated data set (1-year or 10-year), the DiaSource and SSSource tables are the same size, and each contains ~650 million rows: one for every detection of every moving object in the SSObject table.

The SSObject table has ~2.0 million (~4.4 million) rows in the 1-year (10-year) DP0.3 dataset. The MPCORB table has ~14.5 million rows, regardless of whether the 1-year or 10-year data set is used. The SSObject table is a subset of all objects in MPCORB which were detected by LSST in the simulation.

Column summary values#

ADQL has functions that can return quantitative properties of the columns. The following ADQL functionality (at least) is available with the RSP TAP Service:

SELECT COUNT(numObs), MIN(numObs), MAX(numObs), AVG(numObs), SUM(numObs)
FROM dp03_catalogs_1yr.SSObject

Semi-Major Axis#

The orbital element of semi-major axis (a) is not pre-computed in the MPCORB table because it can be derived from the orbit’s ellipticiy (e) and perihelion distance (q), as \(a = q /(1 - e)\).

For example, to get eccentricity, perihelion distance, and semi-major axis for any (not necessarily random) 1000 rows, use the ADQL statement below.

SELECT TOP 1000 e, q, q/(1-e) AS a
FROM dp03_catalogs_10yr.MPCORB

Unpopulated columns#

DP0.3 has been simulated and provided on a best-effort basis. There are at present a number of unpopulated columns in the DP0.3 tables, as listed here:

DiaSource : ra_dec_Cov

SSSource : mpcUniqueId, predictedDecErr, predictedMagnitude, predictedMagnitudeErr, predictedRaDecCov, predictedRaErr, residualDec, residualRa

SSObject : maxExtendedness, medianExtendedness, minExtendedness, MOID, MOIDDeltaV, MOIDEclipticLongitude, MOIDTrueAnomaly, u_Chi2, u_G12, u_G12Err, u_H, u_H_uG12_Cov, u_HErr, u_Ndata, y_Chi2, y_G12, y_G12Err, y_H, y_H_yG12_Cov, y_HErr, y_Ndata

MPCORB : arc, arcEnd, arcStart, computer, flags, lastIncludedObservation, mpcNumber, n, nobs, nopp, pertsLong, pertsShort, reference, rms, uncertaintyParameter

These columns may be updated in the future to fill in their values.

Table joins#

The DiaSource and SSSource tables are 1:1 and can be joined on the diaSourceId column.

All rows* of the SSObject table have a match with MPCORB (but not vice versa), and the two tables can be joined on the ssObjectId column.

*There are exactly 4 rows of the SSObject table which do not have a match in MPCORB.

The DiaSource and SSSource tables are N:1 with both the SSObject and MPCORB tables. They can be joined on the ssObjectId column, but caution and testing should be used here. The N:1 nature of these joins means that the data retrieved can contain columns of repeated values, be larger than expected, and take a long time to execute.

Query a list of objects#

LSST Query Services (Qserv) do not support subqueries. Thus, using subqueries is not recommended although DP0.3 is not hosted on Qserv.

Instead, when having a list of objects in hand either from a previous query or a user-provided catalog, the list, formatted as a python tuple, can be passed to a new query for table joins. The example query below is to retrieve information about individual observations from the DiaSource and SSSource tables for indivdual unique objects selected from the SSObject table and stored in sId_list from a previous query.

The example uses only three objects, but the list can be relatively long (verified up to 50,000).

from lsst.rsp import get_tap_service, retrieve_query
service = get_tap_service("ssotap")

sId_list = [-9222537907249304995, -9222483995821535577, -9221971933016733299]

query = """SELECT dia.ssObjectId, dia.diaSourceId, dia.mag,
dia.magErr, dia.band, dia.midPointMjdTai,
sss.phaseAngle, sss.topocentricDist, sss.heliocentricDist
FROM dp03_catalogs_10yr.DiaSource as dia
INNER JOIN dp03_catalogs_10yr.SSSource as sss
ON dia.diaSourceId = sss.diaSourceId
WHERE dia.ssObjectId
IN {}
""".format(tuple(sId_list))

results = service.search(query).to_table()

This returns a results table with 1915 rows; each of three unique objects has 597, 572, and 746 rows, respectively.

Non-random subsets#

When exploring, if a small but not necessarily random subset of objects is all you need, use the SELECT TOP and provide a small number, like 100.

SELECT TOP 100 * FROM dp03_catalogs_1yr.SSObject

Random subsets#

Due to how the DP0.3 tables are stored, retrieving the first N objects that meet a query’s constraints might not be a truly random subset.

To retrieve a random subset, make use of the fact that the ssObjectId column is a randomly assigned 64-bit long unsigned integer. Since ADQL interprets a 64-bit long unsigned integer as a 63-bit signed integer, these range from about -922e16 to 922e16, but this will be fixed in the future so that all identifiers are positive numbers. Until then, for example, to retrieve the griz absolute magnitudes (H) for ~24,000 random SSObjects, use:

SELECT g_H, r_H, i_H, z_H
FROM dp03_catalogs_1yr.SSObject
WHERE ssObjectId > 9000000000000000000

Flagged rows#

The process to derive absolute magnitudes (H) with phase curve fits produces failure flags. These are bitwise flags, so that the combinations of multiple flags are unique. They are stored in the flags column of the SSObject table.

SSObject flags column.#

Value

Meaning

0

Success!

1

Orbit fitting failure: the diaSource detections do not fit a sensible orbit for a moving object (e.g., they have an unusually high chi-squared per degree of freedom).

2

\(H_u\) fit failure: the u-band absolute magnitude fit failed due to poor phase coverage or not enough data. Note however that u-band detections are not included in DP0.3.

4

\(H_g\) fit failure: the g-band absolute magnitude fit failed due to poor phase coverage or not enough data.

8

\(H_r\) fit failure: the r-band absolute magnitude fit failed due to poor phase coverage or not enough data.

16

\(H_i\) fit failure: the i-band absolute magnitude fit failed due to poor phase coverage or not enough data.

32

\(H_z\) fit failure: the z-band absolute magnitude fit failed due to poor phase coverage or not enough data.

64

\(H_y\) fit failure: the y-band absolute magnitude fit failed due to poor phase coverage or not enough data. Note however that y-band detections are not included in DP0.3.

2048

Linking failure: the detections in diaSource were not successfully linked.


Note that the linking failure flag will only exist for simulated objects, as a real object that is not linked will not be in the SSObject table.

Example: an object whose absolute magnitude fit failed in the g and r bands will have a flags value of 12 (in binary, 1100).

Truth data#

Truth information is embedded within the DP0.3 DiaSource tables in the following four columns: raTrue (true RA i.e., without simulated measurement noise), decTrue (true Dec i.e., without simulated measurement noise), magTrueVband (true magnitude in the V band i.e., without simulated measurement noise), and nameTrue.

Regarding nameTrue: a value starting with ‘S’ or ‘CEN’ indicates that the source is a simulated (“fake”) minor body. Otherwise, nameTrue provides the designation of the relevant real minor body.

The MPCORB tables contain injected rather than measured orbital parameters, so in this sense the MPCORB tables can be thought of as “truth tables”.