DP0.3 Table Access and Queries#
The best way to learn about accessing and querying the DP0.3 tables is to work through the set of DP0.3 tutorials.
The DP0.3 tables are available via the Table Access Protocol (TAP) service in the Rubin Science Platform. TAP provides standardized access to the catalog data for discovery, search, and retrieval. Full documentation for TAP is provided by the International Virtual Observatory Alliance (IVOA).
The TAP service uses a query language similar to SQL (Structured Query Language) called the Astronomical Data Query Language (ADQL). The language is used by the IVOA to represent astronomy queries posted to Virtual Observatory (VO) services (such as TAP). The documentation for ADQL includes more information about syntax and keywords.
Note that not all ADQL functionality is supported by the RSP for Data Preview 0. TAP and ADQL can be used in both the Notebook and Portal aspects.
Important
If a query takes longer than you expect, please submit a GitHub Issue or post in the “Support - Data Preview 0” category of the Rubin Community Forum. Rubin staff are happy to investigate and to help tweak queries for optimal execution.
Table sizes#
Within a given simulated data set (1-year or 10-year), the DiaSource
and SSSource
tables are the same size, and each contains ~650 million rows:
one for every detection of every moving object in the SSObject
table.
The SSObject
table has ~2.0 million (~4.4 million) rows in the 1-year (10-year) DP0.3 dataset. The MPCORB
table has ~14.5 million rows, regardless of whether the 1-year or 10-year data set is used.
The SSObject
table is a subset of all objects in MPCORB
which were detected by LSST in the simulation.
Column summary values#
ADQL has functions that can return quantitative properties of the columns. The following ADQL functionality (at least) is available with the RSP TAP Service:
SELECT COUNT(numObs), MIN(numObs), MAX(numObs), AVG(numObs), SUM(numObs)
FROM dp03_catalogs_1yr.SSObject
Semi-Major Axis#
The orbital element of semi-major axis (a
) is not pre-computed in the MPCORB
table because it can be derived from
the orbit’s ellipticiy (e
) and perihelion distance (q
), as \(a = q /(1 - e)\).
For example, to get eccentricity, perihelion distance, and semi-major axis for any (not necessarily random) 1000 rows, use the ADQL statement below.
SELECT TOP 1000 e, q, q/(1-e) AS a
FROM dp03_catalogs_10yr.MPCORB
Unpopulated columns#
DP0.3 has been simulated and provided on a best-effort basis. There are at present a number of unpopulated columns in the DP0.3 tables, as listed here:
DiaSource
: ra_dec_Cov
SSSource
: mpcUniqueId
, predictedDecErr
, predictedMagnitude
, predictedMagnitudeErr
, predictedRaDecCov
, predictedRaErr
, residualDec
, residualRa
SSObject
: maxExtendedness
, medianExtendedness
, minExtendedness
, MOID
, MOIDDeltaV
, MOIDEclipticLongitude
, MOIDTrueAnomaly
, u_Chi2
, u_G12
, u_G12Err
, u_H
, u_H_uG12_Cov
, u_HErr
, u_Ndata
, y_Chi2
, y_G12
, y_G12Err
, y_H
, y_H_yG12_Cov
, y_HErr
, y_Ndata
MPCORB
: arc
, arcEnd
, arcStart
, computer
, flags
, lastIncludedObservation
, mpcNumber
, n
, nobs
, nopp
, pertsLong
, pertsShort
, reference
, rms
, uncertaintyParameter
These columns may be updated in the future to fill in their values.
Table joins#
The DiaSource
and SSSource
tables are 1:1 and can be joined on the diaSourceId
column.
All rows* of the SSObject
table have a match with MPCORB
(but not vice versa),
and the two tables can be joined on the ssObjectId
column.
*There are exactly 4 rows of the SSObject
table which do not have a match in MPCORB
.
The DiaSource
and SSSource
tables are N:1 with both the SSObject
and MPCORB
tables.
They can be joined on the ssObjectId
column, but caution and testing should be used here.
The N:1 nature of these joins means that the data retrieved can contain columns of repeated values,
be larger than expected, and take a long time to execute.
Query a list of objects#
LSST Query Services (Qserv) do not support subqueries. Thus, using subqueries is not recommended although DP0.3 is not hosted on Qserv.
Instead, when having a list of objects in hand either from a previous query or a user-provided catalog,
the list, formatted as a python tuple, can be passed to a new query for table joins.
The example query below is to retrieve information about individual observations from the DiaSource
and SSSource
tables for indivdual unique objects selected from the SSObject
table and stored
in sId_list
from a previous query.
The example uses only three objects, but the list can be relatively long (verified up to 50,000).
from lsst.rsp import get_tap_service, retrieve_query
service = get_tap_service("ssotap")
sId_list = [-9222537907249304995, -9222483995821535577, -9221971933016733299]
query = """SELECT dia.ssObjectId, dia.diaSourceId, dia.mag,
dia.magErr, dia.band, dia.midPointMjdTai,
sss.phaseAngle, sss.topocentricDist, sss.heliocentricDist
FROM dp03_catalogs_10yr.DiaSource as dia
INNER JOIN dp03_catalogs_10yr.SSSource as sss
ON dia.diaSourceId = sss.diaSourceId
WHERE dia.ssObjectId
IN {}
""".format(tuple(sId_list))
results = service.search(query).to_table()
This returns a results
table with 1915 rows; each of three unique objects has 597, 572, and 746 rows, respectively.
Non-random subsets#
When exploring, if a small but not necessarily random subset of objects is all you need,
use the SELECT TOP
and provide a small number, like 100.
SELECT TOP 100 * FROM dp03_catalogs_1yr.SSObject
Random subsets#
Due to how the DP0.3 tables are stored, retrieving the first N objects that meet a query’s constraints might not be a truly random subset.
To retrieve a random subset, make use of the fact that the ssObjectId
column is a
randomly assigned 64-bit long unsigned integer.
Since ADQL interprets a 64-bit long unsigned integer as a 63-bit signed integer,
these range from about -922e16 to 922e16, but this will be fixed in the future so
that all identifiers are positive numbers.
Until then, for example, to retrieve the griz absolute magnitudes (H
)
for ~24,000 random SSObjects
, use:
SELECT g_H, r_H, i_H, z_H
FROM dp03_catalogs_1yr.SSObject
WHERE ssObjectId > 9000000000000000000
Flagged rows#
The process to derive absolute magnitudes (H
) with phase curve fits produces failure flags.
These are bitwise flags, so that the combinations of multiple flags are unique.
They are stored in the flags
column of the SSObject
table.
Value |
Meaning |
---|---|
0 |
Success! |
1 |
Orbit fitting failure: the |
2 |
\(H_u\) fit failure: the u-band absolute magnitude fit failed due to poor phase coverage or not enough data. Note however that u-band detections are not included in DP0.3. |
4 |
\(H_g\) fit failure: the g-band absolute magnitude fit failed due to poor phase coverage or not enough data. |
8 |
\(H_r\) fit failure: the r-band absolute magnitude fit failed due to poor phase coverage or not enough data. |
16 |
\(H_i\) fit failure: the i-band absolute magnitude fit failed due to poor phase coverage or not enough data. |
32 |
\(H_z\) fit failure: the z-band absolute magnitude fit failed due to poor phase coverage or not enough data. |
64 |
\(H_y\) fit failure: the y-band absolute magnitude fit failed due to poor phase coverage or not enough data. Note however that y-band detections are not included in DP0.3. |
2048 |
Linking failure: the detections in |
Note that the linking failure flag will only exist for simulated objects,
as a real object that is not linked will not be in the SSObject
table.
Example: an object whose absolute magnitude fit failed in the g and r bands will have a flags
value of 12 (in binary, 1100).
Truth data#
Truth information is embedded within the DP0.3 DiaSource
tables in the following four columns: raTrue
(true RA i.e., without simulated measurement noise), decTrue
(true Dec i.e., without simulated measurement noise), magTrueVband
(true magnitude in the V band i.e., without simulated measurement noise), and nameTrue
.
Regarding nameTrue
: a value starting with ‘S’ or ‘CEN’ indicates that the source is a simulated (“fake”) minor body. Otherwise, nameTrue
provides the designation of the relevant real minor body.
The MPCORB
tables contain injected rather than measured orbital parameters, so in this sense the MPCORB tables can be thought of as “truth tables”.