Interactive Catalog Visualization¶
Contact author(s): Leanne Guy
Last verified to run: 2024-08-19
LSST Science Pipelines version: Weekly 2024_32
Container Size: large
Targeted learning level: intermediate
Description: Interactive catalog data visualizations with three open-source python libraries.
Skills: Create linked interactive plots for large datasets, and output interactive plots to interactive HTML files. Use Bokeh, HoloViews, and Datashader.
LSST Data Products: Object catalog.
Packages: bokeh, holoviews, datashader
Credit: This tutorial was inspired by a notebook originally developed by Keith Bechtol in the context of the LSST Stack Club. It has been updated and extended for DP0.2 by Leanne Guy.
Get Support: Find DP0-related documentation and resources at dp0.lsst.io. Questions are welcome as new topics in the Support - Data Preview 0 Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
The Rubin Science Platform was designed to enable scientific analysis of the LSST data sets, which will be unprecedentedly large and complex. The software and techniques that are best suited for visualizing large data sets might be new to many astronomers. This notebook introduces learners with some knowledge of python to three open-source Python libraries that enable powerful interactive visualization of catalogs.
- HoloViews: Produce high-quality interactive visualizations easily by annotating plots and images rather than using direct calls to a plotting library.
- Bokeh: A powerful data visualization library that provides interactive tools including brushing and linking between multiple plots.
- Datashader: Accurately render very large datasets quickly and flexibly.
These packages are part of the Holoviz ecosystem of tools intended for visualization in a web browser and can be used to create quite sophisticated dashboard-like interactive displays and widgets. The goal of this tutorial is to provide an introduction and starting point from which to create more advanced, custom interactive visualizations. Holoviz has a vibrant and active community where you can ask questions and discuss visualizations with a global community.
Notice: If the notebook or any interactive features seem to stall, first try going a few cells back and rerunning them in order (the order in which cells are run is imporant for this notebook's functionality). If that does not work, try restarting the kernel. If issues persist, try logging out and restarting the Notebook aspect using a "large" instance of the JupyterLab environment.
Warning: It is not recommended to "Restart Kernel and Run All Cells" in this notebook, or to execute multiple cells very quickly. Some of the examples require interaction (e.g., for the user to select points on a graph) in order to run correctly, and going too fast can cause some plots to not display properly.
1.1. Import packages¶
Import general scientific python packages (os
, numpy
, pandas
),
functions from the astronomy python package astropy
,
the Rubin function for accessing the TAP service (lsst.rsp.get_tap_service
),
and various functions from the bokeh
, holoviews
, and datashader
packages
that are used in this tutorial.
Notice: It is OK to ignore the warning that appears in pink below the following cell about upgrading
pyarrow
.
import os
import numpy as np
import pandas as pd
from astropy import units as u
from astropy.coordinates import SkyCoord
from astropy.units.quantity import Quantity
from lsst.rsp import get_tap_service
import bokeh
from bokeh.io import output_notebook, show, output_file, reset_output
from bokeh.models import ColumnDataSource, Range1d, HoverTool
from bokeh.models import CDSView, GroupFilter
from bokeh.plotting import figure, gridplot
from bokeh.transform import factor_cmap
import holoviews as hv
from holoviews import streams, opts
from holoviews.operation.datashader import datashade, dynspread
from holoviews.plotting.util import process_cmap
import datashader as dsh
Show which versions of Bokeh, HoloViews, and Datashader we are working with. This is important when referring to online documentation as APIs can change between versions.
print("Bokeh version: " + bokeh.__version__)
print("HoloViews version: " + hv.__version__)
print("Datashader version: " + dsh.__version__)
Bokeh version: 3.4.2 HoloViews version: 1.19.1 Datashader version: 0.16.3
1.2. Define functions and parameters¶
Update the maximum number of display rows for Pandas tables.
pd.set_option('display.max_rows', 5)
Set the display of the output Bokeh plots to be inline, in the notebook.
Set the HoloViews plotting library to be bokeh.
The HoloViews and Bokeh icons are displayed when the library is loaded successfully.
Always set the extension after executing output_notebook()
to avoid issues with plot display.
hv.extension('bokeh')
Notice: Sometimes the
bokeh.io.show
function can be finicky when output modes are switched (e.g., from inline to an HTML file and back again).
To avert a "Models must be owned by only a single document" error (see, e.g., https://github.com/bokeh/bokeh/issues/8579), define the following two functions and use them in Section 4.
def show_bokeh_inline(p):
try:
reset_output()
output_notebook()
show(p)
except:
output_notebook()
show(p)
def show_bokeh_to_file(p, outputFile):
try:
reset_output()
output_file(outputFile)
show(p)
except:
output_file(outputFile)
show(p)
2. Use the TAP service to obtain table data¶
The basis for any data visualization is the underlying data. This tutorial works with tabular data that is retrieved from a cone search around a defined coordinate with a specified radius using the Rubin TAP service.
Get a Rubin TAP service instance.
service = get_tap_service("tap")
assert service is not None
Define a reference position on the sky and a radius in degrees for a cone search.
coord = SkyCoord(ra=55.65*u.degree, dec=-40.*u.degree, frame='icrs')
radius = 1 * u.deg
Define the query pass to the TAP service.
query = "SELECT coord_ra, coord_dec, objectId, r_extendedness, "\
"scisql_nanojanskyToAbMag(g_cModelFlux) AS mag_g_cModel, "\
"scisql_nanojanskyToAbMag(r_cModelFlux) AS mag_r_cModel, "\
"scisql_nanojanskyToAbMag(i_cModelFlux) AS mag_i_cModel "\
"FROM dp02_dc2_catalogs.Object "\
"WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec),"\
"CIRCLE('ICRS', " + str(coord.ra.value) + ", " + str(coord.dec.value) + ", " \
+ str(radius.to(u.deg).value) + " )) = 1 " \
"AND detect_isPrimary = 1 "\
"AND scisql_nanojanskyToAbMag(r_cModelFlux) < 27.0 "\
"AND r_extendedness IS NOT NULL"
print(query)
SELECT coord_ra, coord_dec, objectId, r_extendedness, scisql_nanojanskyToAbMag(g_cModelFlux) AS mag_g_cModel, scisql_nanojanskyToAbMag(r_cModelFlux) AS mag_r_cModel, scisql_nanojanskyToAbMag(i_cModelFlux) AS mag_i_cModel FROM dp02_dc2_catalogs.Object WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec),CIRCLE('ICRS', 55.65, -40.0, 1.0 )) = 1 AND detect_isPrimary = 1 AND scisql_nanojanskyToAbMag(r_cModelFlux) < 27.0 AND r_extendedness IS NOT NULL
As query will return a very large dataset, use an asynchronous query. This will take a few minutes.
job = service.submit_job(query)
job.run()
<pyvo.dal.tap.AsyncTAPJob at 0x79a7933a7d50>
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
Job phase is COMPLETED
After the job phase has completed, fetch the results into a Pandas table.
data = job.fetch_result().to_table().to_pandas()
Confirm that the expected number of rows has been returned (1345701). If true, the following cell will not return an assertion error.
assert len(data) == 1345701
Compute three colors from the apparent magnitudes.
data['gmi'] = data['mag_g_cModel'] - data['mag_i_cModel']
data['rmi'] = data['mag_r_cModel'] - data['mag_i_cModel']
data['gmr'] = data['mag_g_cModel'] - data['mag_r_cModel']
Use the r-band extendedness parameter to classify objects as having a "shape_type" of "point" or "extended".
data['shape_type'] = data['r_extendedness'].map({0: 'point', 1: 'extended'})
Convert the objectId
to a string because as an integer, it's too large for bokeh to handle.
data['objectId'] = np.array(data['objectId']).astype('str')
Option to print the number of objects with shape_type
as "point" or "extended".
# tx1 = np.where(data['shape_type'] == 'point')[0]
# tx2 = np.where(data['shape_type'] == 'extended')[0]
# print(len(tx1), len(tx2))
# del tx1, tx2
Confirm that the expected number of each shape_type
have been returned.
assert data[data["shape_type"] == "point"].shape[0] == 232835
assert data[data["shape_type"] == "extended"].shape[0] == 1112866
3. HoloViews¶
HoloViews supports easy analysis and visualization by annotating data rather than utilizing direct calls to plotting packages. This tutorial uses Bokeh as the plotting library backend for HoloViews. HoloViews supports several plotting libraries, and exercise 1 in Section 6 is to explore using HoloViews with other plotting packages.
Create a 2% random subsample of this dataset with which to demonstrate some basic HoloViews functionality. Print the length of this subset and confirm that it is 2% of the original size.
frac = 0.02
data20K = data.sample(frac=frac, axis='index')
print(len(data20K))
assert len(data20K) == round(frac * len(data))
26914
3.1. Single plots¶
The basic core primitives of HoloViews are Elements (hv.Element
).
Elements are simple wrappers for data which provide a semantically
meaningful visual representation.
An Element may be a set of Points, an Image, a Curve, a Histogram, etc.
See the HoloViews Reference Gallery
for all the various types of Elements that can be created with HoloViews.
The example in this section uses the HoloViews Scatter Element
to quickly visualize the catalog data as a scatter plot.
HoloViews maintains a strict separation between content and presentation.
This separation is achieved by maintaining sets of keyword values as
options
that specify how Elements
are to appear.
This example applies the default options and removes the toolbar.
Make a simple scatter plot of the data using the Scatter element.
hv.Scatter(data20K).options(toolbar=None)
Figure 1: A non-interactive plot of
coord_ra
vs.coord_dec
appears as a blue circle composed of individual, but mostly blended, blue dots.
The data20K
set contains many columns. If no columns are specified explicitly, the first 2 columns are taken for x and y respectively by the Scatter Element.
Now bin the data in RA using the robust
Freedman Diaconis Estimator,
plot the resulting distribution using the HoloViews
Histogram Element,
and add in some basic plot options.
Read more about about customizing plots via options
. Note that options
can be shortened to opts
.
(ra_bin, count) = np.histogram(data20K['coord_ra'], bins='fd')
ra_distribution = hv.Histogram((ra_bin, count)).opts(
title="RA distribution", color='darkmagenta',
xlabel='RA', fontscale=1.2,
height=400, width=400)
ra_distribution
Figure 2: A histogram (bar chart in purple) of the number of objects in a given RA bin in the subset of twenty thousand objects. This plot is interactive and displays a tool bar at right for the user to, e.g., zoom in on the plot.
3.2. Layouts of unlinked plots¶
Create a layout of several plots.
A Layout
is a type of Container
that can contain any HoloViews object.
Other types of Containers that exist include Overlay
, Gridspace
, Dynamicmap
, etc.
See the HoloViews Reference Gallery
for the full list of Layouts
that can be created with HoloViews.
See Building Composite Objects
for the full details about the ways Containers
can be composed.
Slice the data and set some more options, and then construct a layout using the +
operator.
skyplot = hv.Scatter(data20K[["coord_ra", "coord_dec"]]).opts(
title="Skyplot", toolbar='above', tools=['hover'],
height=350, width=350, alpha=0.2, size=2)
skyplots = skyplot + ra_distribution.options(height=350, width=350)
skyplots
Figure 3: Two side-by-side plots, Fig 1 from above on the left, and Fig 2 from above on the right, with the interactive toolbar at upper right.
Note that these two plots above are not linked, they are two independent plots laid out next to each other.
Zoom in on the skyplot and notice that the data are not rebinned in the RA distribution plot. Linking plots is demonstrated below.
The tools, however, do apply to both plots. Try modifying both plots and then use the "reset" tool (the circular arrow symbol). Notice that both plots are reset to their original states.
3.3. Layouts of linked plots¶
Set up some default plot options to avoid duplicating long lists for every new plot.
Different plotting packages typically provide different customization capabilities. Below, one set of options is defined for a Bokeh backend, and one for a matplotlib backend.
Set Bokeh customizations as a python dictionary.
plot_style_bkh = dict(alpha=0.5, color='darkmagenta',
marker='triangle', size=3,
xticks=5, yticks=5,
height=400, width=400,
toolbar='above')
Set matplotlib customizations.
plot_style_mpl = dict(alpha=0.2, color='c', marker='s',
fig_size=200, s=2,
fontsize=14, xticks=8, yticks=8)
Choose to use the Bokeh plot style.
plot_style = plot_style_bkh
Instead of subsetting a dataset to choose which columns to plot, HoloViews allows the user to specify the dimensionality directly.
kdims
are the key dimensions or the independent variable(s) and
vdims
are the value dimensions or the dependenent variable(s).
hv.Scatter(data20K,
kdims=['gmi'], vdims=['mag_g_cModel']
).opts(invert_yaxis=True,
xlabel=" (g-i)", ylabel="g",
**plot_style)
Figure 4: The g-band magnitude as a function of g-i color, with points as small purple triangles. The interactive toolbar is at upper right.
The dimensions have be specified as strings above, but they are in fact rich objects. Dimension objects support a long descriptive label, which complements the short programmer-friendly name.
Below, create a color-color diagram of the point-like sources in the dataset, and also display the distribution of samples along both value dimensions using the hist()
method of the Scatter Element.
Set the axes as rich objects.
rmi = hv.Dimension('rmi', label='(r-i)', range=(-0.8, 3.0))
gmr = hv.Dimension('gmr', label='(g-r)', range=(-0.8, 3.0))
Identify the point-like objects as "points" and create the scatter plot.
points = data20K[data20K["shape_type"] == 'point']
col_col = hv.Scatter(points, kdims=gmr,
vdims=rmi).opts(**plot_style)
Use the hist method to show the distribution of samples along both value dimensions.
col_col = col_col.hist(dimension=[gmr, rmi],
num_bins=100, adjoin=True)
col_col
Figure 5: The r-i color versus the g-r color (purple triangles), with histograms (blue bar charts) of the color distributions above and to the right. The interactive tool bar is at upper right.
Try zooming in on regions of the plot. The histograms are automatically recomputed.
3.4. Magics¶
The techniques to apply customizations in the cells above use standard Python syntax and are the recommended way to customize visualizations in HoloViews. HoloViews also supports IPython magic commands. Magics are a much older approach that is not standard Python and is specific to notebooks. HoloViews notebook magic supports both line and cell magics. Here is an example of using magics to plot the same spatial distribution of Objects as above.
%%opts Scatter [tools=['hover'], toolbar='above',height=400, width=400](color='grey')
hv.Scatter(data20K).opts(alpha=0.3, size=3)
Figure 6: Same as Fig 1, but plotted with grey points and with an interactive toolbar at upper right.
Hover the mouse over the plot above and see that many (too many) columns are included for the hover tool to display. It is more user-friendly to be selective about which information is shown with the hover tool.
Create a custom hover tool with select columns: the RA and Dec, the r-band
magnitude, and the shape_type
.
raDecHover = HoverTool(
tooltips=[
('ra,dec', '@coord_ra / @coord_dec'),
('rmag', '@mag_r_cModel'),
('type', '@shape_type'),
],
formatters={
'ra/dec': 'printf',
'rmag': 'numeral',
'type': 'printf',
},
point_policy="follow_mouse"
)
Display the plot.
hv.Scatter(data20K).opts(tools=[raDecHover], **plot_style_bkh)
Figure 7: Same as Fig 6, but plotted with purple points.
Hover the mouse over the plot above and see only the selected columns' data are displayed.
It might be necessary to zoom in until points are distinguishable in order to notice how the hover tool box contents list only three properties per point.
4. Bokeh¶
A very useful feature of Bokeh is the ability to add connected interactivity between plots that show different attributes of the same data. This is called linking.
With linked plots it is possible to carry out data brushing, whereby data can be selected and manipulated synchronously across multiple linked plots.
For example, if a skyplot is linked with a colour-magnitude diagram of the same dataset, it becomes possible to interactively explore the relationship between the positions of objects in each plot.
This section uses the Bokeh plotting library to demonstrate how to set up brushing and linking between two panels showing different representations of the same dataset. A selection applied to either panel will highlight the selected points in the other panel.
This section is based on Bokeh linked brushing.
4.1. Data preparation¶
Getting the data preparation phase right is key to creating powerful visualizations. Bokeh works with a ColumnDataSource (CDS). A CDS is essentially a collection of sequences of data that have their own unique column name.
The CDS is the core of bokeh plots. Bokeh automatically creates a CDS from data passed as python lists or numpy arrays. CDS are useful as they allow data to be shared between multiple plots and renderers, enabling brushing and linking.
Below, a CDS is created from the data returned by the query above and passed directly to Bokeh.
Create a CDS for the plots to share.
The data defined as x0
, y0
, x1
, y1
will be used to plot the left and right plots, respectively.
col_data = dict(x0=data20K['coord_ra'] - coord.ra.value,
y0=data20K['coord_dec'] - coord.dec.value,
x1=data20K['gmi'],
y1=data20K['mag_g_cModel'],
ra=data20K['coord_ra'], dec=data20K['coord_dec'])
source = ColumnDataSource(data=col_data)
Additional data can be added to the CDS after creation.
source.data['objectId'] = data20K['objectId']
source.data['rmi'] = data20K['rmi']
source.data['gmr'] = data20K['gmr']
source.data['mag_r_cModel'] = data20K['mag_r_cModel']
source.data['shape_type'] = data20K['shape_type']
source.data['r_extendedness'] = data20K['r_extendedness']
Create a "points" view with a filter that requires "shape_type" to be "point".
points = CDSView()
points.filter = GroupFilter(column_name='shape_type', group="point")
4.2. Linked plots with data brushing¶
Use Bokeh to plot a color-magnitude (g vs. g-i) diagram and a plot of the sky coordinates, and then link them.
Create a custom hover tool for each panel.
hover_left = HoverTool(tooltips=[("ObjectId", "@objectId"),
("(ra,dec)", "(@ra, @dec)"),
("type", "@shape_type")
])
hover_right = HoverTool(tooltips=[("ObjectId", "@objectId"),
("(g-i,g)", "(@x1, @y1)"),
("extendedness", "@r_extendedness")
])
tools = "box_zoom,box_select,lasso_select,reset,help"
tools_left = [hover_left, tools]
tools_right = [hover_right, tools]
Create a new two-panel plot and add a renderer. Use the "points" view defined above.
Create the left-side plot.
left = figure(tools=tools_left,
width=400, height=400,
title='Spatial: Centered on (RA, Dec) = (%.2f, %.2f)' %
(coord.ra.value, coord.dec.value))
left.scatter('x0', 'y0', hover_color='firebrick',
size=3, alpha=0.7,
source=source,
view=points)
left.x_range = Range1d(1.5, -1.5)
left.y_range = Range1d(-1.5, 1.5)
left.xaxis.axis_label = 'Delta ra'
left.yaxis.axis_label = 'Delta dec'
Create the right-side plot.
right = figure(tools=tools_right, width=400, height=400,
title='CMD')
right.scatter('x1', 'y1', hover_color='firebrick',
size=4, alpha=0.8,
source=source,
view=points)
right.x_range = Range1d(-1.5, 2.8)
right.y_range = Range1d(32., 16.)
right.xaxis.axis_label = '(g-i)'
right.yaxis.axis_label = 'g'
Display the grid of plots. This can take a moment to render.
Figure 8: At left, the offset from the specified coordinate ($\Delta$ Dec vs. $\Delta$ RA), and at right, the g-band magnitude as a function of g-i color. All points are blue circles and the interactive toolbar is at upper right.
Use the hover tool to see information about individual data points (e.g., the "ObjectId"). This information should appear automatically when the mouse is hovered over the data points. Notice the data points highlighted in red on one panel with the hover tool are also highlighted on the other panel.
Next, click on the selection box icon or the selection lasso icon found in the upper right corner of the figure. Use the selection box and selection lasso to make various selections in either panel by clicking and dragging on either panel. The selected data points will be displayed in the other panel.
4.3. Output to interactive HTML file¶
Output this interactive plot to an interactive HTML file.
Define the output file name to store it in the home directory ~/.
.
outputDir = os.path.expanduser('~')
outputFileName = 'nb06b_plot1.html'
outputFile = os.path.join(outputDir, outputFileName)
print('The full pathname of the interactive HTML file will be '+outputFile)
The full pathname of the interactive HTML file will be /home/melissagraham/nb06b_plot1.html
Use the bokeh.io.show
method, embedded in the show_bokeh_to_file
function
defined in Section 1.2 above, to output the interactive HTML file.
show_bokeh_to_file(p, outputFile)
To view the interactive HTML file navigate to it via the directory listing in the left-hand side panel of this Jupyter notebook graphical interface, and click on its name. This will open another tab containing the HTML file.
Since the file is relatively small (about 3.6 MB) it should load quickly (within a few seconds). Once loading is complete, click on the "Trust HTML" button at the top-left of the tab's window. Then, a near-duplicate of the two linked plots above should be displayed.
It is also possible to download the HTML file, interact with it in a browser as a a local file.
4.4. Linked streams¶
To do subsequent calculations with the set of selected points, it is possible to use HoloViews linked streams for custom interactivity. The following visualization is a modification of this example.
As for the example above, the plots generated below use the selection box and selection lasso to choose data points on the left panel, and then the selected points appear in the right panel.
Notice that as the selection is changed in the left panel, the mean x- and y-values for selected data points are shown in the title of the right panel.
This section is based on HoloViews Selection1D points.
Declare some points, and define a selection for the stream.
points = hv.Points((data20K['coord_ra'] - coord.ra.value,
data20K['coord_dec'] - coord.dec.value)
).options(tools=['box_select', 'lasso_select'])
selection = streams.Selection1D(source=points)
Define a function that uses the selection indices to slice points and compute statistics. This function is defined here (and not in Section 1.2) because it is only used in the subsequent (i.e., it is not a globally defined and used function).
def selected_info(index) -> str:
selected = points.iloc[index]
if index:
label = 'Mean x, y: %.3f, %.3f' % tuple(selected.array().mean(axis=0))
else:
label = 'No selection'
return selected.relabel(label).options(color='red')
Combine "points" and DynamicMap. Notice the syntax used here, how the "+" sign makes side-by-side panels.
points + hv.DynamicMap(selected_info, streams=[selection])
Figure 9: At left, the offset from a specified coordinate (as in Fig 8). The plot at right has the same axes, but will be empty until the action below is taken to select points in the left plot. The interactive toolbar is at upper right.
Use the lasso- or box-select to select a region in the left-hand plot.
Print the number of objects selected.
print(len(selection.index))
1968
Option to list the indices of the selected objects, if the number selected is less than 200.
If desired, these indices could be used to define a population for subsequence analysis.
# if len(selection.index) < 200:
# print(selection.index)
5.0 Datashader¶
The interactive features of Bokeh work well with datasets up to a few tens of thousands of data points. To efficiently explore larger datasets it is recommended to use another visualization model that offers better scalability: Datashader.
The examples below will show how, when zooming in on the datashaded two-dimensional histograms, the bin sizes are dynamically adjusted to show finer or coarser granularity in the distribution. This allows for the interactive exploration of large datasets without having to manually adjust the bin sizes while panning and zooming.
Zoom in all the way to see individual points (i.e., bins contain either zero or one count). Soom in far enough and find that the individual points are represented by extremely small pixels in datashader that are difficult to see. A solution is to dynspread instead of datashade, which will preserve a finite size of the plotted points.
5.1. Plotting thousands of data points¶
Plot a color-colour diagram using Bokeh with a customized hover tool.
plot_options = {'height': 400, 'width': 800,
'tools': ['pan', 'box_zoom', 'box_select',
'wheel_zoom', 'reset', 'help']}
p = figure(title="Color-Color Diagram",
x_axis_label="(g-r)", y_axis_label="(r-i)",
x_range=(-2.0, 3.0), y_range=(-2.0, 3.0),
**plot_options)
p.scatter(x='gmr', y='rmi', source=source,
size=2, alpha=0.2,
hover_color='firebrick',
legend_field="shape_type",
color=factor_cmap('shape_type', 'Category10_3',
['point', 'extended']))
hover = HoverTool(tooltips=[("objectId", "@objectId"),
("(ra,dec)", "(@coord_ra, @coord_dec)"),
("(g-r,r-i)", "(@gmr, @rmi)"),
("type", "@shape_type")])
p.add_tools(hover)
Figure 10: The r-i color vs. the g-r color, with extended objects in orange and point-like objects in blue. The interactive tool bar is at right.
The plot above suffers from overplotting (confusion), even though the dataset only contains ~20K points.
A classic strategy to visualize the dense areas is to specify the transparency (alpha
) of the glyphs;
in the plot above, alpha=0.2
was used.
This has helped, but washes out the detail in the sparser regions.
HoloViews + Datashader allow millions to billions of points to be plotted, and this produces much more informative plots. Datashader rasterizes or aggregates datasets into regular grids that can then be further analysed or viewed as plots or images.
Create a Holoviews object "points" to hold and plot data.
points = hv.Points((source.to_df()['gmr'],source.to_df()['rmi']))
Create the linked streams instance.
boundsxy = (0, 0, 0, 0)
box = streams.BoundsXY(source=points, bounds=boundsxy)
bounds = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box])
Apply the datashader
p = dynspread(datashade(points, cmap="Viridis"))
p = p.opts(width=800, height=300, padding=0.05, show_grid=True,
xlim=(-2.0, 7.0), ylim=(-5.0, 3.0), xlabel="(g-r)", ylabel="(r-i)",
tools=['box_select'])
Render the datashaded plot.
p * bounds
Figure 11: The r-i vs. the g-r color as in Fig 10, but displayed as a 2-dimensional density map (with a purple-green-yellow colormap) in regions of high density, and as individual points in regions of low density. Interactive toolbar at right.
This datashaded plot of the same color-color diagram as above does not require any magic-number parameters such as size and alpha and automatically ensures that there is no saturation or overplotting.
Above, select the wheel zoom and adjust the image to interact with the plot. Note how the shades of color of the data points change according to the local density.
5.2. Plotting millions of data points¶
The datasest of ~20K points used above is actually too small to demonstrate the power of datashader.
Below, visualize the full >1 million object dataset returned by the query.
Create a Points Element for the data.
points = hv.Points((data['gmr'], data['rmi']))
Create the linked streams instance.
boundsxy = (0, 0, 0, 0)
box = streams.BoundsXY(source=points, bounds=boundsxy)
bounds = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box])
Apply the datashader.
p = dynspread(datashade(points, cmap="Viridis"))
p = p.opts(width=800, height=300, padding=0.05, show_grid=True,
xlim=(-2.0, 7.0), ylim=(-5.0, 3.0), xlabel="(g-r)", ylabel="(r-i)",
tools=['box_select'])
Render the datashaded plot.
p * bounds
Figure 12: Same as Fig 11, but with many more objects included.
Above, use the box zoom or wheel zoom functionality to zoom in on the plot, and watch how datashader adapts the view to optimally display the data.
5.3. Adding a callback function¶
Add callback functionality to the color-color diagam above to retrieve the indices of selected points.
Above, use the box selection tool to select data.
STOP - Select some data points from the plot above using the box select tool before proceeding
Print the number of objects selected.
selection = (points.data.x > box.bounds[0]) \
& (points.data.y > box.bounds[1]) \
& (points.data.x < box.bounds[2]) \
& (points.data.y < box.bounds[3])
print('The selection box contains %i data points'%(np.sum(selection)))
The selection box contains 281453 data points
5.4. Interactive selection¶
Below, create two side-by-side plots.
The left-hand plot will show the datashaded spatial distribution on the sky, and the right-hand plot will be a linked and brushed plot showing the r-band magnitude distribution for objects selected in the left-hand plot. It will be possible to use the box selection in the spatial distribution plot to change which data are included in the histogram.
First, create a holoviews dataset instance, and label some of the columns.
kdims = [('coord_ra', 'RA(deg)'), ('coord_dec', 'dec(deg)')]
vdims = [('mag_r_cModel', 'r(mag)')]
ds = hv.Dataset(data, kdims, vdims)
points = hv.Points(ds)
boundsxy = (np.min(ds.data['coord_ra']), np.min(ds.data['coord_dec']),
np.max(ds.data['coord_ra']), np.max(ds.data['coord_dec']))
box = streams.BoundsXY(source=points, bounds=boundsxy)
box_plot = hv.DynamicMap(lambda bounds: hv.Bounds(bounds), streams=[box])
Create custom callback functionality to update the linked histogram.
These functions are defined here (and not in Section 1.2) because they are only used in the following cell.
def log_inf(x) -> float:
return np.log(x) if x > 0 else 0
def update_histogram(bounds=bounds) -> hv.Histogram:
selection = (ds.data['coord_ra'] > bounds[0]) & \
(ds.data['coord_dec'] > bounds[1]) & \
(ds.data['coord_ra'] < bounds[2]) & \
(ds.data['coord_dec'] < bounds[3])
selected_mag = ds.data.loc[selection]['mag_r_cModel']
frequencies, edges = np.histogram(selected_mag)
hist = hv.Histogram(
(list(map(log_inf, frequencies)), edges)).opts(xlabel='r (mag)')
return hist
dmap = hv.DynamicMap(
update_histogram, streams=[box]).options(height=400, width=400)
datashade(points, cmap=process_cmap("Viridis", provider="bokeh")) * \
box_plot.options(height=400, width=400, tools=['box_select']) + \
dmap
Figure 13: At right, Dec vs. RA as a 2-dimensional density map with a purple-green-yellow colormap. At left, a histogram (blue bar chart) of the fraction of objects in bins of r-band magnitude.
Try changing the box selection and watch as the histogram is recomputed and displayed.
Finally, delete the job
job.delete()
6. Exercises for the learner¶
HoloViews works with a wide range of plotting libraries; Bokeh, matplotlib, plotly, mpld3, pygal to name a few. As an exercise, try changing the HoloViews plotting library to be
matplotlib
instead ofbokeh
at the beginning of the notebook withhv.extension('matplotlib')
. Notice the holoviews + matplotlib icons displayed when the library is loaded successfully. Recreate a few plots and compare the outputs. Try again with some other plotting library. Don't forget to set the plotting library back toBokeh
, which is used for this tutorial. Note that some warnings might be raised.Try writing a different callback function to use in Section 5.3.
As in Section 4.3, try writing other interactive plots from this notebook to an interactive HTML file. Which ones can easily be output to an interactive HTML file? Which HTML files show full interactive functionality? Which ones don't?