Whenever someone says ‘You can do that with an extension’ in the Jupyter ecosystem, it is often not clear what kind of extension they are talking about. The Jupyter ecosystem is very modular and extensible, so there are lots of ways to extend it. This blog post aims to provide a quick summary of the most common ways to extend Jupyter, and links to help you explore the extension ecosystem.
JupyterLab extensions (labextension)
The JupyterLab documentation has information on how to install & use extensions, as well as how to author & distribute them. You can also discover extensions by searching on GitHub or npmjs.com.
My favorite JupyterLab extension is jupyterlab-vim — it lets you fully use Vim keybindings inside JupyterLab!
Classic Notebook extensions (nbextension)
The Jupyter Notebook documentation has information on how to install or develop extensions. The Unofficial Jupyter Notebook extensions repository has a lot of popular extensions and a GUI extension manager you can use to install nbextensions.
My favorite nbextension provides a collapsible Table of Contents for your notebooks.
Notebook Server Extensions (serverextension)
Unlike JupyterLab extensions or nbextensions, Jupyter Notebook server extensions are written in Python and add server-side functionality. There are two primary use cases for server extensions.
The first use case is to provide a backend for a particular JupyterLab or classic notebook extension. An example is the jupyterlab-latex JupyterLab extension, which provides live previews of LaTeX files in JupyterLab. It has a frontend JupyterLab extension to integrate with the JupyterLab text editor, and a backend serverextension component that actually runs the LaTeX commands to produce the output displayed to you.
The second use case is to provide any user interface backed by any kind of server-side processing. Server extensions can function as arbitrary Tornado HTTP handlers — so any web application you can think of, you can write as a Jupyter serverextension. An example is nbgitpuller, which provides UI and mechanisms to distribute notebooks from git repositories to your users in a transparent way.
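As a sketch of what this looks like for the classic notebook server (the handler class, route, and module names here are invented for illustration), a minimal serverextension registers a Tornado handler with the notebook web application:

```python
from notebook.base.handlers import IPythonHandler
from notebook.utils import url_path_join


class HelloHandler(IPythonHandler):
    """Responds to GET <base_url>/hello with a small JSON payload."""

    def get(self):
        self.finish({"message": "hello from a serverextension"})


def load_jupyter_server_extension(nb_app):
    """Called by the notebook server when this extension is enabled."""
    web_app = nb_app.web_app
    route = url_path_join(web_app.settings["base_url"], "/hello")
    web_app.add_handlers(".*$", [(route, HelloHandler)])
```

If this lived in an importable module, you would enable it with something like jupyter serverextension enable --py <module-name>.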
My favorite here is jupyter-rsession-proxy, which lets you run RStudio in JupyterHub environments!
Kernels

Kernels are the processes that actually execute the code in your notebooks. The Jupyter protocol is language agnostic, so kernels exist for many languages besides Python, and you can write your own.

My favorite kernel is the Linux kernel.
IPython Magics

If you’ve written %matplotlib inline in a notebook, you have used an IPython magic. These are almost like macros for Python — you can write custom code that parses the rest of the line (or cell) and does whatever you want.

Line magics start with a single % symbol and take some action based on the rest of the line. For example, %cd somedirectory changes the current directory of the Python process. Cell magics start with %% and operate on the entire cell contents after them. %%timeit is probably the most famous – it runs the code a number of times and reports statistics on how long it takes.
You can also build your own magic commands that integrate with IPython. For example, the ipython-sql package provides the %%sql magic command for working seamlessly with databases. However, remember that in contrast to the extensions listed so far, IPython magics only work with the IPython kernel.
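To give a flavor of the API (the class and magic name below are invented for illustration), a toy line magic only needs a decorated method; once registered, %shout becomes available in the IPython kernel:

```python
from IPython.core.magic import Magics, line_magic, magics_class


@magics_class
class ShoutMagics(Magics):
    """Toy magic: `%shout hello` returns 'HELLO!'."""

    @line_magic
    def shout(self, line):
        # The magic receives the rest of the line as a plain string
        return line.upper() + "!"


def load_ipython_extension(ipython):
    # Entry point used by %load_ext when this module is importable
    ipython.register_magics(ShoutMagics)
```

Real magics can do arbitrarily sophisticated parsing of the line or cell contents; this one just upper-cases its argument.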
My favorite use of IPython magics is this blog post by Matthias Bussonnier, which makes great use of custom magics to seamlessly integrate Python, R, C and Julia in the same notebook.
IPython Widgets (ipywidgets)
IPython Widgets (ipywidgets) provide interactive GUI widgets for Jupyter notebooks and the IPython kernel. They let you and the people you share your notebooks with explore various options in your code with GUI elements rather than having to modify code. Coupled with something like voila, you can make dashboard-like applications for other people to consume without realizing it was created completely with a Jupyter Notebook!
You can build your own custom widgets to provide domain-specific interactive visualizations. For example, you can interactively visualize maps with ipyleaflet, use itk-jupyter-widget to explore image segmentation/registration problems interactively, or model 3D objects with pythreejs.
Check out the vdom project for a more reactive take on the same problem space, and xwidgets for a C++ implementation.
Contents Managers

Whenever you open or save a notebook or file through the web interface, a ContentsManager decides what actually happens. By default it loads and saves files from the local filesystem, but a custom contents manager can do whatever it wants. A popular use case is to load and save contents from somewhere other than the local filesystem — Amazon S3 / Google Cloud Storage, PostgreSQL, HDFS, etc. When using one of these, you can load and save notebooks & files via the web interface as if they were on your local filesystem. This is extremely useful if you are already using one of these systems to store your data.
My favorite contents manager is Jupytext. It does some magic during save/load to give you a .py equivalent of your .ipynb, and keeps the two in sync. You can explore code interactively in your notebook, then open the .py file in an IDE to do some heavy text editing, and automatically get all your changes back in your notebook when you open it again. It’s quite magical.
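As a sketch, enabling Jupytext is a matter of swapping in its contents manager in jupyter_notebook_config.py (the configuration key names follow older Jupytext docs and may differ between versions; the pairing format shown is just one option):

```python
# jupyter_notebook_config.py
c.NotebookApp.contents_manager_class = "jupytext.TextFileContentsManager"
# Pair every notebook with a percent-format .py file by default
c.ContentsManager.default_jupytext_formats = "ipynb,py:percent"
```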
JupyterHub

JupyterHub is a multi-user application for spawning notebooks & other interactive web applications, designed for use in classrooms, research labs, and companies. These organizations usually have other systems already in place, and JupyterHub needs to integrate well with them. Here is a non-exhaustive list of ways JupyterHub can be extended.
Authenticators

JupyterHub is a multi-user application, so users need to log in somehow — ideally the same way they log in to every other application in their organization. The authenticator is responsible for this. Authenticators already exist for many popular authentication services — LDAP, OAuth (Google, GitHub, CILogon, Globus, Okta, Canvas, etc.), most LMSes via LTI, SAML, JWT, plain usernames & passwords, Linux users, and more. You can also write your own, or customize an existing one, very easily — so whatever your authentication needs, JupyterHub has you covered.
Spawners

Using pluggable spawners, you can start a Jupyter notebook server for each user in many different ways. You might want to spawn them on a node with Docker containers, scale them out with Kubernetes, run them on your HPC cluster, run them alongside your Hadoop / Spark cluster, contain them with systemd, simply run them as different Linux users, or do something else entirely. The spawners themselves are usually extremely configurable, and of course you can write your own.
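For example, switching JupyterHub over to DockerSpawner is a couple of lines in jupyterhub_config.py (a sketch; the image name is just an example):

```python
# jupyterhub_config.py -- spawn each user's server in a Docker container
c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
c.DockerSpawner.image = "jupyter/scipy-notebook"
```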
Services

Often you want to provide additional services to your JupyterHub users — cull their servers when idle, or let them publish shareable notebooks. You can run a JupyterHub service to provide these — or similar — features. Users can make requests to services with their JupyterHub identities, and services can make API calls to JupyterHub too. Services can be arbitrary processes or web services — BinderHub is implemented as a JupyterHub service, for example.
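As a sketch, assuming the jupyterhub-idle-culler package is installed, registering the idle culler as a managed service in jupyterhub_config.py looks roughly like this (newer JupyterHub versions grant the service permissions via RBAC roles instead of the admin flag):

```python
# jupyterhub_config.py -- run the idle culler as a managed JupyterHub service
c.JupyterHub.services = [
    {
        "name": "idle-culler",
        "admin": True,  # older JupyterHub versions; newer ones use RBAC roles
        "command": ["python3", "-m", "jupyterhub_idle_culler", "--timeout=3600"],
    }
]
```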
nbconvert Exporters

nbconvert converts between the notebook format and various other formats — if you’ve exported your notebook to PDF, LaTeX, or HTML, or used nbviewer, you have used nbconvert. It has an exporter for each format it exports to, and you can write your own to export to a new format — or to heavily customize an existing export format. If you’re performing complex conversion operations involving notebooks, you might find writing an exporter to be the cleanest way to accomplish your goals.
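To sketch the shape of the API (the exporter class and its output are invented for illustration), a custom exporter subclasses nbconvert's base Exporter and overrides from_notebook_node:

```python
import nbformat
from nbconvert.exporters import Exporter


class CellCountExporter(Exporter):
    """Toy exporter that 'exports' a notebook as a one-line summary."""

    output_mimetype = "text/plain"

    def from_notebook_node(self, nb, resources=None, **kwargs):
        # Let the base class run preprocessors, then build our output
        nb, resources = super().from_notebook_node(nb, resources, **kwargs)
        body = f"{len(nb.cells)} cell(s)"
        return body, resources


# Usage: run it on a freshly built notebook
nb = nbformat.v4.new_notebook()
nb.cells.append(nbformat.v4.new_code_cell("1 + 1"))
body, _ = CellCountExporter().from_notebook_node(nb)
```

Real exporters usually subclass TemplateExporter instead and render the notebook through a template, but the entry point is the same.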
My happiest moment when researching for this blog post is finding out that a docx exporter exists.
Bundler extensions (bundlerextension)

Bundler extensions let you add entries to the Download as item in the menu bar. They are often paired with an nbconvert exporter to make the exporter more discoverable, though you can also write a custom bundler extension to do any kind of custom processing of a notebook before downloading it. For example, nbreport provides a bundler extension that cleans up a notebook so it is suitable for viewing as a report & exports it as HTML.
repo2docker

repo2docker turns git (and other) repositories into reproducible, data-science-focused Docker images. mybinder.org (and other BinderHub installations) rely on it to build and launch interactive Jupyter / RStudio sessions from git repositories. There are currently two ways to extend repo2docker.
BuildPacks

repo2docker looks at the contents of the repository to decide how to build it. For example, if there is a requirements.txt it sets up a miniconda environment to install Python packages into, while if there is an install.R file it makes sure R / RStudio is installed. Writing a new BuildPack lets you extend this behavior to add support for your favorite language, or to customize how an existing language is built.
ContentProviders

The repo part of repo2docker is a bit of a misnomer — you can turn almost anything into a Docker image. Currently, it supports local folders and Zenodo repositories — but you can add support for your favorite source of reproducible code by writing a new ContentProvider!
Is that all?
Of course not! The Jupyter ecosystem is vast, and no single blog post can cover it all. This one is already missing a few extension points — Enterprise Gateway, TLJH plugins, etc. As time marches on, there will be newer components and newer ways of extending things that have not even been imagined yet. Leave a comment about anything else that is missing here.
I look forward to seeing what kind of beautiful extensions y’all create!