99 ways to extend the Jupyter ecosystem



Whenever someone says ‘You can do that with an extension’ in the Jupyter ecosystem, it is often not clear what kind of extension they are talking about. The Jupyter ecosystem is very modular and extensible, so there are lots of ways to extend it. This blog post aims to provide a quick summary of the most common ways to extend Jupyter, and links to help you explore the extension ecosystem.

JupyterLab extensions (labextension)

Draw vector graphics in JupyterLab with the jupyterlab-drawio extension

JupyterLab is a popular ‘new’ interface for working with Jupyter Notebooks. It is an interactive development environment for working with notebooks, code and data — and hence extremely extensible. Using JupyterLab extensions, you can add entirely new functionality or change almost any aspect of how the interface behaves. These are written in TypeScript or JavaScript, and run in the browser.

The JupyterLab documentation has information on how to install & use extensions, as well as how to author & distribute them. You can also discover extensions by searching on GitHub or npmjs.com.

My favorite JupyterLab extension is jupyterlab-vim — it lets you fully use Vim keybindings inside JupyterLab!

Classic Notebook extensions (nbextension)

Table of Contents nbextension

When people think of ‘the notebook interface’, they are probably thinking of the classic Jupyter Notebook. You can extend any aspect of the notebook user experience with nbextensions. These are little bits of client-side JavaScript that allow you to add / change functionality as you wish. They are the Classic Notebook equivalent to JupyterLab extensions.

The Jupyter Notebook documentation has information on how to install or develop extensions. The Unofficial Jupyter Notebook extensions repository has a lot of popular extensions and a GUI extension manager you can use to install nbextensions.

My favorite nbextension provides a collapsible Table of Contents for your notebooks.

Notebook Server Extensions (serverextension)

Unlike JupyterLab or nbextensions, Jupyter Notebook Server extensions are written in Python to add some serverside functionality. There are two primary use cases for server extensions.

LaTeX previews in JupyterLab

The first use case is to provide a backend for a particular JupyterLab or classic notebook extension. An example is the jupyterlab-latex JupyterLab extension, which provides live previews of LaTeX files in JupyterLab. It has a frontend JupyterLab extension to integrate with the JupyterLab text editor, and a backend serverextension component that actually runs the LaTeX commands to produce the output displayed to you.

The second use case is to provide any user interface backed by any kind of server side processing. Server extensions can function as arbitrary Tornado HTTP handlers, so any web application you can think of, you can write as a Jupyter serverextension. An example is nbgitpuller, which provides UI and mechanisms to distribute notebooks from git repositories to your users in a transparent way.

My favorite here is jupyter-rsession-proxy, which lets you run RStudio in JupyterHub environments!

Jupyter Kernels

You might be most familiar with using Jupyter notebooks with Python, but you can use a ton of other languages when writing your notebook: R, Julia, JavaScript, Octave, Scala/Spark, interactive C++, bash, or even Matlab! These are called kernels, and they speak the language-agnostic Jupyter protocol over ZeroMQ. You can write a new kernel for your language by directly implementing the Jupyter protocol, by wrapping it with the metakernel project, or using C++ bindings via Xeus. Once a kernel exists, it works seamlessly with any Jupyter frontend: classic notebook, JupyterLab, nteract, the terminal jupyter console, the graphical Qt Console, etc.
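
Whichever route you take, a frontend finds your kernel through a small kernel.json file (a "kernel spec"). Here is a minimal sketch of generating one. The module name echo_kernel and the display name are hypothetical placeholders, and the sketch only writes the spec; it does not implement the kernel itself.

```python
import json
import sys
from pathlib import Path


def make_kernel_spec(module_name, display_name, language):
    """Return the contents of a kernel.json file as a dict."""
    return {
        # Command the frontend runs to start the kernel. Jupyter replaces
        # {connection_file} with the path to a file describing the ZeroMQ ports.
        "argv": [sys.executable, "-m", module_name, "-f", "{connection_file}"],
        "display_name": display_name,
        "language": language,
    }


def write_kernel_spec(directory, spec):
    """Write spec to <directory>/kernel.json and return the file path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path


if __name__ == "__main__":
    # "echo_kernel" is a hypothetical module that would implement the protocol.
    spec = make_kernel_spec("echo_kernel", "Echo", "text")
    print(write_kernel_spec("echo", spec))
```

Running jupyter kernelspec install --user echo on the resulting directory should then make the kernel show up in the frontends listed above.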

My favorite kernel is the linux kernel.

IPython Magics

If you’ve written %matplotlib inline in a notebook, you have used an IPython magic. These are almost like macros for Python: you can write custom code that parses the rest of the line (or cell), and do whatever it is that you want.

Line magics start with one % symbol and take some action based on the rest of the line. For example, %cd somedirectory changes the current directory of the Python process. Cell magics start with %% and operate on the entire cell contents after it. %%timeit is probably the most famous: it’ll run the code a number of times and report stats on how long it takes to run.

You can also build your own magic command that integrates with IPython. For example, the ipython-sql package provides the %%sql magic command for working seamlessly with databases. However, remember that in contrast to the extensions listed so far, IPython magics only work with the IPython kernel.
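
To make the line vs. cell distinction concrete, here is a minimal sketch of an IPython extension module that registers one magic of each kind. The magic names shout and wordcount are made up for illustration. load_ipython_extension is the hook IPython calls on %load_ext, and register_magic_function is the shell method used to register plain functions as magics.

```python
import shlex


def shout(line):
    """Line magic body: receives the rest of the line as one string."""
    words = shlex.split(line)
    return " ".join(word.upper() for word in words)


def wordcount(line, cell):
    """Cell magic body: receives the rest of the first line and the cell text."""
    return len(cell.split())


def load_ipython_extension(ipython):
    """Called by IPython when the user runs %load_ext on this module."""
    ipython.register_magic_function(shout, magic_kind="line", magic_name="shout")
    ipython.register_magic_function(
        wordcount, magic_kind="cell", magic_name="wordcount"
    )
```

If this were saved as my_magics.py (a hypothetical file name) somewhere importable, %load_ext my_magics would make %shout and %%wordcount available in that session.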

My favorite use of IPython magics is this blog post by Matthias Bussonnier, which makes great use of custom magics to seamlessly integrate Python, R, C and Julia in the same notebook.

IPython Widgets (ipywidgets)

Play with plot options with dropdown. Courtesy Towards Data Science by Will Koehrsen

IPython Widgets (ipywidgets) provide interactive GUI widgets for Jupyter notebooks and the IPython kernel. They let you and the people you share your notebooks with explore various options in your code with GUI elements rather than having to modify code. Coupled with something like voila, you can make dashboard-like applications for other people to consume without realizing it was created completely with a Jupyter Notebook!

You can build your own custom widgets to provide domain-specific interactive visualizations. For example, you can interactively visualize maps with ipyleaflet, use itk-jupyter-widget to explore image segmentation/registration problems interactively, or model 3D objects with pythreejs.

Check out the vdom project for a more reactive take on the same problem space, and xwidgets for a C++ implementation.

Contents Manager

Whenever you open or save a notebook or file through the web interface, a ContentsManager decides what actually happens. By default it loads and saves files from the local filesystem, but a custom contents manager could do whatever it wants. A popular use case is to load/save contents from somewhere other than the local filesystem: Amazon S3 / Google Cloud Storage, PostgreSQL, HDFS, etc. When using one of these, you can load / save notebooks & files via the web interface as if they are on your local filesystem! This is extremely useful if you are already using any of these to store your data.
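
As a rough sketch of the idea, here is a toy store that keeps files in a dict instead of on disk. It mirrors the get / save / file_exists method names and the "model" dictionaries used by the contents API, but it is an illustration only. A real contents manager would subclass notebook.services.contents.manager.ContentsManager, implement the remaining methods (delete, rename, directories, hidden files) and raise HTTP errors rather than KeyError.

```python
from datetime import datetime, timezone


class InMemoryContents:
    """Toy stand-in for a contents manager: files live in a dict."""

    def __init__(self):
        self._files = {}  # maps path -> stored model dict

    def file_exists(self, path):
        return path.strip("/") in self._files

    def save(self, model, path):
        """Store the model at path and return it without its content."""
        path = path.strip("/")
        now = datetime.now(timezone.utc)
        previous = self._files.get(path, {})
        stored = {
            "name": path.rsplit("/", 1)[-1],
            "path": path,
            "type": model["type"],  # "notebook", "file" or "directory"
            "format": model.get("format"),
            "content": model.get("content"),
            "created": previous.get("created", now),
            "last_modified": now,
            "writable": True,
            "mimetype": None,
        }
        self._files[path] = stored
        return {**stored, "content": None, "format": None}

    def get(self, path, content=True, type=None, format=None):
        """Return the model at path, optionally leaving out the content."""
        model = dict(self._files[path.strip("/")])
        if not content:
            model["content"] = None
            model["format"] = None
        return model
```

Swapping the dict for calls to an object store or a database is where the backends mentioned above come in.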

My favorite contents manager is Jupytext. It does some magic during save/load to give you a .py equivalent of your .ipynb, and keeps them in sync. You can explore code interactively in your notebook, then open the .py file in an IDE to do some heavy text editing, and automatically get all your changes back in your notebook when you open it again. It’s quite magical.

Jupytext: .ipynb or .py? why not both!

Extending JupyterHub

JupyterHub is a multi-user application for spawning notebooks & other interactive web applications, designed for use in classrooms, research labs and companies. These organizations probably have other systems they are using, and JupyterHub needs to integrate strongly with them. Here is a non-exhaustive list of ways JupyterHub can be extended.

Authenticators

JupyterHub is a multi-user application, so users need to log in somehow, ideally the same way they log in to every other application in their organization. The authenticator is responsible for this. Authenticators already exist for many popular authentication services: LDAP, OAuth (Google, GitHub, CILogon, Globus, Okta, Canvas, etc), most LMS with LTI, SAML, JWT, plain usernames & passwords, linux users, etc. You can write your own or customize one that exists very easily, so whatever your authentication needs, JupyterHub has you covered.
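
For a sense of how small a custom authenticator can be, here is a sketch that accepts any username as long as a shared password matches. The class name and the hard-coded password are hypothetical and not something to deploy as-is. The try/except exists only so the sketch can be run without JupyterHub installed; in a real deployment you would import Authenticator directly.

```python
import hmac

try:
    from jupyterhub.auth import Authenticator
except ImportError:
    # Fallback so the sketch runs without JupyterHub installed.
    Authenticator = object


class SharedPasswordAuthenticator(Authenticator):
    """Accept any username if the shared password matches (illustration only)."""

    # Hypothetical value: real deployments should not hard-code secrets.
    shared_password = "change-me"

    async def authenticate(self, handler, data):
        """Return the username on success, or None to reject the login."""
        supplied = data.get("password") or ""
        # compare_digest avoids leaking information through timing differences.
        if hmac.compare_digest(supplied.encode(), self.shared_password.encode()):
            return data["username"]
        return None
```

You would then point JupyterHub at it in jupyterhub_config.py by setting c.JupyterHub.authenticator_class to the class (or its import path).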

Spawners

Using pluggable spawners, you can start a Jupyter Notebook Server for each user in many different ways. You might want them to spawn on a node with docker containers, scale them out with Kubernetes, use it on your HPC cluster, have them run along your Hadoop / Spark cluster, contain them with systemd, simply run them as different linux users or in many other possible ways. The spawners themselves are usually extremely configurable, and of course you can write your own.

Services

Often you want to provide additional services to your JupyterHub users — cull their servers when idle, or allow them to publish shareable notebooks. You can run a JupyterHub service to provide these — or similar — services. Users can make requests to them with their JupyterHub identities, and the services can make API calls to JupyterHub too. These can be arbitrary processes or web services — BinderHub is implemented as a JupyterHub service, for example.
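
Here is a minimal sketch of the "services can make API calls to JupyterHub" part, using only the standard library. It assumes the service was started by JupyterHub, which passes the JUPYTERHUB_API_URL and JUPYTERHUB_API_TOKEN environment variables to it. The function names are made up for this example, and listing users will only work if the service's token has been granted permission to do so.

```python
import json
import os
import urllib.request


def build_hub_request(endpoint, api_url=None, token=None):
    """Build an authenticated request for a JupyterHub REST API endpoint."""
    api_url = api_url or os.environ["JUPYTERHUB_API_URL"]
    token = token or os.environ["JUPYTERHUB_API_TOKEN"]
    url = api_url.rstrip("/") + "/" + endpoint.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": f"token {token}"})


def list_users(api_url=None, token=None):
    """Ask the Hub for its users; returns the decoded JSON response."""
    request = build_hub_request("users", api_url, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

An idle culler follows the same pattern: list users and their servers, check the last activity, and send a request to stop the ones that have been idle too long.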

NBConvert Exporter

nbconvert converts between the notebook format and various other formats — if you’ve exported your notebook to PDF, LaTeX, HTML, or used nbviewer, you have used nbconvert. It has an exporter for each format it exports to, and you can write your own to export to a new format — or to just massively customize an existing export format. If you’re performing complex conversion operations involving notebooks, you might find writing an exporter to be the cleanest way to accomplish your goals.
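
To show what an exporter fundamentally does, here is a standard-library-only sketch that turns a notebook (which is just JSON) into a Python script. It deliberately does not use the nbconvert API. A real custom exporter would subclass nbconvert.exporters.Exporter, implement from_notebook_node, and be registered through the nbconvert.exporters entry point.

```python
import json


def notebook_to_script(notebook_json):
    """Convert the JSON text of a notebook into Python source text."""
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format allows source to be a string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append(source)
        elif cell.get("cell_type") == "markdown":
            # Keep prose around by turning each markdown line into a comment.
            chunks.append("\n".join("# " + line for line in source.splitlines()))
    return "\n\n".join(chunks) + "\n"
```

Cells of any other type (raw cells, for instance) are simply skipped in this sketch.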

My happiest moment while researching this blog post was finding out that a docx exporter exists.

Bundler Extensions

Discoverable way to enable nbconvert exporters

Bundler extensions let you add entries to the Download as item in the menu bar. They are often paired with an nbconvert exporter to make the exporter more discoverable, though you can also write a custom bundler extension to do any kind of custom processing of a notebook before downloading. For example, nbreport provides a bundler extension that cleans up the notebook in a way suitable for viewing as a report & exports it as HTML.

Repo2Docker

repo2docker turns git (and other) repositories into reproducible, data science focused docker images. mybinder.org (and other BinderHub installations) rely on it to build and launch interactive Jupyter/RStudio sessions from git repositories. There are currently two ways to extend repo2docker.

BuildPacks

repo2docker looks at the contents of the repository to decide how to build it. For example, if there is a requirements.txt it sets up a miniconda environment to install python packages into, while if there is an install.R file it makes sure R/RStudio is installed. Writing a new BuildPack lets you extend this behavior to add support for your favorite language, or customize how an existing language is built.
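
The detection idea can be sketched in a few lines. This is not repo2docker's code: the class, the list and the "first match wins" ordering below are simplified assumptions for illustration, and a real BuildPack also has to describe how to build the image, not just when it applies.

```python
from pathlib import Path


class FileMarkerBuildPack:
    """Hypothetical stand-in for a BuildPack: applies if a marker file exists."""

    def __init__(self, name, marker):
        self.name = name
        self.marker = marker

    def detect(self, repo_dir):
        """Return True if this build pack should handle the repository."""
        return (Path(repo_dir) / self.marker).is_file()


# Order matters in this sketch: the first build pack that detects wins.
BUILDPACKS = [
    FileMarkerBuildPack("r", "install.R"),
    FileMarkerBuildPack("python", "requirements.txt"),
]


def pick_buildpack(repo_dir, buildpacks=BUILDPACKS):
    """Return the first build pack whose detect() matches, or None."""
    for buildpack in buildpacks:
        if buildpack.detect(repo_dir):
            return buildpack
    return None
```

Supporting a new language in this picture means adding one more entry with its own detect logic.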

ContentProviders

The repo part of repo2docker is a misnomer: you can turn anything into a docker image. Currently, it supports git, local folder and Zenodo repositories, but you can add support for your favorite source of reproducible code by making a new ContentProvider!

Is that all?

Of course not! The Jupyter ecosystem is vast, and no one blog post can cover it all. This blog post is already missing a few: Enterprise Gateway, TLJH plugins, etc. As time marches on, there will be newer components and newer ways of extending things that have not even been imagined yet. Leave a comment about what else is missing here.

Look forward to seeing what kinda beautiful extensions y’all create!
