Getting Started

If you have questions about using or deploying your own Breeder Genomics Hub, create a GitHub issue with the “support” label!

Quick Start

Clone the repository:

git clone https://github.com/maize-genetics/breeder-genomics-hub
cd breeder-genomics-hub

Follow the ORCID API Tutorial to create an application via the Developer Tools submenu after clicking on your name in the top right of the page. This will allow you to utilize ORCID’s OAuth provider, enabling users to sign in with their ORCID iD.

Create an env file named prod.env containing the OAuth client ID and secret generated for your ORCID application.

Additionally, add the HUB_DOMAIN environment variable with the domain that you’ll be using to access the Breeder Genomics Hub. This is used by the reverse proxy Caddy to acquire a TLS certificate automatically via Let’s Encrypt. If you wish to force HTTP and not use a certificate, prefix this value with http:// (e.g. http://0.0.0.0:80).

OAUTH_CLIENT_ID=<APP-123ABC>
OAUTH_CLIENT_SECRET=<ORCID Secret>
HUB_DOMAIN=myhub.example.com
UID=1000

The UID value above is interpolated within the hub.yml Docker Compose config to utilize the Docker socket associated with your user. You can append this line easily by running:

echo "UID=$UID" >> prod.env
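Before moving on, it can help to confirm that prod.env contains every value listed above. The following is a minimal Python sketch, not part of the Breeder Genomics Hub itself. It assumes the simple KEY=VALUE layout shown in the example and does not attempt to reproduce every parsing rule Docker Compose applies to env files. The helper names parse_env and missing_keys are hypothetical.

```python
from pathlib import Path

# Keys defined in the prod.env example above.
REQUIRED_KEYS = ("OAUTH_CLIENT_ID", "OAUTH_CLIENT_SECRET", "HUB_DOMAIN", "UID")


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> list:
    """Return the required keys that are absent or empty, in a fixed order."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("prod.env")
    if env_path.exists():
        missing = missing_keys(env_path.read_text())
        print("prod.env looks complete" if not missing else f"missing: {missing}")
    else:
        print("prod.env not found in the current directory")
```

Run it from the repository root after creating prod.env. An empty result means all four values are set.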

Next, make sure you have the breeder-notebook Jupyter image. This image is the environment started for each user, so it must be present locally; otherwise, starting a user’s server will time out. Get it via:

docker pull maizegenetics/breeder-notebook:latest
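To confirm that the image is available locally before starting the Hub, a small check along these lines may be useful. This is a sketch only: it relies on the docker image inspect command exiting with a non-zero status when the image is missing, and the image_present helper (with its injectable run parameter, added so the function can be exercised without Docker) is hypothetical.

```python
import subprocess

IMAGE = "maizegenetics/breeder-notebook:latest"


def image_present(image: str, run=subprocess.run) -> bool:
    """Return True if `docker image inspect` exits with status 0 for the image."""
    result = run(
        ["docker", "image", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return result.returncode == 0


if __name__ == "__main__":
    try:
        found = image_present(IMAGE)
        print(f"{IMAGE}: {'present' if found else 'missing, run docker pull'}")
    except OSError:
        # Raised when the docker executable cannot be launched at all.
        print("could not run the docker CLI")
```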

Then it’s as simple as using hub.yml to start your Breeder Genomics Hub:

docker compose --env-file prod.env -f hub.yml up -d

Make sure to include the --env-file prod.env option so that the UID value is recognized by Docker Compose.

Customization and Configuration

Permanent Storage

The Breeder Genomics Hub uses DockerSpawner to start a container for each user. Files within a container are only available during its lifecycle (i.e., they are deleted when the container is stopped). To give users a place to store persistent data, DockerSpawner must be configured to mount a volume from the host into each spawned container. This volume persists on the host filesystem between container restarts, so users can save data they don’t want to lose in the ~/work directory. Add the following to your jupyterhub_config.py:

notebook_dir = "/home/jovyan/work"
c.DockerSpawner.notebook_dir = notebook_dir
c.DockerSpawner.volumes = { "breeder-{username}": notebook_dir }
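To illustrate what the {username} placeholder achieves, the sketch below expands the same template by hand for a couple of example users. DockerSpawner performs this substitution itself, and it may also escape characters in usernames, so the real volume names on your host can differ from this simplified picture. The volumes_for helper is hypothetical and exists only for illustration.

```python
notebook_dir = "/home/jovyan/work"
volume_template = "breeder-{username}"


def volumes_for(username: str) -> dict:
    """Expand the {username} placeholder into a per-user volume mapping."""
    return {volume_template.format(username=username): notebook_dir}


# Each user gets a separately named volume mounted at the same path
# inside their own container.
for user in ("bob", "alice"):
    print(volumes_for(user))
```

Because the volume name differs per user while the mount path stays the same, each user sees only their own data under ~/work.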

Choose Where Data Is Stored

The above config snippet will create a Docker volume with a default mount point. If you are on Linux, it will likely be stored at ~/.local/share/docker/volumes. Using bind mounts via an absolute path is currently broken (#453), so if an administrator wishes to store persistent data elsewhere, they will need to employ a symbolic link:

ln -s /tmp/hub_userdata /home/your_user/.local/share/docker/volumes

You can list all current volumes via docker volume ls, and view information about any of them using docker inspect. For example, for a volume named breeder-bob:

your_user@your_server:~$ docker inspect breeder-bob
[
    {
        "CreatedAt": "2023-01-01T00:00:00Z",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/home/your_user/.local/share/docker/volumes/breeder-bob/_data",
        "Name": "breeder-bob",
        "Options": null,
        "Scope": "local"
    }
]
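If you need the mount point in a script (for example, to back up user data), the JSON above can be parsed directly. The following is a minimal sketch using the sample output shown in this section. The mountpoints and inspect_volume helpers are hypothetical names, and inspect_volume requires the docker CLI to be installed.

```python
import json
import subprocess


def mountpoints(inspect_json: str) -> dict:
    """Map each volume name to its host mount point from `docker inspect` JSON."""
    return {entry["Name"]: entry["Mountpoint"] for entry in json.loads(inspect_json)}


def inspect_volume(name: str) -> str:
    """Run `docker inspect NAME` and return its JSON output (needs the docker CLI)."""
    result = subprocess.run(
        ["docker", "inspect", name], capture_output=True, text=True, check=True
    )
    return result.stdout


# Sample output copied from the breeder-bob example above.
SAMPLE = """
[
    {
        "CreatedAt": "2023-01-01T00:00:00Z",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/home/your_user/.local/share/docker/volumes/breeder-bob/_data",
        "Name": "breeder-bob",
        "Options": null,
        "Scope": "local"
    }
]
"""

print(mountpoints(SAMPLE))
```

On a live server, mountpoints(inspect_volume("breeder-bob")) would return the same mapping for the real volume.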

For further context, see this GitHub comment.

User-installed Packages (Python, R, etc)

Please see the Installing Additional Software section.

User Authentication and Authorization

In JupyterHub, Authenticators are responsible for managing both authentication (verifying that a user is who they say they are) and authorization (verifying whether a given user is allowed to perform some action). By default, the Breeder Genomics Hub uses ORCID’s OAuth functionality to let individuals log in to a Hub with their existing ORCID iD, removing the need to create an account specific to the Hub. For more information on using ORCID for logins, see the About ORCID iD & OAuth subsection below.

Authentication and authorization have security implications, and a full treatment is outside the scope of this documentation. A good starting point for JupyterHub specifically is the project’s Authentication and User Basics tutorial page.

As a simple example, if you’d like to limit access to an instance of the Breeder Genomics Hub, add the following to your jupyterhub_config.py:

c.GenericOAuthenticator.allowed_users = { "0000-0002-9079-593X", "0000-0002-3100-371X" }

The above would limit access to Stephen Hawking and Ed Buckler.
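Since a mistyped ORCID iD in allowed_users would silently lock out the intended person, it may be worth validating entries before adding them. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below implements that check. It is not part of the Hub or of GenericOAuthenticator, and the helper names are hypothetical.

```python
import re

# Four groups of four characters; only the final character may be X.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-000X layout and the trailing check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


allowed = {"0000-0002-9079-593X", "0000-0002-3100-371X"}
invalid = sorted(orcid for orcid in allowed if not is_valid_orcid(orcid))
print("invalid entries:", invalid)
```

A passing checksum only catches typos; it does not confirm that the iD belongs to the person you intend to allow.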

About ORCID iD & OAuth

The configuration for GenericOAuthenticator, as seen in the code, follows the procedure in the Setup for ORCID iD section of the GenericOAuthenticator docs.

There are a variety of additional config options available; consult the GenericOAuthenticator API Reference for more information.

For example, this config allows any authenticated ORCID iD holder to log in.

Redirect URI

By default, the redirect URI used by GenericOAuthenticator is based on the HUB_DOMAIN environment variable specified in the prod.env file.

If you wish to use a different redirect URI, provide a REDIRECT_URI value in your prod.env file:

REDIRECT_URI=https://thirdparty.com/hub/oauth_callback

If using a custom redirect URI, ensure that it uses the /oauth_callback endpoint; otherwise, authentication will succeed, but you will then encounter a 404 error.
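A custom REDIRECT_URI can be sanity-checked before it goes into prod.env. The sketch below only verifies the points mentioned in this section (a web URL whose path ends in /oauth_callback). It is an illustration with a hypothetical helper name, not a check performed by the Hub or by ORCID.

```python
from urllib.parse import urlparse


def redirect_uri_problems(uri: str) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https"):
        problems.append("scheme should be http or https")
    if not parsed.netloc:
        problems.append("missing host name")
    if not parsed.path.endswith("/oauth_callback"):
        problems.append("path should end with /oauth_callback")
    return problems


print(redirect_uri_problems("https://thirdparty.com/hub/oauth_callback"))
print(redirect_uri_problems("https://thirdparty.com/hub/login"))
```

The first call reports no problems, while the second flags the path. The same URI must also be registered with your ORCID application.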

Please refer to this ORCID FAQ for more information about how redirect URIs work.