How to create a Devcontainer for your Python project 🐳

Hi. Welcome everyone. So today we are going to talk about "How to create a Devcontainer for your Python project".

First: I'd like to get a show of hands 🙋‍♀️🙋‍♂️🖐:

  • Who has played around with Devcontainers before?
  • Who has used Devcontainers at their company?

Jeroen Overschie

I am Jeroen Overschie. I am a Machine Learning Engineer at GoDataDriven, working in Amsterdam. I help companies take their Machine Learning solutions into production.

The use case¶

You are assigned to set up a new repo for a team. The requirements are as follows:

  • Apache Spark to process data (we have a cluster set up, and Spark needs Java)
  • pyspark (installed as a pip package)
  • Python

So we need to align on:

  • 📌 A specific Java version
  • 📌 A specific Python version
  • 📌 A specific pyspark version

→ otherwise we do not get the guarantees we want in production code.
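To make "aligning on versions" concrete, here is a minimal sketch of a script that reports the three versions in question for the current environment. The function names are made up for illustration, and the script assumes that `java` is discoverable on the PATH when it is installed.

```python
import shutil
import subprocess
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a pip package, or 'not installed'."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def java_version() -> str:
    """Return the first line of `java -version`, or 'not installed'."""
    java = shutil.which("java")
    if java is None:
        return "not installed"
    # `java -version` prints to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    lines = (result.stderr or result.stdout).splitlines()
    return lines[0] if lines else "unknown"


def environment_report() -> dict:
    """Collect the three versions the team needs to agree on."""
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "pyspark": package_version("pyspark"),
        "java": java_version(),
    }


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version}")
```

Running this on two laptops and in CI is a quick way to see whether the environments really match.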

PP
Petra the Python Dev 19/10 10:30
Hi my venv somehow got corrupted 🥲. It's saying No module named 'numpy' even though I seemingly have it installed. Not sure how to fix.
P.S. this is also why I was not in the meeting.
EO
Erik the old project maintainer 19/11 11:30
Hey so I was working on maintaining the old project. Just found out that some of the dependencies don't even work on my MacOS version anymore. FML
BW
Billy Windows 15/11 15:30
I'm trying to dockerize the project but it's haaard. Everything that works on my Windows laptop seems to fail on Linux.
PU
Pyspark User 10/10 13:32
Hii. Can you maybe run pip show pyspark? I'm curious which pyspark version you are running 🧐. Because if it works for you but not for me and also not in the CI maybe your environment is different than both. Just checking.
JJ
Johnny Junior 19/10 09:03
Good day. I'm new at the company and wanted to get started working on the repo. Tried following the README steps but doesn't work. By any chance: are there any more detailed docs available for project setup? No right? 🙃

🐛 Issues

  • PP, Petra the Python Dev: Corrupted virtual environment
  • EO, Erik the old project maintainer: Outdated project dependencies
  • BW, Billy Windows: Containerisation & going into production (e.g. Windows / Linux / macOS)
  • PU, PySpark user: Inconsistent environments (e.g. local ↔ CI ↔ prod ↔ team members)
  • JJ, Johnny Junior: No formal specification of install steps, and missing docs 📚

🔄 How to get a reproducible Dev environment?

Devcontainers to the rescue ⛑!¶

🐳 Docker helps us create a formal definition of our environment.

🔌 Devcontainers allow you to connect your editor (IDE) to that container.

Devcontainer overview

🔄 Reproducible means:
  • ⚡️ Faster onboarding
  • 👨‍👩‍👧‍👦 Better alignment between team members
  • ⏱ Smaller gap to production
Downsides?
  • Docker knowledge
  • Docker
  • Devcontainer: connect to your IDE
  • Formal instruction set for dev env setup

👷🏻‍♂️ Let's build a Devcontainer!¶

Let's say we have a really simple project that looks like this:

$ tree .
.
├── README.md
├── requirements.txt
├── requirements-dev.txt
├── sales_analysis.py
└── test_sales_analysis.py

The .devcontainer folder¶

Your Devcontainer spec will live inside the .devcontainer folder.

There will be two main files:

  • devcontainer.json
  • Dockerfile

Create a new file called devcontainer.json:

{
    "build": {
        "dockerfile": "Dockerfile",
        "context": ".."
    }
}

This basically means: as a base for our Devcontainer, use the Dockerfile located in the same directory as devcontainer.json, and build it with the parent directory (..) as the build context.
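To see how those two relative paths resolve, here is a small sketch that reads a devcontainer.json and turns both values into absolute paths. It assumes the file is plain JSON without comments, and `resolve_build_paths` is a hypothetical helper name, not part of any Devcontainer tooling.

```python
import json
import tempfile
from pathlib import Path


def resolve_build_paths(devcontainer_json: Path) -> dict:
    """Resolve `build.dockerfile` and `build.context` to absolute paths.

    Both values are interpreted relative to the folder that holds
    devcontainer.json. Assumes plain JSON without comments.
    """
    config_dir = devcontainer_json.resolve().parent
    build = json.loads(devcontainer_json.read_text())["build"]
    return {
        "dockerfile": (config_dir / build["dockerfile"]).resolve(),
        # When "context" is omitted, fall back to the config folder itself.
        "context": (config_dir / build.get("context", ".")).resolve(),
    }


if __name__ == "__main__":
    # Demo on a throwaway project laid out like the one in this talk.
    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / ".devcontainer" / "devcontainer.json"
        config.parent.mkdir()
        config.write_text('{"build": {"dockerfile": "Dockerfile", "context": ".."}}')
        for key, path in resolve_build_paths(config).items():
            print(f"{key}: {path}")
```

With "context" set to "..", the build context is the project root, which is why the Dockerfile can COPY requirements.txt even though it lives one level deeper in .devcontainer.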

So what does this Dockerfile look like?

FROM python:3.10

# Install Java (Spark needs a JVM); the build already runs as root,
# so no sudo prefix is needed
RUN apt-get update && \
    apt-get install -y sudo default-jdk

## Pip dependencies
# Upgrade pip
RUN pip install --upgrade pip
# Install production dependencies
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt && \
    rm /tmp/requirements.txt
# Install development dependencies
COPY requirements-dev.txt /tmp/requirements-dev.txt
RUN pip install -r /tmp/requirements-dev.txt && \
    rm /tmp/requirements-dev.txt

Helping out:

  • EO, Erik the old project maintainer: Outdated project dependencies
  • BW, Billy Windows: Containerisation & going into production (e.g. Windows / Linux / macOS)
  • JJ, Johnny Junior: No formal specification of install steps, and missing docs 📚

We are building our image on top of python:3.10, which is a Debian-based image. This is one of the Linux distributions that a Devcontainer can be built on. The main requirement is that Node.js should be able to run, because VSCode automatically installs VSCode Server in the container.

For an extensive list of supported distributions, see “Remote Development with Linux”.
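The Dockerfile above installs whatever requirements.txt asks for, so the image is only as reproducible as those files are strict. As a rough sketch, the check below flags requirement lines that are not pinned with `==`. The regular expression is a simplification of the real requirement syntax, and the sample package versions are made up for illustration.

```python
import re

# A requirement counts as pinned when it uses an exact `==` version.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(text: str) -> list:
    """Return requirement lines that are not pinned with `==`."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    # Hypothetical file contents, not the real requirements of the project.
    sample = "pyspark==3.3.1\nnumpy>=1.20  # loose\npytest\n"
    print(unpinned_requirements(sample))
```

A check like this could run as part of the test suite, so a loosely specified dependency is noticed before it causes a difference between two builds of the image.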

Opening the Devcontainer¶

With the .devcontainer folder in place, it's now time to open our Devcontainer.

Open up the command palette (CMD + Shift + P) and select “Dev Containers: Reopen in Container”:

Dev Containers: Reopen in Container

Upon opening a repo with a valid .devcontainer folder, VSCode also shows a notification:

folder contains a dev container config file

Rebuilding allows you to get a fresh environment anytime you want:

Devcontainer rebuild option

Helping out:

  • PP, Petra the Python Dev: Corrupted virtual environment

What is happening under the hood 🚗¶

Besides starting the Docker container and attaching the terminal to it, VSCode is doing a couple more things:

  1. VSCode Server is installed in your Devcontainer.
    VSCode Server runs as a service in the container itself, so your VSCode installation can communicate with the container, for example to install and run extensions.
  2. Config is copied over.
    Config like ~/.gitconfig and ~/.ssh/known_hosts is copied over to the respective locations in the container.
  3. Filesystem mounts.
    VSCode automatically takes care of mounting:
    • The folder you are running the Devcontainer from.
    • Your VSCode workspace folder.

Opening your Devcontainer with the click of a button¶

Your entire project setup is now encapsulated in the Devcontainer. So actually we can add a Markdown button to open up the Devcontainer:

[
    ![Open in Remote - Containers](
        https://img.shields.io/static/v1?label=Remote%20-%20Containers&message=Open&color=blue&logo=visualstudiocode
    )
](
    https://vscode.dev/redirect?url=vscode://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/godatadriven/python-devcontainer-template
)

Which basically means, open this URL:

vscode://
  ms-vscode-remote.remote-containers/
  cloneInVolume?
  url=https://github.com/godatadriven/python-devcontainer-template

Just modify the GitHub URL after url= ✓.
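If you maintain several repos, the snippet can be generated rather than copy-pasted. The sketch below simply concatenates the badge image URL and the redirect prefix shown above with a repo URL; `devcontainer_badge` is a hypothetical helper name.

```python
BADGE_IMAGE = (
    "https://img.shields.io/static/v1"
    "?label=Remote%20-%20Containers&message=Open&color=blue&logo=visualstudiocode"
)
OPEN_PREFIX = (
    "https://vscode.dev/redirect?url="
    "vscode://ms-vscode-remote.remote-containers/cloneInVolume?url="
)


def devcontainer_badge(repo_url: str) -> str:
    """Return the Markdown for an 'Open in Remote - Containers' button."""
    return f"[![Open in Remote - Containers]({BADGE_IMAGE})]({OPEN_PREFIX}{repo_url})"


if __name__ == "__main__":
    print(devcontainer_badge("https://github.com/godatadriven/python-devcontainer-template"))
```

The output is a single-line version of the Markdown above and can be pasted straight into a README.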

This renders the following button:

Open in Remote - Containers

What kind of README would you rather have?

Extending the Devcontainer¶

We have built a working Devcontainer, which is great! But a couple of things are still missing. We still want to:

  • Install a non-root user for extra safety and good practice
  • Pass in custom VSCode settings and install extensions by default
  • Be able to access Spark UI (opening up port 4040)
  • Run Continuous Integration (CI) in the Devcontainer

Let's see how.

Installing a non-root user¶

If you pip install a new package, you will see the following warning:

WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

So let's go ahead and create a user for this scenario.

# Add non-root user
ARG USERNAME=nonroot
RUN groupadd --gid 1000 $USERNAME && \
    useradd --uid 1000 --gid 1000 -m $USERNAME
## Make sure to reflect new user in PATH
ENV PATH="/home/${USERNAME}/.local/bin:${PATH}"
USER $USERNAME

Add the following property to devcontainer.json:

"remoteUser": "nonroot"

That's great! When we now start the container we should connect as the user nonroot.
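A quick way to confirm this from inside the container is to look at the user id of the running process. This is a minimal sketch, assuming a POSIX system (which holds for the Debian-based image used here); `running_as_root` is a hypothetical helper name.

```python
import os


def running_as_root(uid=None) -> bool:
    """Return True when the given (or current) user id is root (uid 0)."""
    if uid is None:
        uid = os.getuid()  # POSIX only, which holds inside the Linux container
    return uid == 0


if __name__ == "__main__":
    print(f"uid={os.getuid()} root={running_as_root()}")
```

With the Dockerfile change above, the reported uid inside the Devcontainer should be 1000 rather than 0.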

Passing custom VSCode settings¶

"customizations": {
    "vscode": {
        "extensions": [
            "ms-python.python"
        ],
        "settings": {
            "python.testing.pytestArgs": [
                "."
            ],
            "python.testing.unittestEnabled": false,
            "python.testing.pytestEnabled": true,
            "python.formatting.provider": "black",
            "python.linting.mypyEnabled": true,
            "python.linting.enabled": true
        }
    }
}

Helping out:

  • JJ, Johnny Junior: No formal specification of install steps, and missing docs 📚

The defined extensions are always installed in the Devcontainer. However, the defined settings only provide defaults, and can still be overridden by other setting scopes like User Settings, Remote Settings or Workspace Settings.

Accessing Spark UI¶

Since we are using pyspark, it would be nice to be able to access Spark UI.

"portsAttributes": {
    "4040": {
        "label": "SparkUI",
        "onAutoForward": "notify"
    }
},

"forwardPorts": [
    4040
]

When we now run our code, we get a notification that we can open Spark UI in the browser:

open Spark UI in the browser

Resulting in the Spark UI like we know it:

spark UI in the browser
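If the notification does not show up, it can help to check whether anything is listening on the port at all. Spark UI is only served while a Spark application is running, so the sketch below simply tries a TCP connection. `port_is_open` is a hypothetical helper, and the default host is an assumption that fits a port forwarded to localhost.

```python
import socket


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True when a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 4040 is the port forwarded for Spark UI in devcontainer.json.
    print("Spark UI reachable:", port_is_open(4040))
```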

Running our CI in the Devcontainer¶

Wouldn't it be convenient if we could re-use our Devcontainer to run our Continuous Integration (CI) pipeline as well? Indeed, we can do this with Devcontainers. Similarly to how the Devcontainer image is built locally using docker build, the same can be done within a CI/CD pipeline.

There are two basic options:

  1. Build the Docker image within the CI/CD pipeline
  2. Prebuild the image

Let's look at option (1).

1. Build the Docker image within the CI/CD pipeline¶

Luckily, a GitHub Action was already set up for us to do exactly this:

devcontainers/ci

Building, pushing and running a command in the Devcontainer is now as easy as:


name: Python app

on:
  ...

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout (GitHub)
        uses: actions/checkout@v3

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.repository_owner }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and run dev container task
        uses: devcontainers/ci@v0.2
        with:
          imageName: ghcr.io/${{ github.repository }}/devcontainer
          runCmd: pytest .

That's great! Whenever this workflow runs on your main branch, the image will be pushed to the configured registry; in this case GitHub Container Registry (GHCR).

See below a trace of the executed GitHub Action:

running-ci-in-the-devcontainer-github-actions

Awesome!
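The same idea can be tried outside GitHub Actions with the Devcontainer CLI, which the action is built on. The sketch below only assembles the `devcontainer up` and `devcontainer exec` calls and runs them when the CLI is installed. The helper names are hypothetical, and it assumes the `devcontainer` executable is on the PATH.

```python
import shutil
import subprocess


def devcontainer_commands(workspace: str, run_cmd: list) -> list:
    """Build the CLI calls: start the container, then run a command in it."""
    return [
        ["devcontainer", "up", "--workspace-folder", workspace],
        ["devcontainer", "exec", "--workspace-folder", workspace, *run_cmd],
    ]


def run_in_devcontainer(workspace: str, run_cmd: list) -> None:
    """Run the commands in order, stopping at the first failure."""
    if shutil.which("devcontainer") is None:
        raise RuntimeError("devcontainer CLI not found on PATH")
    for command in devcontainer_commands(workspace, run_cmd):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_in_devcontainer to execute them.
    for command in devcontainer_commands(".", ["pytest", "."]):
        print(" ".join(command))
```

Calling `run_in_devcontainer(".", ["pytest", "."])` from the project root would then mirror the `runCmd: pytest .` step of the workflow.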

Consistent environment

  • PU, PySpark user: Inconsistent environments (e.g. local ↔ CI ↔ prod ↔ team members)

The final Devcontainer definition¶

We built the following Devcontainer definitions.

First, devcontainer.json:

{
    "build": {
        "dockerfile": "Dockerfile",
        "context": ".."
    },

    "remoteUser": "nonroot",

    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python"
            ],
            "settings": {
                "python.testing.pytestArgs": [
                    "."
                ],
                "python.testing.unittestEnabled": false,
                "python.testing.pytestEnabled": true,
                "python.formatting.provider": "black",
                "python.linting.mypyEnabled": true,
                "python.linting.enabled": true
            }
        }
    },

    "portsAttributes": {
        "4040": {
            "label": "SparkUI",
            "onAutoForward": "notify"
        }
    },

    "forwardPorts": [
        4040
    ]
}

And our Dockerfile:

FROM python:3.10

# Install Java (Spark needs a JVM); the build runs as root at this point,
# so no sudo prefix is needed
RUN apt-get update && \
    apt-get install -y sudo default-jdk

# Add non-root user
ARG USERNAME=nonroot
RUN groupadd --gid 1000 $USERNAME && \
    useradd --uid 1000 --gid 1000 -m $USERNAME
## Make sure to reflect new user in PATH
ENV PATH="/home/${USERNAME}/.local/bin:${PATH}"
USER $USERNAME

## Pip dependencies
# Upgrade pip
RUN pip install --upgrade pip
# Install production dependencies
COPY --chown=nonroot:1000 requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt && \
    rm /tmp/requirements.txt
# Install development dependencies
COPY --chown=nonroot:1000 requirements-dev.txt /tmp/requirements-dev.txt
RUN pip install -r /tmp/requirements-dev.txt && \
    rm /tmp/requirements-dev.txt

Going further 🔮¶

  • Mounting directories

    💡 Pro tip: mount your AWS/GCP/Azure credentials

  • Devcontainer templates
  • Devcontainer features
  • Remote Development 🌍

    e.g. GitHub Codespaces, VMs on Azure/GCP/AWS

... and much more (see references slide)
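As an illustration of the credentials tip, the sketch below generates a `mounts` fragment for devcontainer.json that bind-mounts the host's ~/.aws folder into the home directory of the nonroot user. The helper names are hypothetical. It assumes the host exposes a HOME environment variable (which may not hold on Windows), and the target path assumes the nonroot user created earlier.

```python
import json


def bind_mount(source: str, target: str) -> str:
    """Format one entry for the `mounts` list in devcontainer.json."""
    return f"source={source},target={target},type=bind"


def credentials_mounts(remote_user: str = "nonroot") -> dict:
    """Return a `mounts` fragment that shares ~/.aws with the container."""
    return {
        "mounts": [
            # ${localEnv:HOME} is expanded by the Dev Containers tooling
            # to the HOME variable of the host machine.
            bind_mount("${localEnv:HOME}/.aws", f"/home/{remote_user}/.aws"),
        ]
    }


if __name__ == "__main__":
    print(json.dumps(credentials_mounts(), indent=4))
```

The printed fragment can be merged into devcontainer.json next to the "build" and "remoteUser" properties.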

Concluding¶

🔌 Devcontainers connect your IDE to a running 🐳 Docker container.

→ reproducibility & isolation whilst getting a native experience.

🔄 Reproducible means:
  • ⚡️ Faster onboarding
  • 👨‍👩‍👧‍👦 Better alignment between team members
  • ⏱ Smaller gap to production

For now this works in VSCode only, but an open specification is taking shape.

🤝 PP · EO · BW · PU · JJ · 🫵

Thanks! 🙌🏻¶

github.com/godatadriven/python-devcontainer-template

Awesome resources¶

Associated blog post:

  • How to create a Devcontainer for your Python project 🐳

Spec:

  • containers.dev. The official Devcontainer specification.
    • Devcontainer templates
    • Devcontainer features

Docs:

  • Dev Containers VSCode extension. The extension required to connect VSCode to a Devcontainer.
  • Mounting file directories in Devcontainers.
  • Add a non-root user to a container. More explanations & instructions for adding a non-root user to your Dockerfile and devcontainer.json.
  • Pre-building dev container images

Repos:

  • github.com/devcontainers/ci. Run your CI in your Devcontainers. Built on the Devcontainer CLI.
  • github.com/devcontainers/images. A collection of ready-to-use Devcontainer images.
  • github.com/manekinekko/awesome-devcontainers. A repo pointing to yet even more awesome resources.

About¶

Author: Jeroen Overschie, working at GoDataDriven.