@lukel97 lukel97 commented Dec 2, 2025

This resolves the conversation at #117 (comment)

Currently we install the build dependencies like g++ (RUN apk add...) in a separate layer. We try to purge them in a later step, but this doesn't achieve anything: a Docker image is a stack of layers, and deleting files in a later layer doesn't remove them from the earlier one, so the layer that installed them still ships with the image.
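
To make the layering point concrete, here is a minimal sketch. It is not the LNT Dockerfile: the stage names are made up for illustration, the `echo` line is a placeholder for whatever build step needs the compiler, and the package list is borrowed from the snippets under review.

```dockerfile
# Variant A: install and purge in separate RUN steps.
# The first RUN writes the toolchain into its own layer. The second RUN only
# adds a layer that hides those files, so the image does not get smaller.
FROM python:3.10-alpine AS separate-layers
RUN apk add --no-cache --virtual .build-deps g++ postgresql-dev yaml-dev
RUN apk del .build-deps

# Variant B: install, use, and purge inside a single RUN step.
# The toolchain never ends up in a committed layer.
FROM python:3.10-alpine AS same-layer
RUN apk add --no-cache --virtual .build-deps g++ postgresql-dev yaml-dev \
 && echo "placeholder: build steps that need the compiler run here" \
 && apk del .build-deps
```

Building each stage with `docker build --target <stage>` and comparing the results in `docker image ls` should show the difference, though the exact sizes will depend on the package versions available at build time.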

This PR fixes it by using a multi-stage build which reduces the image size by ~90%:

$ docker image ls
REPOSITORY            TAG       IMAGE ID       CREATED          SIZE
lnt-new-dockerfile    latest    22cfaa9c28b0   14 minutes ago   88MB
lnt-old-dockerfile    <none>    76a17f3f65ed   5 seconds ago    797MB

This patch also copies over only the minimal source files required, so that the dependencies layer is invalidated as infrequently as possible. For example, here is the image being built after touching a regular Python source file. We no longer need to rebuild or redownload the dependencies because that layer is cached:

$ docker build -f docker/lnt.dockerfile .
[+] Building 22.0s (15/15) FINISHED                                                                      docker:default
 => [internal] load build definition from lnt.dockerfile                                                           0.0s
 => => transferring dockerfile: 1.73kB                                                                             0.0s
 => [internal] load metadata for docker.io/library/python:3.10-alpine                                              0.4s
 => [internal] load .dockerignore                                                                                  0.0s
 => => transferring context: 2B                                                                                    0.0s
 => [internal] load build context                                                                                  0.5s
 => => transferring context: 562.77kB                                                                              0.4s
 => CACHED [builder 1/7] FROM docker.io/library/python:3.10-alpine@sha256:b4da816c29d5d3067a979e299ea3e4856476a2b  0.0s
 => [final 2/4] RUN apk update && apk add --no-cache libpq                                                         2.8s
 => CACHED [builder 2/7] RUN apk update && apk add --no-cache g++ postgresql-dev yaml-dev git libpq                0.0s
 => CACHED [builder 3/7] COPY pyproject.toml .                                                                     0.0s
 => CACHED [builder 4/7] COPY lnt/testing/profile lnt/testing/profile                                              0.0s
 => CACHED [builder 5/7] RUN pip install --user ".[server]"                                                        0.0s
 => [builder 6/7] COPY . .                                                                                         2.1s
 => [builder 7/7] RUN pip install --user .                                                                        17.0s
 => [final 3/4] COPY --from=builder /root/.local /root/.local                                                      0.9s 
 => [final 4/4] COPY docker/docker-entrypoint.sh docker/docker-entrypoint-log.sh docker/lnt-wait-db /usr/local/bi  0.0s 
 => exporting to image                                                                                             0.6s 
 => => exporting layers                                                                                            0.6s 
 => => writing image sha256:527e0140763072f26d84ef42b0bb9dda9857625acb3474ac5f159960c8d20bc4                       0.0s

We need to use a mock version for setuptools_scm, since it looks for a .git folder and we want to avoid copying that: copying .git would invalidate the dependency layers on every commit.
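
For readers who want the overall shape without opening the diff, the following is a rough sketch of a multi-stage Dockerfile reconstructed from the steps visible in the build log above. It is not the file from this PR. The placement of the two `ENV SETUPTOOLS_SCM_PRETEND_VERSION` lines is an assumption based on the snippets under review, the `PATH` line is an assumption that does not appear in the log, and the final `COPY` of the entrypoint scripts is left out because its destination is truncated in the log.

```dockerfile
# Stage 1: has the compiler and headers. This stage is never shipped.
FROM python:3.10-alpine AS builder
RUN apk update && apk add --no-cache g++ postgresql-dev yaml-dev git libpq

# Copy only what the dependency install needs, so the layers below stay
# cached until pyproject.toml or the cperf sources change.
COPY pyproject.toml .
COPY lnt/testing/profile lnt/testing/profile
# Assumed placement: pretend version so setuptools_scm does not need .git yet.
ENV SETUPTOOLS_SCM_PRETEND_VERSION=0.1
RUN pip install --user ".[server]"

# Bring in the full tree. Only the steps from here on rerun after a source change.
# Assumed placement: clearing the variable mirrors the snippet under review.
ENV SETUPTOOLS_SCM_PRETEND_VERSION=
COPY . .
RUN pip install --user .

# Stage 2: runtime image with only the runtime library and the installed packages.
FROM python:3.10-alpine AS final
RUN apk update && apk add --no-cache libpq
COPY --from=builder /root/.local /root/.local
# Assumption (not in the build log): put user-installed scripts on PATH.
ENV PATH=/root/.local/bin:$PATH
```

The design point is that everything the compiler produced lives under /root/.local in the builder, so a single `COPY --from=builder` carries the installation across while g++, the headers, and the source tree (including .git) stay behind.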

&& apk add --no-cache libpq
&& apk add --no-cache --virtual .build-deps g++ postgresql-dev yaml-dev \
&& apk add --no-cache git libpq \
&& pip install ".[server]" \
Contributor Author

I've also switched from requirements.server.txt to the dependencies in pyproject.toml in this PR.

Member

Ack. I think that's alright, but in that case I would suggest we remove all references to requirements.server.txt, since it's basically not going to be used for anything meaningful anymore. Alternatively, we can keep using requirements.server.txt in this PR and then clean it up afterwards.

Comment on lines 31 to 38
COPY pyproject.toml .
COPY lnt/testing/profile lnt/testing/profile

# Fake a version for setuptools so we don't need to COPY .git
ENV SETUPTOOLS_SCM_PRETEND_VERSION=0.1

# Install dependencies and build cperf ext-modules.
RUN apk update \
Member

Suggested change
- COPY pyproject.toml .
- COPY lnt/testing/profile lnt/testing/profile
- # Fake a version for setuptools so we don't need to COPY .git
- ENV SETUPTOOLS_SCM_PRETEND_VERSION=0.1
- # Install dependencies and build cperf ext-modules.
- RUN apk update \
+ # Install dependencies and build cperf ext-modules.
+ # Note that we fake a version for setuptools so we don't need to COPY .git.
+ COPY pyproject.toml .
+ COPY lnt/testing/profile lnt/testing/profile
+ ENV SETUPTOOLS_SCM_PRETEND_VERSION=0.1
+ RUN apk update \

I find it easier to understand if everything is grouped under the same "install dependencies" comment, since all of these shenanigans are logically only required to install dependencies.

Comment on lines 44 to 45
# Let setuptools_scm use git to pick the version
ENV SETUPTOOLS_SCM_PRETEND_VERSION=
Member

I would group this with the above "dependency installation steps", since this is logically "the closing brace" for temporarily setting up SETUPTOOLS_SCM_PRETEND_VERSION=0.1.

Comment on lines 47 to 48
COPY . .
RUN pip install .
Member

I would still suggest using a bind mount. We don't actually need the LNT sources as part of the Docker image; we only need the result of the installation.

Contributor Author

I tried using a bind mount but couldn't get anywhere: whichever way I did it, cp -R-ing the sources ended up clobbering the cperf sources, which caused pip to rebuild the extension. That rebuild fails at this stage because we've already purged g++ etc.

So I've switched this to a multi-stage build. It's been a while since I've used one of these, but it allows us to copy over only the installation and leave the build stuff in a separate build image, so we don't need to worry about purging dependencies etc. This article explains it quite well: https://pythonspeed.com/articles/multi-stage-docker-python/

The main savings from this are that we get rid of the .git folder in the image, so the image is down to just 88MB now:

root@bb-luke-debian:~/llvm-lnt# docker image ls
REPOSITORY              TAG         IMAGE ID       CREATED          SIZE
llvm-lnt-webserver      latest      22cfaa9c28b0   14 minutes ago   88MB

@lukel97 lukel97 changed the title from "Install dependencies in separate layer in Dockerfile" to "Install and purge dependencies in same layer in Dockerfile" Dec 9, 2025
@lukel97 lukel97 changed the title from "Install and purge dependencies in same layer in Dockerfile" to "Use multi-stage build in Dockerfile" Dec 9, 2025