Why Your Dockerfile COPY Invalidates the Build Cache More Than You Think

Docker layer caching looks simple until you watch a build reinstall 200 npm packages because you saved a README. The mechanism is not complicated, but most developers have a slightly wrong mental model of it — and that model costs them minutes on every build.

How the Cache Actually Works

Each instruction in a Dockerfile produces a build step, and the ones that change the filesystem (RUN, COPY, ADD) produce a layer. Docker caches each step by computing a cache key. For most instructions, the key is the instruction text plus the cache key of the step before it. For COPY and ADD, it is more specific: Docker checksums every file being copied and includes those checksums in the cache key.

This means the cache is invalidated when file content changes, not when timestamps change. Touching a file without modifying it does not invalidate the cache; modifying it does, even if the diff is one character. And invalidation propagates forward: once a step misses the cache, every later step in the same stage is rebuilt, even if its own instruction did not change.

# These layers are cached until package.json or package-lock.json changes
COPY package.json package-lock.json ./
RUN npm ci

# This layer rebuilds on every source file change
COPY src/ ./src/
RUN npm run build

The ordering above is deliberate. npm ci is expensive. By copying only the manifest and the lock file before running it, you ensure the installation layer stays cached unless package.json or package-lock.json actually changes. The application source is copied afterwards, so source changes invalidate only the cheaper build step.
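For contrast, here is a sketch of the ordering that produces the README problem from the introduction. It assumes the same project layout as the example above:

```dockerfile
# Anti-pattern: the whole project is copied before the install step
FROM node:20-alpine
WORKDIR /app
# Any change to any file in the build context (a README, a source file)
# changes this layer's checksum...
COPY . .
# ...so this expensive step misses the cache and reinstalls every package
RUN npm ci
RUN npm run build
```

The only difference is where the broad COPY sits, but it moves every file in the project into the cache key of the npm ci step.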

.dockerignore Is the First Defense

Before Docker checksums your files, it transfers them from the build context to the daemon. The build context is everything in the directory passed to docker build. Without a .dockerignore, that includes node_modules, .git, local environment files, and anything else sitting in your project root.

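Two things go wrong here. First, transferring the context is slow if it includes large directories. The legacy builder sends the whole context up front; BuildKit skips files that no instruction uses, but a COPY . . still pulls everything in. Second, files that should not affect the build, such as .git/COMMIT_EDITMSG after a commit, will invalidate layers that COPY the project root, even though the source code itself did not change.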

# .dockerignore
node_modules
.git
.env
*.log
dist
coverage

A missing .dockerignore is the most common cause of unexpected cache misses that have nothing to do with your code changes.
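One way to check what actually reaches the builder is a throwaway Dockerfile that copies the whole context and lists it. The file name Dockerfile.context-check is hypothetical; build it with docker build -f Dockerfile.context-check --no-cache --progress=plain . so the listing is printed rather than served from cache:

```dockerfile
# Throwaway image: never pushed, only used to inspect the build context
FROM busybox
# The .dockerignore at the context root is applied to this COPY too
COPY . /ctx
# Print top-level entries (including dotfiles such as .git and .env) and the total size
RUN ls -la /ctx && du -sh /ctx
```

If node_modules or .git shows up in the listing, that directory is part of every checksum a COPY . . computes.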

Multi-Stage Builds for Dependency Isolation

Multi-stage builds let you separate the dependency installation environment from the runtime environment. The caching benefit is that each stage caches independently:

FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

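FROM node:20-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
# npm run build needs package.json to find the "build" script
COPY package.json ./
COPY src/ ./src/
RUN npm run build

FROM node:20-alpine AS runtime
WORKDIR /app
# Assumes the build bundles its dependencies into dist; if it does not,
# the runtime stage also needs the production node_modules
COPY --from=build /app/dist ./dist
CMD ["node", "dist/index.js"]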

The deps stage caches separately from build. If your source changes but your dependencies do not, Docker reuses the deps stage entirely and only rebuilds from build onward. The final image is also smaller because it contains only the compiled output, not the source or dev dependencies.

COPY --link for Parallel Layer Extraction

BuildKit (the default builder since Docker Engine 23.0) supports a COPY flag, added in Dockerfile syntax 1.4, that changes the semantics of COPY in a useful way:

COPY --link package.json package-lock.json ./

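Without --link, a COPY layer is created on top of the filesystem produced by the previous instruction, so its cache entry is tied to every layer beneath it. With --link, the copied files go into an independent layer that does not depend on the previous filesystem state, so BuildKit can create it without waiting for earlier steps and merge it on top afterwards. The practical benefit: --link layers can be reused across different base image versions. If you update the base image but the COPY content has not changed, Docker can reuse the cached layer rather than rebuilding it. Instructions after it still rerun, though: a RUN npm ci that follows the copy sees a new base image underneath and misses the cache as usual.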
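A natural place for --link is the final stage of a multi-stage build, where the copied artifact does not depend on anything installed in that stage. A sketch, reusing the build stage name and the dist/index.js entry point from the multi-stage example:

```dockerfile
FROM node:20-alpine AS runtime
WORKDIR /app
# With --link, this layer does not depend on the layers beneath it in this
# stage, so a new base image or an edited earlier instruction here does not
# by itself force the copy to be redone
COPY --link --from=build /app/dist ./dist
CMD ["node", "dist/index.js"]
```

The copy only reads from the build stage, so there is nothing in the runtime stage's earlier state it needs to see, which is the case --link is designed for.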

RUN --mount=type=cache for Package Managers

A different problem: package manager caches (npm's cache, pip's wheel cache, apt's package cache) do not survive a rebuild. When a RUN layer misses the cache, it starts again from the previous layer's filesystem, so whatever the package manager downloaded into its cache during the last build is gone. Every rebuild of that layer re-downloads packages that were downloaded yesterday.

RUN --mount=type=cache,target=/root/.npm \
    npm ci

The mount keeps the package manager's cache in BuildKit's build cache between builds, without including it in the image. (/root/.npm is npm's default cache directory when running as root, the default user in the official node images.) The layer itself still caches normally (a cache miss reruns the RUN instruction), but npm finds its earlier downloads already present and only fetches what is new. So when the layer cache does miss, for example after a lock file change, this can cut installation time significantly.
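The same pattern applies to the other package managers mentioned above. A minimal sketch for pip, assuming a requirements.txt at the root of the build context and the default root user, whose pip cache lives in /root/.cache/pip:

```dockerfile
FROM python:3.12-slim
WORKDIR /app
# Copy only the dependency list, so the install layer follows requirements.txt
COPY requirements.txt ./
# pip's download cache persists in BuildKit's cache mount between builds,
# but is never written into the image layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
```

As with npm, the layer cache decides whether pip runs at all; the mount only decides how much it has to download when it does.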

What This Changes in Practice

The correct mental model for Dockerfile ordering: sort instructions from most stable to least stable. Base image first. System dependencies second (they rarely change). Package manifest files third. Application source last. Any deviation from this ordering makes the expensive steps depend on frequently changing content.
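Put together, that ordering looks roughly like this. The system packages (a typical set for compiling native modules) and the dist/index.js entry point are illustrative assumptions:

```dockerfile
# 1. Base image: changes only when you bump the tag
FROM node:20-alpine
WORKDIR /app

# 2. System dependencies: change rarely (example packages for native modules)
RUN apk add --no-cache python3 make g++

# 3. Package manifests, then the expensive install
COPY package.json package-lock.json ./
RUN npm ci

# 4. Application source: changes on almost every commit
COPY src/ ./src/
RUN npm run build

CMD ["node", "dist/index.js"]
```

A source edit now invalidates only the last COPY and the build step; everything above it comes from cache.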

The specific things that invalidate your cache unexpectedly are usually: files missing from .dockerignore, COPY instructions that grab more than they need, and dependency installation steps placed after source copies. Each of those is one line to fix. The rebuild minutes they save compound across every developer and every CI run.
