feat(registry): add support for custom CA certificates and TLS validation

- Introduced `--registry-ca` and `--registry-ca-validate` flags for configuring TLS verification with private registries.
- Implemented in-memory token caching with expiration handling.
- Updated documentation to reflect new CLI options and usage examples.
- Added tests for token cache concurrency and expiry behavior.

pull/2128/head
parent 76f9cea516
commit e1f67fc3d0
@ -0,0 +1,37 @@
name: Race Detector

on:
  workflow_dispatch: {}
  pull_request:
    branches:
      - main

jobs:
  race:
    name: Run tests with race detector
    runs-on: ubuntu-latest
    strategy:
      matrix:
        go-version: [1.20.x]
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Go
        uses: actions/setup-go@v4
        with:
          go-version: ${{ matrix.go-version }}

      - name: Install build tools (for cgo / race detector)
        run: |
          sudo apt-get update
          sudo apt-get install -y build-essential

      - name: Ensure CGO enabled
        run: echo "CGO_ENABLED=1" >> $GITHUB_ENV

      - name: Run tests with race detector
        run: |
          go test -race ./... -v
@ -0,0 +1,7 @@
# Changelog

All notable changes to this project will be documented in this file.

## [Unreleased]

- Add `--registry-ca-validate` flag: when supplied with `--registry-ca`, Watchtower can validate the provided CA bundle on startup and fail fast on misconfiguration. Prefer using this over `--insecure-registry` in production.
@ -0,0 +1,29 @@
# Summary Checkpoint

This file marks a checkpoint for summarizing repository changes.

All future requests that ask to "summarise all the changes thus far" should consider
only changes made after this checkpoint was created.

Checkpoint timestamp (UTC): 2025-11-13T12:00:00Z

Notes:
- Purpose: act as a stable anchor so that subsequent "summarise all the changes thus far"
  requests will include only modifications after this point.
- Location: `docs/SUMMARY_CHECKPOINT.md`

Recent delta (since previous checkpoint):

- Added CLI flags and wiring: `--registry-ca` and `--registry-ca-validate` (startup validation).
- Implemented secure-by-default registry transport behavior and support for a custom CA bundle.
- Introduced an in-memory bearer token cache (honors `expires_in`) and refactored time usage
  to allow deterministic tests via an injectable `now` function.
- Added deterministic unit tests for the token cache (`pkg/registry/auth/auth_cache_test.go`).
- Added quickstart documentation snippets to `README.md`, `docs/index.md`, and
  `docs/private-registries.md` showing `--registry-ca` + `--registry-ca-validate`.
- Created `CHANGELOG.md` with an Unreleased entry for the new `--registry-ca-validate` flag.
- Ran package tests locally: `pkg/registry/auth` and `pkg/registry/digest`; tests passed
  (some integration tests were skipped due to missing credentials).

If you want the next checkpoint after more changes (e.g., mapping the update call chain,
documenting data shapes, or adding concurrency tests), request another summary break.
@ -0,0 +1,46 @@
@startuml
title Watchtower Update Flow
actor User as CLI
participant "cmd (root)" as CMD
participant "internal/actions.Update" as ACT
participant "container.Client" as CLIENT
participant "pkg/registry/digest" as DIG
participant "pkg/registry/auth" as AUTH
participant "pkg/registry" as REG
database "Docker Engine" as DOCKER

CLI -> CMD: trigger runUpdatesWithNotifications()
CMD -> ACT: Update(client, UpdateParams)
ACT -> CLIENT: ListContainers(filter)
loop per container
  ACT -> CLIENT: IsContainerStale(container, params)
  CLIENT -> CLIENT: PullImage (maybe)
  CLIENT -> DIG: CompareDigest(container, registryAuth)
  DIG -> AUTH: GetToken(challenge)
  AUTH -> AUTH: getCachedToken / storeToken
  DIG -> REG: newTransport() (uses --insecure-registry / --registry-ca)
  DIG -> DOCKER: HEAD manifest with token
  alt digest matches
    CLIENT --> ACT: no pull needed
  else
    CLIENT -> DOCKER: ImagePull(image)
  end
  CLIENT --> ACT: HasNewImage -> stale/newestImage
end
ACT -> ACT: SortByDependencies
ACT -> CLIENT: StopContainer / StartContainer (with lifecycle hooks)
ACT -> CLIENT: RemoveImageByID (cleanup)
ACT --> CMD: progress.Report()

note right of AUTH
  Tokens are cached by auth URL (realm+service+scope)
  ExpiresIn (seconds) sets TTL when provided
end note

note left of REG
  TLS is secure-by-default
  `--registry-ca` provides PEM bundle
  `--registry-ca-validate` fails startup on invalid bundle
end note

@enduml
@ -0,0 +1,166 @@
<!--
DO NOT EDIT: Generated documentation describing the Watchtower update flow.
This file contains the end-to-end flow, data shapes, and a mermaid diagram.
-->
# Watchtower Update Flow

This document explains the end-to-end update flow in the Watchtower codebase, including the main function call chain, the key data shapes, and diagrams (Mermaid & PlantUML).

## Quick Summary

- Trigger: CLI (`watchtower` start / scheduler / HTTP API update) constructs `types.UpdateParams` and calls `internal/actions.Update`.
- `internal/actions.Update` orchestrates discovery, stale detection, lifecycle hooks, stopping/restarting containers, cleanup and reporting.
- Image pull optimization uses a digest HEAD request (`pkg/registry/digest`) and a token flow (`pkg/registry/auth`) with an in-memory token cache.
- TLS for HEAD/token requests is secure-by-default and configurable via `--insecure-registry`, `--registry-ca`, and `--registry-ca-validate`.
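
The digest HEAD optimization mentioned in the summary can be sketched as follows. This is a minimal illustration, not the code in `pkg/registry/digest`: `headDigest`, `fakeRegistryClient`, and the example URL are hypothetical names, and the sketch assumes the registry reports the manifest digest in the `Docker-Content-Digest` response header, as registries implementing the Docker Registry HTTP API V2 do. The fake client replaces the network so the example runs on its own.

```go
package main

import (
	"fmt"
	"net/http"
)

// roundTripFunc lets a plain function act as an http.RoundTripper, so the
// example runs without a real registry or any network access.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// fakeRegistryClient (hypothetical helper) answers every request with the
// given digest, but only when the expected bearer token is presented.
func fakeRegistryClient(token, digest string) *http.Client {
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		resp := &http.Response{
			StatusCode: http.StatusUnauthorized,
			Header:     http.Header{},
			Body:       http.NoBody,
			Request:    r,
		}
		if r.Header.Get("Authorization") == "Bearer "+token {
			resp.StatusCode = http.StatusOK
			resp.Header.Set("Docker-Content-Digest", digest)
		}
		return resp, nil
	})}
}

// headDigest issues a HEAD request for a manifest and returns the digest the
// registry reports, without downloading the manifest body.
func headDigest(client *http.Client, manifestURL, token string) (string, error) {
	req, err := http.NewRequest(http.MethodHead, manifestURL, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Accept", "application/vnd.docker.distribution.manifest.v2+json")

	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("registry responded to HEAD with status %d", resp.StatusCode)
	}
	digest := resp.Header.Get("Docker-Content-Digest")
	if digest == "" {
		return "", fmt.Errorf("registry response carried no digest header")
	}
	return digest, nil
}

func main() {
	client := fakeRegistryClient("tok", "sha256:abc123")
	remote, err := headDigest(client, "https://registry.example.com/v2/app/manifests/latest", "tok")
	if err != nil {
		fmt.Println("HEAD failed, fall back to a full pull:", err)
		return
	}
	local := "sha256:abc123" // digest of the image already present locally
	fmt.Println("pull needed:", remote != local)
}
```

Treating any HEAD failure as "fall back to a full pull" keeps the optimization strictly optional: a registry that rejects or rate-limits HEAD requests costs an extra pull, not a missed update.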

---

## Call Chain (step-by-step)

1. CLI start / scheduler / HTTP API
   - Entry points: `main()` -> `cmd.Execute()` -> Cobra command `Run` / `PreRun`.
   - `cmd.PreRun` reads flags and config, sets `registry.InsecureSkipVerify` and `registry.RegistryCABundle`.

2. Run update
   - `cmd.runUpdatesWithNotifications` builds `types.UpdateParams` and calls `internal/actions.Update(client, updateParams)`.

3. Orchestration: `internal/actions.Update`
   - If `params.LifecycleHooks` -> `lifecycle.ExecutePreChecks(client, params)`
   - Discover containers: `client.ListContainers(params.Filter)`
   - For each container:
     - `client.IsContainerStale(container, params)`
       - calls `client.PullImage(ctx, container)` unless `container.IsNoPull(params)` is true
         - `PullImage` obtains `types.ImagePullOptions` via `pkg/registry.GetPullOptions(image)`
         - tries digest optimization: `pkg/registry/digest.CompareDigest(container, opts.RegistryAuth)`
           - `auth.GetToken(container, registryAuth)` obtains a token:
             - sends GET to the challenge URL (`/v2/`), inspects `WWW-Authenticate`
             - for `Bearer`: constructs auth URL with `realm`, `service`, and `scope` (`repository:<path>:pull`)
             - checks in-memory cache (`auth.getCachedToken(cacheKey)`) keyed by the auth URL
             - if missing, requests token from auth URL (Basic header if Docker cred present), parses `types.TokenResponse` and calls `auth.storeToken(cacheKey, token, ExpiresIn)`
           - `digest.GetDigest(manifestURL, token)` performs an HTTP `HEAD` using a transport created by `digest.newTransport()`
             - transport respects `registry.InsecureSkipVerify` and uses `registry.GetRegistryCertPool()` when a CA bundle is provided
         - If remote digest matches a local digest, `PullImage` skips the pull
       - `client.HasNewImage(ctx, container)` compares local image ID with remote image ID
     - `targetContainer.VerifyConfiguration()` (fail/skip logic)
     - Mark scanned/skipped in `session.Progress` and set `container.SetStale(stale)`
   - Sort containers: `sorter.SortByDependencies(containers)`
   - `UpdateImplicitRestart(containers)` sets `LinkedToRestarting` flags
   - Build `containersToUpdate` and mark them for update in `Progress`
   - Update strategy:
     - Rolling restart: `performRollingRestart(containersToUpdate, client, params)`
       - `stopStaleContainer(c)` -> `restartStaleContainer(c)` per container
     - Normal: `stopContainersInReversedOrder(...)` -> `restartContainersInSortedOrder(...)`
   - `stopStaleContainer` runs `lifecycle.ExecutePreUpdateCommand` and `client.StopContainer`
   - `restartStaleContainer` may `client.RenameContainer` (watchtower self), `client.StartContainer` and `lifecycle.ExecutePostUpdateCommand`
   - If `params.Cleanup` -> `cleanupImages(client, imageIDs)` calls `client.RemoveImageByID`
   - If `params.LifecycleHooks` -> `lifecycle.ExecutePostChecks(client, params)`
   - Return `progress.Report()` (a `types.Report` implemented from `session.Progress`)
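
The Bearer step of the token flow above (inspect `WWW-Authenticate`, then build an auth URL from `realm`, `service`, and `scope`) could look roughly like this. It is a sketch under stated assumptions, not the code in `pkg/registry/auth`: `parseBearerChallenge` and `buildAuthURL` are hypothetical names, the example hosts are placeholders, and the parser splits on commas, so it does not handle quoted values that themselves contain commas.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseBearerChallenge extracts key="value" pairs from a header such as
//
//	Bearer realm="https://auth.example.com/token",service="registry.example.com"
//
// It splits on commas, so quoted values that contain commas are not handled.
func parseBearerChallenge(header string) (map[string]string, error) {
	const prefix = "bearer "
	if len(header) < len(prefix) || !strings.EqualFold(header[:len(prefix)], prefix) {
		return nil, fmt.Errorf("unsupported challenge: %q", header)
	}
	params := map[string]string{}
	for _, part := range strings.Split(header[len(prefix):], ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok {
			continue // skip fragments that are not key=value pairs
		}
		params[strings.ToLower(key)] = strings.Trim(value, `"`)
	}
	if params["realm"] == "" {
		return nil, fmt.Errorf("challenge has no realm: %q", header)
	}
	return params, nil
}

// buildAuthURL combines realm, service, and a pull scope for repoPath. In this
// sketch the resulting string would also serve as the token cache key.
func buildAuthURL(params map[string]string, repoPath string) (string, error) {
	u, err := url.Parse(params["realm"])
	if err != nil {
		return "", err
	}
	q := u.Query()
	if service := params["service"]; service != "" {
		q.Set("service", service)
	}
	q.Set("scope", "repository:"+repoPath+":pull")
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	header := `Bearer realm="https://auth.example.com/token",service="registry.example.com"`
	params, err := parseBearerChallenge(header)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	authURL, err := buildAuthURL(params, "library/alpine")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(authURL)
}
```

Because `url.Values.Encode` sorts parameters by key, the same challenge always produces the same URL string, which is what makes the URL usable as a stable cache key.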

---

## Key data shapes

- `types.UpdateParams` (created in `cmd/runUpdatesWithNotifications`)
  - `Filter` (types.Filter)
  - `Cleanup bool`
  - `NoRestart bool`
  - `Timeout time.Duration`
  - `MonitorOnly bool`
  - `NoPull bool`
  - `LifecycleHooks bool`
  - `RollingRestart bool`
  - `LabelPrecedence bool`

- `container.Client` interface (in `pkg/container/client.go`), used by `actions.Update`
  - `ListContainers(Filter) ([]types.Container, error)`
  - `GetContainer(containerID) (types.Container, error)`
  - `StopContainer(types.Container, time.Duration) error`
  - `StartContainer(types.Container) (types.ContainerID, error)`
  - `RenameContainer(types.Container, string) error`
  - `IsContainerStale(types.Container, types.UpdateParams) (bool, types.ImageID, error)`
  - `ExecuteCommand(containerID types.ContainerID, command string, timeout int) (SkipUpdate bool, err error)`
  - `RemoveImageByID(types.ImageID) error`
  - `WarnOnHeadPullFailed(types.Container) bool`

- `types.Container` interface (in `pkg/types/container.go`); methods used include:
  - `ID(), Name(), ImageName(), ImageID(), SafeImageID(), IsRunning(), IsRestarting()`
  - `VerifyConfiguration() error`, `HasImageInfo() bool`, `ImageInfo() *types.ImageInspect`
  - lifecycle hooks: `GetLifecyclePreUpdateCommand(), GetLifecyclePostUpdateCommand(), PreUpdateTimeout(), PostUpdateTimeout()`
  - flags: `IsNoPull(UpdateParams), IsMonitorOnly(UpdateParams), ToRestart(), IsWatchtower()`

- `session.Progress` and `session.ContainerStatus` (reporting)
  - `Progress` is a map `map[types.ContainerID]*ContainerStatus`
  - `ContainerStatus` fields: `containerID, containerName, imageName, oldImage, newImage, error, state`
  - `Progress.Report()` returns a `types.Report` implementation

- `types.TokenResponse` (used by `pkg/registry/auth`) contains `Token string` and `ExpiresIn int` (seconds)
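
To show how an `ExpiresIn` value in seconds can drive cache expiry with an injectable clock, here is a minimal sketch. It is not the implementation in `pkg/registry/auth`: `tokenStore`, `cachedEntry`, and `fakeClock` are hypothetical names, and the rule that a TTL of zero or less means "no expiry" is taken from the comment in `TestTokenCacheStoreAndGetHitAndMiss`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type cachedEntry struct {
	token     string
	expiresAt time.Time // zero value means "never expires"
}

// tokenStore is a mutex-guarded in-memory cache with an injectable clock.
type tokenStore struct {
	mu      sync.Mutex
	now     func() time.Time
	entries map[string]cachedEntry
}

func newTokenStore(now func() time.Time) *tokenStore {
	return &tokenStore{now: now, entries: map[string]cachedEntry{}}
}

// store saves a token; ttlSeconds <= 0 is treated as "no expiry".
func (s *tokenStore) store(key, token string, ttlSeconds int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	e := cachedEntry{token: token}
	if ttlSeconds > 0 {
		e.expiresAt = s.now().Add(time.Duration(ttlSeconds) * time.Second)
	}
	s.entries[key] = e
}

// get returns the cached token, or "" on a miss. Expired entries are evicted.
func (s *tokenStore) get(key string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	e, ok := s.entries[key]
	if !ok {
		return ""
	}
	if !e.expiresAt.IsZero() && !s.now().Before(e.expiresAt) {
		delete(s.entries, key)
		return ""
	}
	return e.token
}

// fakeClock is a manually advanced clock for deterministic examples and tests.
type fakeClock struct {
	mu sync.Mutex
	t  time.Time
}

func newFakeClock() *fakeClock {
	return &fakeClock{t: time.Date(2025, time.November, 13, 12, 0, 0, 0, time.UTC)}
}

func (c *fakeClock) Now() time.Time {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.t
}

func (c *fakeClock) Advance(seconds int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.t = c.t.Add(time.Duration(seconds) * time.Second)
}

func main() {
	clock := newFakeClock()
	cache := newTokenStore(clock.Now)

	cache.store("auth-url", "tok-123", 1) // 1-second TTL, as in an ExpiresIn of 1
	fmt.Printf("fresh: %q\n", cache.get("auth-url"))

	clock.Advance(2) // move past the TTL without sleeping
	fmt.Printf("after expiry: %q\n", cache.get("auth-url"))
}
```

Passing the clock in as a function keeps expiry tests fast and deterministic: time moves only when the test advances it.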

---

## Diagrams

Mermaid sequence diagram (embedded):

```mermaid
sequenceDiagram
  participant CLI as CLI / Scheduler / HTTP API
  participant CMD as cmd
  participant ACT as internal/actions.Update
  participant CLIENT as container.Client (docker wrapper)
  participant DIG as pkg/registry/digest
  participant AUTH as pkg/registry/auth
  participant REG as pkg/registry (TLS config)
  participant DOCKER as Docker Engine

  CLI->>CMD: trigger runUpdatesWithNotifications()
  CMD->>ACT: Update(client, UpdateParams)
  ACT->>CLIENT: ListContainers(filter)
  loop per container
    ACT->>CLIENT: IsContainerStale(container, params)
    CLIENT->>CLIENT: PullImage (maybe)
    CLIENT->>DIG: CompareDigest(container, registryAuth)
    DIG->>AUTH: GetToken(challenge)
    AUTH->>AUTH: getCachedToken / storeToken
    DIG->>REG: newTransport() (uses --insecure-registry / --registry-ca)
    DIG->>DOCKER: HEAD manifest with token
    alt digest matches
      CLIENT-->>ACT: no pull needed
    else
      CLIENT->>DOCKER: ImagePull(image)
    end
    CLIENT-->>ACT: HasNewImage -> stale/ newestImage
  end
  ACT->>ACT: SortByDependencies
  ACT->>CLIENT: StopContainer / StartContainer (with lifecycle hooks)
  ACT->>CLIENT: RemoveImageByID (cleanup)
  ACT-->>CMD: progress.Report()
```

For reference, a PlantUML source for the same sequence is available in `docs/diagrams/update-flow.puml`.

---

## Security & operational notes

- TLS: registry HEAD and token requests are secure-by-default. Use `--registry-ca` to add private CAs, and `--registry-ca-validate` to fail fast on bad bundles. Avoid `--insecure-registry` except for testing.
- Token cache: tokens are cached per auth URL (realm+service+scope). Tokens with `ExpiresIn` are cached for that TTL. No persistent or distributed cache is provided.
- Digest HEAD optimization avoids pulls and unnecessary rate consumption when possible. DockerHub/GHCR may rate-limit HEAD or behave differently; the code includes a `WarnOnAPIConsumption` heuristic.
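
The TLS behavior in the first note can be illustrated with a short sketch. It is an assumption-laden illustration, not the code in `pkg/registry`: `loadCAPool` and `newTransport` here are hypothetical stand-ins. What it mirrors from the diff is that the transport's `TLSClientConfig` stays nil by default and is set when the insecure flag is on, which is what the digest transport tests assert.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
)

// loadCAPool parses a PEM bundle and adds it to the system roots. It returns
// an error when no certificate could be parsed, which is the condition a
// fail-fast startup check would act on.
func loadCAPool(pemBundle []byte) (*x509.CertPool, error) {
	pool, err := x509.SystemCertPool()
	if err != nil || pool == nil {
		pool = x509.NewCertPool() // fall back to an empty pool
	}
	if !pool.AppendCertsFromPEM(pemBundle) {
		return nil, errors.New("CA bundle contains no valid PEM certificates")
	}
	return pool, nil
}

// newTransport leaves TLSClientConfig nil (Go's verifying defaults) unless the
// caller opts into a custom pool or into skipping verification.
func newTransport(insecureSkipVerify bool, pool *x509.CertPool) *http.Transport {
	tr := &http.Transport{}
	if insecureSkipVerify || pool != nil {
		tr.TLSClientConfig = &tls.Config{
			InsecureSkipVerify: insecureSkipVerify, // testing only
			RootCAs:            pool,
		}
	}
	return tr
}

func main() {
	// A bundle without any certificate should be rejected at startup.
	if _, err := loadCAPool([]byte("not a certificate")); err != nil {
		fmt.Println("startup validation failed:", err)
	}
	fmt.Println("default TLS config is nil:", newTransport(false, nil).TLSClientConfig == nil)
}
```

Validating the bundle once at startup surfaces a bad path or malformed PEM immediately, instead of as a TLS error on the first registry request.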

---

## Where to look in the code

- Orchestration: `internal/actions/update.go`
- CLI wiring: `cmd/root.go`, `internal/flags/flags.go`
- Container wrapper: `pkg/container/client.go`, `pkg/container/container.go`
- Digest & transport: `pkg/registry/digest/digest.go`
- Token & auth handling: `pkg/registry/auth/auth.go`
- TLS helpers: `pkg/registry/registry.go`
- Lifecycle hooks: `pkg/lifecycle/lifecycle.go`
- Session/reporting: `pkg/session/*`, `pkg/types/report.go`

---

If you'd like, I can also open a branch and create a PR with these files, or convert the PlantUML into an SVG and add it to the docs site.

End of document.
@ -0,0 +1,101 @@
package auth

import (
	"sync"
	"testing"
	"time"
)

// Test concurrent stores and gets to ensure the mutex protects the cache
func TestTokenCacheConcurrentStoreAndGet(t *testing.T) {
	// reset cache safely
	tokenCacheMu.Lock()
	tokenCache = map[string]cachedToken{}
	tokenCacheMu.Unlock()

	origNow := now
	defer func() { now = origNow }()
	now = time.Now

	key := "concurrent-key"
	token := "tok-concurrent"

	var wg sync.WaitGroup
	storers := 50
	getters := 50
	iters := 100

	for i := 0; i < storers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < iters; j++ {
				storeToken(key, token, 0)
			}
		}()
	}

	for i := 0; i < getters; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < iters; j++ {
				_ = getCachedToken(key)
			}
		}()
	}

	wg.Wait()

	if got := getCachedToken(key); got != token {
		t.Fatalf("expected token %q, got %q", token, got)
	}
}

// Test concurrent access while token expires: readers run while time is advanced
func TestTokenCacheConcurrentExpiry(t *testing.T) {
	// reset cache safely
	tokenCacheMu.Lock()
	tokenCache = map[string]cachedToken{}
	tokenCacheMu.Unlock()

	// Make now controllable and thread-safe
	origNow := now
	defer func() { now = origNow }()

	base := time.Now()
	var mu sync.Mutex
	current := base
	now = func() time.Time {
		mu.Lock()
		defer mu.Unlock()
		return current
	}

	key := "concurrent-expire"
	storeToken(key, "t", 1)

	var wg sync.WaitGroup
	readers := 100

	for i := 0; i < readers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				_ = getCachedToken(key)
			}
		}()
	}

	// advance time beyond ttl
	mu.Lock()
	current = current.Add(2 * time.Second)
	mu.Unlock()

	wg.Wait()

	if got := getCachedToken(key); got != "" {
		t.Fatalf("expected token to be expired, got %q", got)
	}
}
@ -0,0 +1,54 @@
package auth

import (
	"testing"
	"time"
)

func TestTokenCacheStoreAndGetHitAndMiss(t *testing.T) {
	// save and restore original now
	origNow := now
	defer func() { now = origNow }()

	// deterministic fake time
	base := time.Date(2025, time.November, 13, 12, 0, 0, 0, time.UTC)
	now = func() time.Time { return base }

	key := "https://auth.example.com/?service=example&scope=repository:repo:pull"
	// ensure empty at start
	if got := getCachedToken(key); got != "" {
		t.Fatalf("expected empty cache initially, got %q", got)
	}

	// store with no expiry (ttl <= 0)
	storeToken(key, "tok-123", 0)
	if got := getCachedToken(key); got != "tok-123" {
		t.Fatalf("expected token tok-123, got %q", got)
	}
}

func TestTokenCacheExpiry(t *testing.T) {
	// save and restore original now
	origNow := now
	defer func() { now = origNow }()

	// deterministic fake time that can be moved forward
	base := time.Date(2025, time.November, 13, 12, 0, 0, 0, time.UTC)
	current := base
	now = func() time.Time { return current }

	key := "https://auth.example.com/?service=example&scope=repository:repo2:pull"
	// store with short ttl (1 second)
	storeToken(key, "short-tok", 1)

	if got := getCachedToken(key); got != "short-tok" {
		t.Fatalf("expected token short-tok immediately after store, got %q", got)
	}

	// advance time beyond ttl
	current = current.Add(2 * time.Second)

	if got := getCachedToken(key); got != "" {
		t.Fatalf("expected token to be expired and removed, got %q", got)
	}
}
@ -0,0 +1,27 @@
package digest_test

import (
	"github.com/containrrr/watchtower/pkg/registry"
	"github.com/containrrr/watchtower/pkg/registry/digest"
	. "github.com/onsi/ginkgo"
	. "github.com/onsi/gomega"
)

var _ = Describe("Digest transport configuration", func() {
	AfterEach(func() {
		// Reset to default after each test
		registry.InsecureSkipVerify = false
	})

	It("should have nil TLSClientConfig by default", func() {
		registry.InsecureSkipVerify = false
		tr := digest.NewTransportForTest()
		Expect(tr.TLSClientConfig).To(BeNil())
	})

	It("should set TLSClientConfig when insecure flag is true", func() {
		registry.InsecureSkipVerify = true
		tr := digest.NewTransportForTest()
		Expect(tr.TLSClientConfig).ToNot(BeNil())
	})
})