ansible-test - Improve container management. (#78550)

See changelogs/fragments/ansible-test-container-management.yml for details.
Matt Clay committed by GitHub
parent 3bda4eae6f
commit cda16cc5e9

@ -84,7 +84,7 @@ stages:
- stage: Remote
dependsOn: []
jobs:
- template: templates/matrix.yml
- template: templates/matrix.yml # context/target
parameters:
targets:
- name: macOS 12.0
@ -104,7 +104,7 @@ stages:
groups:
- 1
- 2
- template: templates/matrix.yml
- template: templates/matrix.yml # context/controller
parameters:
targets:
- name: macOS 12.0
@ -119,6 +119,23 @@ stages:
- 3
- 4
- 5
- template: templates/matrix.yml # context/controller (ansible-test container management)
parameters:
targets:
- name: Alpine 3.16
test: alpine/3.16
- name: Fedora 36
test: fedora/36
- name: RHEL 8.6
test: rhel/8.6
- name: RHEL 9.0
test: rhel/9.0
- name: Ubuntu 20.04
test: ubuntu/20.04
- name: Ubuntu 22.04
test: ubuntu/22.04
groups:
- 6
- stage: Docker
dependsOn: []
jobs:

@ -0,0 +1,59 @@
major_changes:
- ansible-test - Docker and Podman are now supported on hosts with cgroup v2 unified.
Previously only cgroup v1 and cgroup v2 hybrid were supported.
- ansible-test - Docker Desktop on WSL2 is now supported (additional configuration required).
- ansible-test - Podman on WSL2 is now supported.
- ansible-test - Podman now works on container hosts without systemd.
  Previously only some containers worked. Others worked only with rootful Podman or only with rootless Podman,
  but not with both. Some containers did not work at all.
- ansible-test - When additional cgroup setup is required on the container host, this will be automatically detected.
Instructions on how to configure the host will be provided in the error message shown.
minor_changes:
- ansible-test - When using Podman, ansible-test will detect if the loginuid used in containers is incorrect.
  When this occurs, a warning is displayed and the container is run with the AUDIT_CONTROL capability.
  Previously containers would fail in this situation, with no useful warnings or errors given.
- ansible-test - Failure to connect to a container over SSH now results in a clear error.
Previously tests would be attempted even after initial connection attempts failed.
- ansible-test - Warnings are now shown when using containers that were built with VOLUME instructions.
- ansible-test - Unit tests now support network disconnect by default when running under Podman.
Previously this feature only worked by default under Docker.
- ansible-test - Additional log details are shown when containers fail to start or SSH connections to containers fail.
- ansible-test - Containers included with ansible-test no longer disable seccomp by default.
- ansible-test - A new ``cgroup`` option is available when running custom containers.
This option can be used to indicate a container requires cgroup v1 or that it does not use cgroup.
The default behavior assumes the container works with cgroup v2 (as well as v1).
- ansible-test - A new ``audit`` option is available when running custom containers.
This option can be used to indicate whether a container requires the AUDIT_WRITE capability.
The default is ``required``, which most containers will need when using Podman.
If necessary, the ``none`` option can be used to opt-out of the capability.
This has no effect on Docker, which always provides the capability.
- ansible-test - More details are provided about an instance when provisioning fails.
- ansible-test - Connection failures to remote provisioned hosts now show failure details as a warning.
- ansible-test - When setting the max open files for containers, the container host's limit will be checked.
If the host limit is lower than the preferred value, it will be used and a warning will be shown.
- ansible-test - Use ``stop --time 0`` followed by ``rm`` to remove ephemeral containers instead of ``rm -f``.
This speeds up teardown of ephemeral containers.
- ansible-test - Reduce the polling limit for SSHD startup in containers from 60 retries to 10.
The one second delay between retries remains in place.
- ansible-test - Integration tests can be excluded from retries triggered by the ``--retry-on-error`` option by
adding the ``retry/never`` alias. This is useful for tests that cannot pass on a retry or are too
slow to make retries useful.
bugfixes:
- ansible-test - Multiple containers now work under Podman without specifying the ``--docker-network`` option.
- ansible-test - Prevent concurrent / repeat pulls of the same container image.
- ansible-test - Prevent concurrent / repeat inspections of the same container image.
- ansible-test - Prevent concurrent execution of cached methods.
- ansible-test - Handle server errors when executing the ``docker info`` command.
- ansible-test - Show the exception type when reporting errors during instance provisioning.
- ansible-test - Pass the ``XDG_RUNTIME_DIR`` environment variable through to container commands.
- ansible-test - Connection attempts to managed remote instances no longer abort on ``Permission denied`` errors.
known_issues:
- ansible-test - Using Docker on systems with SELinux may require setting SELinux to permissive mode.
Podman should work with SELinux in enforcing mode.
- ansible-test - Additional configuration may be required for certain container host and container combinations.
Further details are available in the testing documentation.
- ansible-test - Systems with Podman networking issues may be unable to run containers, when previously the issue
went unreported. Correct the networking issues to continue using ``ansible-test`` with Podman.
- ansible-test - Custom containers with ``VOLUME`` instructions may be unable to start, when previously the containers
started correctly. Remove the ``VOLUME`` instructions to resolve the issue. Containers with this
condition will cause ``ansible-test`` to emit a warning.

@ -16,7 +16,7 @@ Prepare your environment
These steps assume a Linux work environment with ``git`` installed.
1. Install and start ``docker`` or ``podman`` with the ``docker`` executable shim. This insures tests run properly isolated and in the exact environments as in CI. The latest ``ansible-core`` development version also supports the ``podman`` CLI program.
1. Install and start ``docker`` or ``podman``. This ensures tests run properly isolated and in the same environment as in CI.
2. :ref:`Install Ansible or ansible-core <installation_guide>`. You need the ``ansible-test`` utility which is provided by either of these packages.
@ -155,11 +155,9 @@ See :ref:`module_contribution` for some general guidelines about Ansible module
Test your changes
=================
If using the ``docker`` CLI program, the host must be configured to use cgroupsv1 (this is not required for ``podman``). This can be done by adding ``systemd.unified_cgroup_hierarchy=0`` to the kernel boot arguments (requires bootloader config update and reboot).
1. Install ``flake8`` (``pip install flake8``, or install the corresponding package on your operating system).
1. Run ``flake8`` against a changed file:
2. Run ``flake8`` against a changed file:
.. code-block:: bash
@ -169,7 +167,7 @@ Test your changes
This shows unused imports, which are not shown by sanity tests, as well as other common issues.
Optionally, you can use the ``--max-line-length=160`` command-line argument.
2. Run sanity tests:
3. Run sanity tests:
.. code-block:: bash
@ -178,7 +176,7 @@ Test your changes
If they failed, look at the output carefully - it is informative and helps to identify a problem line quickly.
Sanity failings usually relate to incorrect code and documentation formatting.
3. Run integration tests:
4. Run integration tests:
.. code-block:: bash

@ -96,6 +96,7 @@ There are several other aliases available as well:
- ``destructive`` - Requires ``--allow-destructive`` to run without ``--docker`` or ``--remote``.
- ``hidden`` - Target is ignored. Usable as a dependency. Automatic for ``setup_`` and ``prepare_`` prefixed targets.
- ``retry/never`` - Target is excluded from retries enabled by the ``--retry-on-error`` option.
Unstable
--------

@ -2,44 +2,329 @@
.. _testing_running_locally:
***************
Testing Ansible
***************
*******************************
Testing Ansible and Collections
*******************************
This document describes how to:
* Run tests locally using ``ansible-test``
* Extend
This document describes how to run tests using ``ansible-test``.
.. contents::
:local:
Requirements
============
Setup
=====
There are no special requirements for running ``ansible-test`` on Python 2.7 or later.
The ``argparse`` package is required for Python 2.6.
The requirements for each ``ansible-test`` command are covered later.
Before running ``ansible-test``, set up your environment for :ref:`Testing an Ansible Collection` or
:ref:`Testing ansible-core`, depending on which scenario applies to you.
Test Environments
=================
.. warning::
If you use ``git`` for version control, make sure the files you are working with are not ignored by ``git``.
If they are, ``ansible-test`` will ignore them as well.
Testing an Ansible Collection
-----------------------------
If you are testing an Ansible Collection, you need a copy of the collection, preferably a git clone.
For example, to work with the ``community.windows`` collection, follow these steps:
1. Clone the collection you want to test into a valid collection root:
.. code-block:: shell
git clone https://github.com/ansible-collections/community.windows ~/dev/ansible_collections/community/windows
.. important::
The path must end with ``/ansible_collections/{collection_namespace}/{collection_name}`` where
``{collection_namespace}`` is the namespace of the collection and ``{collection_name}`` is the collection name.
2. Clone any collections on which the collection depends:
.. code-block:: shell
git clone https://github.com/ansible-collections/ansible.windows ~/dev/ansible_collections/ansible/windows
.. important::
If your collection has any dependencies on other collections, they must be in the same collection root, since
``ansible-test`` will not use your configured collection roots (or other Ansible configuration).
.. note::
See the collection's ``galaxy.yml`` for a list of possible dependencies.
3. Switch to the directory where the collection to test resides:
.. code-block:: shell
cd ~/dev/ansible_collections/community/windows
Testing ``ansible-core``
------------------------
If you are testing ``ansible-core`` itself, you need a copy of the ``ansible-core`` source code, preferably a git clone.
Having an installed copy of ``ansible-core`` is not sufficient or required.
For example, to work with the ``ansible-core`` source cloned from GitHub, follow these steps:
1. Clone the ``ansible-core`` repository:
.. code-block:: shell
git clone https://github.com/ansible/ansible ~/dev/ansible
2. Switch to the directory where the ``ansible-core`` source resides:
.. code-block:: shell
cd ~/dev/ansible
3. Add ``ansible-core`` programs to your ``PATH``:
.. code-block:: shell
source hacking/env-setup
.. note::
You can skip this step if you only need to run ``ansible-test``, and not other ``ansible-core`` programs.
In that case, simply run ``bin/ansible-test`` from the root of the ``ansible-core`` source.
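For example, from the root of an ``ansible-core`` clone (a minimal sketch):
.. code-block:: shell
cd ~/dev/ansible
bin/ansible-test sanity --docker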
.. caution::
If you have an installed version of ``ansible-core`` and are trying to run ``ansible-test`` from your ``PATH``,
make sure the program found by your shell is the one from the ``ansible-core`` source:
.. code-block:: shell
which ansible-test
Commands
========
The most commonly used test commands are:
* ``ansible-test sanity`` - Run sanity tests (mostly linters and static analysis).
* ``ansible-test integration`` - Run integration tests.
* ``ansible-test units`` - Run unit tests.
Run ``ansible-test --help`` to see a complete list of available commands.
.. note::
For detailed help on a specific command, add the ``--help`` option after the command.
Environments
============
Most ``ansible-test`` commands support running in one or more isolated test environments to simplify testing.
Containers
----------
Containers are recommended for running sanity, unit and integration tests, since they provide consistent environments.
Unit tests will be run with network isolation, which avoids unintentional dependencies on network resources.
The ``--docker`` option runs tests in a container using either Docker or Podman.
.. note::
If both Docker and Podman are installed, Docker will be used.
To override this, set the environment variable ``ANSIBLE_TEST_PREFER_PODMAN`` to any non-empty value.
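For example, a minimal way to prefer Podman when both engines are installed (any non-empty value works):
.. code-block:: shell
ANSIBLE_TEST_PREFER_PODMAN=1 ansible-test units --docker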
Choosing a container
^^^^^^^^^^^^^^^^^^^^
Without an additional argument, the ``--docker`` option uses the ``default`` container.
To use another container, specify it immediately after the ``--docker`` option.
.. note::
The ``default`` container is recommended for all sanity and unit tests.
To see the list of supported containers, use the ``--help`` option with the ``ansible-test`` command you want to use.
.. note::
The list of available containers is dependent on the ``ansible-test`` command you are using.
You can also specify your own container.
When doing so, you will need to indicate the Python version in the container with the ``--python`` option.
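For example (the last image name below is hypothetical):
.. code-block:: shell
ansible-test sanity --docker
ansible-test integration --docker ubuntu2204
ansible-test integration --docker quay.io/example/custom-test-container:1.0 --python 3.10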
Custom containers
"""""""""""""""""
When building custom containers, keep in mind the following requirements:
* The ``USER`` should be ``root``.
* Use an ``init`` process, such as ``systemd``.
* Include ``sshd`` and accept connections on the default port of ``22``.
* Include a POSIX compatible ``sh`` shell which can be found on ``PATH``.
* Include a ``sleep`` utility which runs as a subprocess.
* Include a supported version of Python.
* Avoid using the ``VOLUME`` statement.
Docker and SELinux
^^^^^^^^^^^^^^^^^^
Using Docker on a host with SELinux may require setting SELinux to permissive mode.
Consider using Podman instead.
Remote
------
The ``--remote`` option runs tests in a cloud hosted environment.
An API key is required to use this feature.
Recommended for integration tests.
See the `list of supported platforms and versions <https://github.com/ansible/ansible/blob/devel/test/lib/ansible_test/_data/completion/remote.txt>`_ for additional details.
Environment Variables
---------------------
Docker Desktop with WSL2
^^^^^^^^^^^^^^^^^^^^^^^^
These instructions explain how to use ``ansible-test`` with WSL2 and Docker Desktop *without* ``systemd`` support.
.. note::
If your WSL2 environment includes ``systemd`` support, these steps are not required.
Configuration requirements
""""""""""""""""""""""""""
1. Open Docker Desktop and go to the **Settings** screen.
2. On the **General** tab:
a. Uncheck the **Start Docker Desktop when you log in** checkbox.
b. Check the **Use the WSL 2 based engine** checkbox.
3. On the **Resources** tab under the **WSL Integration** section:
a. Enable distros you want to use under the **Enable integration with additional distros** section.
4. Click **Apply and restart** if changes were made.
Setup instructions
""""""""""""""""""
.. note::
If all WSL instances have been stopped, these changes will need to be re-applied.
1. Verify Docker Desktop is properly configured (see :ref:`Configuration requirements`).
2. Quit Docker Desktop if it is running:
a. Right click the **Docker Desktop** taskbar icon.
b. Click the **Quit Docker Desktop** option.
3. Stop any running WSL instances with the command:
.. code-block:: shell
wsl --shutdown
4. Verify all WSL instances have stopped with the command:
.. code-block:: shell
wsl -l -v
5. Start a WSL instance and perform the following steps as ``root``:
a. Verify the ``systemd`` subsystem is not registered:
a. Check for the ``systemd`` cgroup hierarchy with the following command:
.. code-block:: shell
grep systemd /proc/self/cgroup
b. If any matches are found, re-check the :ref:`Configuration requirements` and follow the
:ref:`Setup instructions` again.
b. Mount the ``systemd`` cgroup hierarchy with the following commands:
.. code-block:: shell
mkdir /sys/fs/cgroup/systemd
mount cgroup -t cgroup /sys/fs/cgroup/systemd -o none,name=systemd,xattr
6. Start Docker Desktop.
You should now be able to use ``ansible-test`` with the ``--docker`` option.
Linux cgroup configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^
.. note::
These changes will need to be re-applied each time the container host is booted.
For certain container hosts and container combinations, additional setup on the container host may be required.
In these situations ``ansible-test`` will report an error and provide additional instructions to run as ``root``:
.. code-block:: shell
mkdir /sys/fs/cgroup/systemd
mount cgroup -t cgroup /sys/fs/cgroup/systemd -o none,name=systemd,xattr
If you are using rootless Podman, an additional command must be run, also as ``root``.
Make sure to substitute your user and group for ``{user}`` and ``{group}`` respectively:
.. code-block:: shell
chown -R {user}:{group} /sys/fs/cgroup/systemd
Podman
""""""
When using Podman, you may need to stop existing Podman processes after following the :ref:`Linux cgroup configuration`
instructions. Otherwise Podman may be unable to see the new mount point.
You can check to see if Podman is running by looking for ``podman`` and ``catatonit`` processes.
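For example, one way to look for those processes (a sketch; exact process names may vary by Podman version):
.. code-block:: shell
pgrep -a 'podman|catatonit'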
Remote virtual machines
-----------------------
Remote virtual machines are recommended for running integration tests not suitable for execution in containers.
The ``--remote`` option runs tests in a cloud hosted ephemeral virtual machine.
.. note::
An API key is required to use this feature, unless running under an approved Azure Pipelines organization.
To see the list of supported systems, use the ``--help`` option with the ``ansible-test`` command you want to use.
.. note::
The list of available systems is dependent on the ``ansible-test`` command you are using.
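For example, to run integration tests on one of the platforms used in the CI matrix above (requires an API key or an approved Azure Pipelines organization):
.. code-block:: shell
ansible-test integration --remote rhel/9.0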
Python virtual environments
---------------------------
Python virtual environments provide a simple way to achieve isolation from the system and user Python environments.
They are recommended for unit and integration tests when the ``--docker`` and ``--remote`` options cannot be used.
The ``--venv`` option runs tests in a virtual environment managed by ``ansible-test``.
Requirements are automatically installed before tests are run.
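For example, to run unit tests in a virtual environment managed by ``ansible-test``:
.. code-block:: shell
ansible-test units --venv --python 3.10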
Composite environment arguments
-------------------------------
The environment arguments covered in this document are sufficient for most use cases.
However, some scenarios may require the additional flexibility offered by composite environment arguments.
The ``--controller`` and ``--target`` options are alternatives to the ``--docker``, ``--remote`` and ``--venv`` options.
.. note::
When using the ``shell`` command, the ``--target`` option is replaced by three platform specific options.
Add the ``--help`` option to your ``ansible-test`` command to learn more about the composite environment arguments.
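For example, a sketch of the composite form (see ``--help`` for the full grammar; the container names assume the completion entries listed later in this commit):
.. code-block:: shell
ansible-test integration --controller docker:default --target docker:ubuntu2204,python=3.10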
Additional Requirements
=======================
Some ``ansible-test`` commands have additional requirements.
You can use the ``--requirements`` option to automatically install them.
.. note::
When using a test environment managed by ``ansible-test``, the ``--requirements`` option is usually unnecessary.
Environment variables
=====================
When using environment variables to manipulate tests, there are some limitations to keep in mind. Environment variables are:
@ -51,16 +336,15 @@ When using environment variables to manipulate tests there some limitations to k
and the tests executed. This is useful for debugging tests inside a container by following the
:ref:`Debugging AnsibleModule-based modules <debugging_modules>` instructions.
Interactive Shell
Interactive shell
=================
Use the ``ansible-test shell`` command to get an interactive shell in the same environment used to run tests. Examples:
* ``ansible-test shell --docker`` - Open a shell in the default docker container.
* ``ansible-test shell --venv --python 3.6`` - Open a shell in a Python 3.6 virtual environment.
* ``ansible-test shell --venv --python 3.10`` - Open a shell in a Python 3.10 virtual environment.
Code Coverage
Code coverage
=============
Code coverage reports make it easy to identify untested code for which more tests should
@ -72,22 +356,17 @@ aren't using the ``--venv`` or ``--docker`` options which create an isolated pyt
environment then you may have to use the ``--requirements`` option to ensure that the
correct version of the coverage module is installed:
.. code-block:: shell-session
.. code-block:: shell
ansible-test coverage erase
ansible-test units --coverage apt
ansible-test integration --coverage aws_lambda
ansible-test coverage html
Reports can be generated in several different formats:
* ``ansible-test coverage report`` - Console report.
* ``ansible-test coverage html`` - HTML report.
* ``ansible-test coverage xml`` - XML report.
To clear data between test runs, use the ``ansible-test coverage erase`` command. For a full list of features see the online help:
.. code-block:: shell-session
ansible-test coverage --help
To clear data between test runs, use the ``ansible-test coverage erase`` command.

@ -0,0 +1,5 @@
shippable/posix/group6
context/controller
needs/root
destructive
retry/never # tests on some platforms run too long to make retries useful

File diff suppressed because it is too large.

@ -0,0 +1,5 @@
#!/usr/bin/env bash
set -eu
./runme.py

@ -1,9 +1,9 @@
base image=quay.io/ansible/base-test-container:3.9.0 python=3.11,2.7,3.5,3.6,3.7,3.8,3.9,3.10 seccomp=unconfined
default image=quay.io/ansible/default-test-container:7.4.0 python=3.11,2.7,3.5,3.6,3.7,3.8,3.9,3.10 seccomp=unconfined context=collection
default image=quay.io/ansible/ansible-core-test-container:7.4.0 python=3.11,2.7,3.5,3.6,3.7,3.8,3.9,3.10 seccomp=unconfined context=ansible-core
alpine3 image=quay.io/ansible/alpine3-test-container:4.8.0 python=3.10
centos7 image=quay.io/ansible/centos7-test-container:4.8.0 python=2.7 seccomp=unconfined
fedora36 image=quay.io/ansible/fedora36-test-container:4.8.0 python=3.10 seccomp=unconfined
base image=quay.io/ansible/base-test-container:3.9.0 python=3.11,2.7,3.5,3.6,3.7,3.8,3.9,3.10
default image=quay.io/ansible/default-test-container:7.4.0 python=3.11,2.7,3.5,3.6,3.7,3.8,3.9,3.10 context=collection
default image=quay.io/ansible/ansible-core-test-container:7.4.0 python=3.11,2.7,3.5,3.6,3.7,3.8,3.9,3.10 context=ansible-core
alpine3 image=quay.io/ansible/alpine3-test-container:4.8.0 python=3.10 cgroup=none audit=none
centos7 image=quay.io/ansible/centos7-test-container:4.8.0 python=2.7 cgroup=v1-only
fedora36 image=quay.io/ansible/fedora36-test-container:4.8.0 python=3.10
opensuse15 image=quay.io/ansible/opensuse15-test-container:4.8.0 python=3.6
ubuntu2004 image=quay.io/ansible/ubuntu2004-test-container:4.8.0 python=3.8 seccomp=unconfined
ubuntu2204 image=quay.io/ansible/ubuntu2204-test-container:4.8.0 python=3.10 seccomp=unconfined
ubuntu2004 image=quay.io/ansible/ubuntu2004-test-container:4.8.0 python=3.8
ubuntu2204 image=quay.io/ansible/ubuntu2204-test-container:4.8.0 python=3.10
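For reference, a hypothetical completion entry (the name and image are made up) combining the new ``cgroup`` and ``audit`` settings introduced by this commit:
mycontainer image=quay.io/example/my-test-container:1.0 python=3.10 cgroup=v1-only audit=none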

@ -11,8 +11,13 @@ from .init import (
CURRENT_RLIMIT_NOFILE,
)
from .constants import (
STATUS_HOST_CONNECTION_ERROR,
)
from .util import (
ApplicationError,
HostConnectionError,
display,
report_locale,
)
@ -94,6 +99,10 @@ def main(cli_args: t.Optional[list[str]] = None) -> None:
display.review_warnings()
config.success = True
except HostConnectionError as ex:
display.fatal(str(ex))
ex.run_callback()
sys.exit(STATUS_HOST_CONNECTION_ERROR)
except ApplicationWarning as ex:
display.warning('%s' % ex)
sys.exit(0)

@ -51,6 +51,10 @@ from .host_configs import (
PythonConfig,
)
from .thread import (
mutex,
)
def parse_inventory(args: EnvironmentConfig, inventory_path: str) -> dict[str, t.Any]:
"""Return a dict parsed from the given inventory file."""
@ -192,6 +196,7 @@ def configure_plugin_paths(args: CommonConfig) -> dict[str, str]:
return env
@mutex
def get_ansible_python_path(args: CommonConfig) -> str:
"""
Return a directory usable for PYTHONPATH, containing only the ansible package.

@ -0,0 +1,79 @@
"""Linux control group constants, classes and utilities."""
from __future__ import annotations
import dataclasses
import pathlib
class CGroupPath:
"""Linux cgroup path constants."""
ROOT = '/sys/fs/cgroup'
SYSTEMD = '/sys/fs/cgroup/systemd'
SYSTEMD_RELEASE_AGENT = '/sys/fs/cgroup/systemd/release_agent'
class MountType:
"""Linux filesystem mount type constants."""
TMPFS = 'tmpfs'
CGROUP_V1 = 'cgroup'
CGROUP_V2 = 'cgroup2'
@dataclasses.dataclass(frozen=True)
class CGroupEntry:
"""A single cgroup entry parsed from '/proc/{pid}/cgroup' in the proc filesystem."""
id: int
subsystem: str
path: pathlib.PurePosixPath
@property
def root_path(self) -> pathlib.PurePosixPath:
"""The root path for this cgroup subsystem."""
return pathlib.PurePosixPath(CGroupPath.ROOT, self.subsystem)
@property
def full_path(self) -> pathlib.PurePosixPath:
"""The full path for this cgroup subsystem."""
return pathlib.PurePosixPath(self.root_path, str(self.path).lstrip('/'))
@classmethod
def parse(cls, value: str) -> CGroupEntry:
"""Parse the given cgroup line from the proc filesystem and return a cgroup entry."""
cid, subsystem, path = value.split(':')
return cls(
id=int(cid),
subsystem=subsystem.removeprefix('name='),
path=pathlib.PurePosixPath(path)
)
@classmethod
def loads(cls, value: str) -> tuple[CGroupEntry, ...]:
"""Parse the given output from the proc filesystem and return a tuple of cgroup entries."""
return tuple(cls.parse(line) for line in value.splitlines())
@dataclasses.dataclass(frozen=True)
class MountEntry:
"""A single mount entry parsed from '/proc/{pid}/mounts' in the proc filesystem."""
device: pathlib.PurePosixPath
path: pathlib.PurePosixPath
type: str
options: tuple[str, ...]
@classmethod
def parse(cls, value: str) -> MountEntry:
"""Parse the given mount line from the proc filesystem and return a mount entry."""
device, path, mtype, options, _a, _b = value.split(' ')
return cls(
device=pathlib.PurePosixPath(device),
path=pathlib.PurePosixPath(path),
type=mtype,
options=tuple(options.split(',')),
)
@classmethod
def loads(cls, value: str) -> tuple[MountEntry, ...]:
"""Parse the given output from the proc filesystem and return a tuple of mount entries."""
return tuple(cls.parse(line) for line in value.splitlines())

@ -289,6 +289,19 @@ class ChoicesParser(DynamicChoicesParser):
return '|'.join(self.choices)
class EnumValueChoicesParser(ChoicesParser):
"""Composite argument parser which relies on a static list of choices derived from the values of an enum."""
def __init__(self, enum_type: t.Type[enum.Enum], conditions: MatchConditions = MatchConditions.CHOICE) -> None:
self.enum_type = enum_type
super().__init__(choices=[str(item.value) for item in enum_type], conditions=conditions)
def parse(self, state: ParserState) -> t.Any:
"""Parse the input from the given state and return the result."""
value = super().parse(state)
return self.enum_type(value)
class IntegerParser(DynamicChoicesParser):
"""Composite argument parser for integers."""
PATTERN = re.compile('^[1-9][0-9]*$')

@ -397,6 +397,8 @@ def add_global_docker(
docker_network=None,
docker_terminate=None,
prime_containers=False,
dev_systemd_debug=False,
dev_probe_cgroups=None,
)
return
@ -428,6 +430,24 @@ def add_global_docker(
help='download containers without running tests',
)
# Docker support isn't related to ansible-core-ci.
# However, ansible-core-ci support is a reasonable indicator that the user may need the `--dev-*` options.
suppress = None if get_ci_provider().supports_core_ci_auth() else argparse.SUPPRESS
parser.add_argument(
'--dev-systemd-debug',
action='store_true',
help=suppress or 'enable systemd debugging in containers',
)
parser.add_argument(
'--dev-probe-cgroups',
metavar='DIR',
nargs='?',
const='',
help=suppress or 'probe container cgroups, with optional log dir',
)
def add_environment_docker(
exclusive_parser: argparse.ArgumentParser,

@ -10,6 +10,11 @@ from ...constants import (
SUPPORTED_PYTHON_VERSIONS,
)
from ...completion import (
AuditMode,
CGroupVersion,
)
from ...util import (
REMOTE_ARCHITECTURES,
)
@ -27,6 +32,7 @@ from ..argparsing.parsers import (
BooleanParser,
ChoicesParser,
DocumentationState,
EnumValueChoicesParser,
IntegerParser,
KeyValueParser,
Parser,
@ -103,6 +109,8 @@ class DockerKeyValueParser(KeyValueParser):
return dict(
python=PythonParser(versions=self.versions, allow_venv=False, allow_default=self.allow_default),
seccomp=ChoicesParser(SECCOMP_CHOICES),
cgroup=EnumValueChoicesParser(CGroupVersion),
audit=EnumValueChoicesParser(AuditMode),
privileged=BooleanParser(),
memory=IntegerParser(),
)
@ -116,6 +124,8 @@ class DockerKeyValueParser(KeyValueParser):
state.sections[f'{"controller" if self.controller else "target"} {section_name} (comma separated):'] = '\n'.join([
f' python={python_parser.document(state)}',
f' seccomp={ChoicesParser(SECCOMP_CHOICES).document(state)}',
f' cgroup={EnumValueChoicesParser(CGroupVersion).document(state)}',
f' audit={EnumValueChoicesParser(AuditMode).document(state)}',
f' privileged={BooleanParser().document(state)}',
f' memory={IntegerParser().document(state)} # bytes',
])

@ -17,9 +17,9 @@ from ...io import (
from ...util import (
display,
SubprocessError,
get_ansible_version,
get_available_python_versions,
ApplicationError,
)
from ...util_common import (
@ -30,8 +30,7 @@ from ...util_common import (
from ...docker_util import (
get_docker_command,
docker_info,
docker_version
get_docker_info,
)
from ...constants import (
@ -178,14 +177,12 @@ def get_docker_details(args: EnvConfig) -> dict[str, t.Any]:
executable = docker.executable
try:
info = docker_info(args)
except SubprocessError as ex:
display.warning('Failed to collect docker info:\n%s' % ex)
try:
version = docker_version(args)
except SubprocessError as ex:
display.warning('Failed to collect docker version:\n%s' % ex)
docker_info = get_docker_info(args)
except ApplicationError as ex:
display.warning(str(ex))
else:
info = docker_info.info
version = docker_info.version
docker_details = dict(
executable=executable,

@ -531,6 +531,10 @@ def command_integration_filtered(
if not tries:
raise
if target.retry_never:
display.warning(f'Skipping retry of test target "{target.name}" since it has been excluded from retries.')
raise
display.warning('Retrying test target "%s" with maximum verbosity.' % target.name)
display.verbosity = args.verbosity = 6

@ -286,6 +286,9 @@ class IntegrationAliasesTest(SanitySingleVersion):
}
for target in posix_targets:
if target.name == 'ansible-test-container':
continue # special test target which uses group 6 -- nothing else should be in that group
if f'{self.TEST_ALIAS_PREFIX}/posix/' not in target.aliases:
continue
@ -345,6 +348,9 @@ class IntegrationAliasesTest(SanitySingleVersion):
messages = []
for path in unassigned_paths:
if path == 'test/integration/targets/ansible-test-container':
continue # special test target which uses group 6 -- nothing else should be in that group
messages.append(SanityMessage(unassigned_message, '%s/aliases' % path))
for path in conflicting_paths:

@ -9,6 +9,8 @@ from ...util import (
ApplicationError,
OutputStream,
display,
SubprocessError,
HostConnectionError,
)
from ...config import (
@ -115,4 +117,19 @@ def command_shell(args: ShellConfig) -> None:
else:
cmd = []
con.run(cmd, capture=False, interactive=True)
try:
con.run(cmd, capture=False, interactive=True)
except SubprocessError as ex:
if isinstance(con, SshConnection) and ex.status == 255:
# 255 indicates SSH itself failed, rather than a command run on the remote host.
# In this case, report a host connection error so additional troubleshooting output is provided.
if not args.delegate and not args.host_path:
def callback() -> None:
"""Callback to run during error display."""
target_profile.on_target_failure() # when the controller is not delegated, report failures immediately
else:
callback = None
raise HostConnectionError(f'SSH shell connection failed for host {target_profile.config}: {ex}', callback) from ex
raise

@ -3,6 +3,7 @@ from __future__ import annotations
import abc
import dataclasses
import enum
import os
import typing as t
@ -26,6 +27,26 @@ from .become import (
)
class CGroupVersion(enum.Enum):
"""The control group version(s) required by a container."""
NONE = 'none'
V1_ONLY = 'v1-only'
V2_ONLY = 'v2-only'
V1_V2 = 'v1-v2'
def __repr__(self) -> str:
return f'{self.__class__.__name__}.{self.name}'
class AuditMode(enum.Enum):
"""The audit requirements of a container."""
NONE = 'none'
REQUIRED = 'required'
def __repr__(self) -> str:
return f'{self.__class__.__name__}.{self.name}'
@dataclasses.dataclass(frozen=True)
class CompletionConfig(metaclass=abc.ABCMeta):
"""Base class for completion configuration."""
@ -140,6 +161,8 @@ class DockerCompletionConfig(PythonCompletionConfig):
"""Configuration for Docker containers."""
image: str = ''
seccomp: str = 'default'
cgroup: str = CGroupVersion.V1_V2.value
audit: str = AuditMode.REQUIRED.value # most containers need this, so the default is required, leaving it to be opt-out for containers which don't need it
placeholder: bool = False
@property
@ -147,6 +170,22 @@ class DockerCompletionConfig(PythonCompletionConfig):
"""True if the completion entry is only used for defaults, otherwise False."""
return False
@property
def audit_enum(self) -> AuditMode:
"""The audit requirements for the container. Raises an exception if the value is invalid."""
try:
return AuditMode(self.audit)
except ValueError:
raise ValueError(f'Docker completion entry "{self.name}" has an invalid value "{self.audit}" for the "audit" setting.') from None
@property
def cgroup_enum(self) -> CGroupVersion:
"""The control group version(s) required by the container. Raises an exception if the value is invalid."""
try:
return CGroupVersion(self.cgroup)
except ValueError:
raise ValueError(f'Docker completion entry "{self.name}" has an invalid value "{self.cgroup}" for the "cgroup" setting.') from None
def __post_init__(self):
if not self.image:
raise Exception(f'Docker completion entry "{self.name}" must provide an "image" setting.')
@ -154,6 +193,10 @@ class DockerCompletionConfig(PythonCompletionConfig):
if not self.supported_pythons and not self.placeholder:
raise Exception(f'Docker completion entry "{self.name}" must provide a "python" setting.')
# verify properties can be correctly parsed to enums
assert self.audit_enum
assert self.cgroup_enum
@dataclasses.dataclass(frozen=True)
class NetworkRemoteCompletionConfig(RemoteCompletionConfig):

@ -111,6 +111,9 @@ class EnvironmentConfig(CommonConfig):
self.delegate_args: list[str] = []
self.dev_systemd_debug: bool = args.dev_systemd_debug
self.dev_probe_cgroups: t.Optional[str] = args.dev_probe_cgroups
def host_callback(files: list[tuple[str, str]]) -> None:
"""Add the host files to the payload file list."""
config = self

@ -6,6 +6,8 @@ from .._util.target.common.constants import (
REMOTE_ONLY_PYTHON_VERSIONS,
)
STATUS_HOST_CONNECTION_ERROR = 4
# Setting a low soft RLIMIT_NOFILE value will improve the performance of subprocess.Popen on Python 2.x when close_fds=True.
# This will affect all Python subprocesses. It will also affect the current Python process if set before subprocess is imported for the first time.
SOFT_RLIMIT_NOFILE = 1024

@ -35,8 +35,10 @@ from .config import (
from .docker_util import (
ContainerNotFoundError,
DockerInspect,
docker_create,
docker_exec,
docker_inspect,
docker_network_inspect,
docker_pull,
docker_rm,
docker_run,
@ -45,6 +47,7 @@ from .docker_util import (
get_docker_host_ip,
get_podman_host_ip,
require_docker,
detect_host_properties,
)
from .ansible_util import (
@ -81,6 +84,10 @@ from .connections import (
SshConnection,
)
from .thread import (
mutex,
)
# information about support containers provisioned by the current ansible-test instance
support_containers: dict[str, ContainerDescriptor] = {}
support_containers_mutex = threading.Lock()
@ -142,7 +149,7 @@ def run_support_container(
options = (options or [])
if start:
options.append('-d')
options.append('-dt') # the -t option is required to cause systemd in the container to log output to the console
if publish_ports:
for port in ports:
@ -152,6 +159,10 @@ def run_support_container(
for key, value in env.items():
options.extend(['--env', '%s=%s' % (key, value)])
max_open_files = detect_host_properties(args).max_open_files
options.extend(['--ulimit', 'nofile=%s' % max_open_files])
support_container_id = None
if allow_existing:
@ -176,6 +187,9 @@ def run_support_container(
if not support_container_id:
docker_rm(args, name)
if args.dev_systemd_debug:
options.extend(('--env', 'SYSTEMD_LOG_LEVEL=debug'))
if support_container_id:
display.info('Using existing "%s" container.' % name)
running = True
@ -183,7 +197,7 @@ def run_support_container(
else:
display.info('Starting new "%s" container.' % name)
docker_pull(args, image)
support_container_id = docker_run(args, image, name, options, create_only=not start, cmd=cmd)
support_container_id = run_container(args, image, name, options, create_only=not start, cmd=cmd)
running = start
existing = False
@ -221,6 +235,126 @@ def run_support_container(
return descriptor
def run_container(
args: EnvironmentConfig,
image: str,
name: str,
options: t.Optional[list[str]],
cmd: t.Optional[list[str]] = None,
create_only: bool = False,
) -> str:
"""Run a container using the given docker image."""
options = list(options or [])
cmd = list(cmd or [])
options.extend(['--name', name])
network = get_docker_preferred_network_name(args)
if is_docker_user_defined_network(network):
# Only when the network is not the default bridge network.
options.extend(['--network', network])
for _iteration in range(1, 3):
try:
if create_only:
stdout = docker_create(args, image, options, cmd)[0]
else:
stdout = docker_run(args, image, options, cmd)[0]
except SubprocessError as ex:
display.error(ex.message)
display.warning(f'Failed to run docker image "{image}". Waiting a few seconds before trying again.')
docker_rm(args, name) # podman doesn't remove containers after create if run fails
time.sleep(3)
else:
if args.explain:
stdout = ''.join(random.choice('0123456789abcdef') for _iteration in range(64))
return stdout.strip()
raise ApplicationError(f'Failed to run docker image "{image}".')
def start_container(args: EnvironmentConfig, container_id: str) -> tuple[t.Optional[str], t.Optional[str]]:
"""Start a docker container by name or ID."""
options: list[str] = []
for _iteration in range(1, 3):
try:
return docker_start(args, container_id, options)
except SubprocessError as ex:
display.error(ex.message)
display.warning(f'Failed to start docker container "{container_id}". Waiting a few seconds before trying again.')
time.sleep(3)
raise ApplicationError(f'Failed to start docker container "{container_id}".')
def get_container_ip_address(args: EnvironmentConfig, container: DockerInspect) -> t.Optional[str]:
"""Return the IP address of the container for the preferred docker network."""
if container.networks:
network_name = get_docker_preferred_network_name(args)
if not network_name:
# Sort networks and use the first available.
# This assumes all containers will have access to the same networks.
network_name = sorted(container.networks.keys()).pop(0)
ipaddress = container.networks[network_name]['IPAddress']
else:
ipaddress = container.network_settings['IPAddress']
if not ipaddress:
return None
return ipaddress
@mutex
def get_docker_preferred_network_name(args: EnvironmentConfig) -> t.Optional[str]:
"""
Return the preferred network name for use with Docker. The selection logic is:
- the network selected by the user with `--docker-network`
- the network of the currently running docker container (if any)
- the default docker network (returns None)
"""
try:
return get_docker_preferred_network_name.network # type: ignore[attr-defined]
except AttributeError:
pass
network = None
if args.docker_network:
network = args.docker_network
else:
current_container_id = get_docker_container_id()
if current_container_id:
# Make sure any additional containers we launch use the same network as the current container we're running in.
# This is needed when ansible-test is running in a container that is not connected to Docker's default network.
container = docker_inspect(args, current_container_id, always=True)
network = container.get_network_name()
# The default docker behavior puts containers on the same network.
# The default podman behavior puts containers on isolated networks which don't allow communication between containers or network disconnect.
# Starting with podman version 2.1.0 rootless containers are able to join networks.
# Starting with podman version 2.2.0 containers can be disconnected from networks.
# To maintain feature parity with docker, detect and use the default "podman" network when running under podman.
if network is None and require_docker().command == 'podman' and docker_network_inspect(args, 'podman', always=True):
network = 'podman'
get_docker_preferred_network_name.network = network # type: ignore[attr-defined]
return network
def is_docker_user_defined_network(network: str) -> bool:
"""Return True if the network being used is a user-defined network."""
return bool(network) and network != 'bridge'
@mutex
def get_container_database(args: EnvironmentConfig) -> ContainerDatabase:
"""Return the current container database, creating it as needed, or returning the one provided on the command line through delegation."""
try:
@ -572,7 +706,7 @@ class ContainerDescriptor:
def start(self, args: EnvironmentConfig) -> None:
"""Start the container. Used for containers which are created, but not started."""
docker_start(args, self.name)
start_container(args, self.name)
self.register(args)
@ -582,7 +716,7 @@ class ContainerDescriptor:
raise Exception('Container already registered: %s' % self.name)
try:
container = docker_inspect(args, self.container_id)
container = docker_inspect(args, self.name)
except ContainerNotFoundError:
if not args.explain:
raise
@ -599,7 +733,7 @@ class ContainerDescriptor:
),
))
support_container_ip = container.get_ip_address()
support_container_ip = get_container_ip_address(args, container)
if self.publish_ports:
# inspect the support container to locate the published ports
@ -664,7 +798,7 @@ def cleanup_containers(args: EnvironmentConfig) -> None:
if container.cleanup == CleanupMode.YES:
docker_rm(args, container.container_id)
elif container.cleanup == CleanupMode.INFO:
display.notice('Remember to run `docker rm -f %s` when finished testing.' % container.name)
display.notice(f'Remember to run `{require_docker().command} rm -f {container.name}` when finished testing.')
def create_hosts_entries(context: dict[str, ContainerAccess]) -> list[str]:

@ -52,6 +52,10 @@ from .constants import (
CONTROLLER_PYTHON_VERSIONS,
)
from .thread import (
mutex,
)
@dataclasses.dataclass(frozen=True)
class CoverageVersion:
@ -203,6 +207,7 @@ def get_coverage_environment(
return env
@mutex
def get_coverage_config(args: TestConfig) -> str:
"""Return the path to the coverage config, creating the config if it does not already exist."""
try:

@ -8,6 +8,10 @@ import os
import tempfile
import typing as t
from .constants import (
STATUS_HOST_CONNECTION_ERROR,
)
from .locale_util import (
STANDARD_LOCALE,
)
@ -200,6 +204,7 @@ def delegate_command(args: EnvironmentConfig, host_state: HostState, exclude: li
con.user = pytest_user
success = False
status = 0
try:
# When delegating, preserve the original separate stdout/stderr streams, but only when the following conditions are met:
@ -209,10 +214,17 @@ def delegate_command(args: EnvironmentConfig, host_state: HostState, exclude: li
output_stream = OutputStream.ORIGINAL if args.display_stderr and not args.interactive else None
con.run(insert_options(command, options), capture=False, interactive=args.interactive, output_stream=output_stream)
success = True
except SubprocessError as ex:
status = ex.status
raise
finally:
if host_delegation:
download_results(args, con, content_root, success)
if not success and status == STATUS_HOST_CONNECTION_ERROR:
for target in host_state.target_profiles:
target.on_target_failure() # when the controller is delegated, report failures after delegation fails
def insert_options(command, options):
"""Insert additional command line options into the given command and return the result."""

@ -0,0 +1,2 @@
"""Development and testing support code. Enabled through the use of `--dev-*` command line options."""
from __future__ import annotations

@ -0,0 +1,210 @@
"""Diagnostic utilities to probe container cgroup behavior during development and testing (both manual and integration)."""
from __future__ import annotations
import dataclasses
import enum
import json
import os
import pathlib
import pwd
import typing as t
from ..io import (
read_text_file,
write_text_file,
)
from ..util import (
display,
ANSIBLE_TEST_TARGET_ROOT,
)
from ..config import (
EnvironmentConfig,
)
from ..docker_util import (
LOGINUID_NOT_SET,
docker_exec,
get_docker_info,
get_podman_remote,
require_docker,
)
from ..host_configs import (
DockerConfig,
)
from ..cgroup import (
CGroupEntry,
CGroupPath,
MountEntry,
MountType,
)
class CGroupState(enum.Enum):
"""The expected state of a cgroup related mount point."""
HOST = enum.auto()
PRIVATE = enum.auto()
SHADOWED = enum.auto()
@dataclasses.dataclass(frozen=True)
class CGroupMount:
"""Details on a cgroup mount point that is expected to be present in the container."""
path: str
type: t.Optional[str]
writable: t.Optional[bool]
state: t.Optional[CGroupState]
def __post_init__(self):
assert pathlib.PurePosixPath(self.path).is_relative_to(CGroupPath.ROOT)
if self.type is None:
assert self.state is None
elif self.type == MountType.TMPFS:
assert self.writable is True
assert self.state is None
else:
assert self.type in (MountType.CGROUP_V1, MountType.CGROUP_V2)
assert self.state is not None
def check_container_cgroup_status(args: EnvironmentConfig, config: DockerConfig, container_name: str, expected_mounts: tuple[CGroupMount, ...]) -> None:
"""Check the running container to examine the state of the cgroup hierarchies."""
cmd = ['sh', '-c', 'cat /proc/1/cgroup && echo && cat /proc/1/mounts']
stdout = docker_exec(args, container_name, cmd, capture=True)[0]
cgroups_stdout, mounts_stdout = stdout.split('\n\n')
cgroups = CGroupEntry.loads(cgroups_stdout)
mounts = MountEntry.loads(mounts_stdout)
mounts = tuple(mount for mount in mounts if mount.path.is_relative_to(CGroupPath.ROOT))
mount_cgroups: dict[MountEntry, CGroupEntry] = {}
probe_paths: dict[pathlib.PurePosixPath, t.Optional[str]] = {}
for cgroup in cgroups:
if cgroup.subsystem:
mount = ([mount for mount in mounts if
mount.type == MountType.CGROUP_V1 and
mount.path.is_relative_to(cgroup.root_path) and
cgroup.full_path.is_relative_to(mount.path)
] or [None])[-1]
else:
mount = ([mount for mount in mounts if
mount.type == MountType.CGROUP_V2 and
mount.path == cgroup.root_path
] or [None])[-1]
if mount:
mount_cgroups[mount] = cgroup
for mount in mounts:
probe_paths[mount.path] = None
if (cgroup := mount_cgroups.get(mount)) and cgroup.full_path != mount.path: # child of mount.path
probe_paths[cgroup.full_path] = None
probe_script = read_text_file(os.path.join(ANSIBLE_TEST_TARGET_ROOT, 'setup', 'probe_cgroups.py'))
probe_command = [config.python.path, '-', f'{container_name}-probe'] + [str(path) for path in probe_paths]
probe_results = json.loads(docker_exec(args, container_name, probe_command, capture=True, data=probe_script)[0])
for path in probe_paths:
probe_paths[path] = probe_results[str(path)]
remaining_mounts: dict[pathlib.PurePosixPath, MountEntry] = {mount.path: mount for mount in mounts}
results: dict[pathlib.PurePosixPath, tuple[bool, str]] = {}
for expected_mount in expected_mounts:
expected_path = pathlib.PurePosixPath(expected_mount.path)
if not (actual_mount := remaining_mounts.pop(expected_path, None)):
results[expected_path] = (False, 'not mounted')
continue
actual_mount_write_error = probe_paths[actual_mount.path]
actual_mount_errors = []
if cgroup := mount_cgroups.get(actual_mount):
if expected_mount.state == CGroupState.SHADOWED:
actual_mount_errors.append('unexpected cgroup association')
if cgroup.root_path == cgroup.full_path and expected_mount.state == CGroupState.HOST:
results[cgroup.root_path.joinpath('???')] = (False, 'missing cgroup')
if cgroup.full_path == actual_mount.path:
if cgroup.root_path != cgroup.full_path and expected_mount.state == CGroupState.PRIVATE:
actual_mount_errors.append('unexpected mount')
else:
cgroup_write_error = probe_paths[cgroup.full_path]
cgroup_errors = []
if expected_mount.state == CGroupState.SHADOWED:
cgroup_errors.append('unexpected cgroup association')
if cgroup.root_path != cgroup.full_path and expected_mount.state == CGroupState.PRIVATE:
cgroup_errors.append('unexpected cgroup')
if cgroup_write_error:
cgroup_errors.append(cgroup_write_error)
if cgroup_errors:
results[cgroup.full_path] = (False, f'directory errors: {", ".join(cgroup_errors)}')
else:
results[cgroup.full_path] = (True, 'directory (writable)')
elif expected_mount.state not in (None, CGroupState.SHADOWED):
actual_mount_errors.append('missing cgroup association')
if actual_mount.type != expected_mount.type and expected_mount.type is not None:
actual_mount_errors.append(f'type not {expected_mount.type}')
if bool(actual_mount_write_error) == expected_mount.writable:
actual_mount_errors.append(f'{actual_mount_write_error or "writable"}')
if actual_mount_errors:
results[actual_mount.path] = (False, f'{actual_mount.type} errors: {", ".join(actual_mount_errors)}')
else:
results[actual_mount.path] = (True, f'{actual_mount.type} ({actual_mount_write_error or "writable"})')
for remaining_mount in remaining_mounts.values():
remaining_mount_write_error = probe_paths[remaining_mount.path]
results[remaining_mount.path] = (False, f'unexpected {remaining_mount.type} mount ({remaining_mount_write_error or "writable"})')
identity = get_identity(args, config, container_name)
messages: list[tuple[pathlib.PurePosixPath, bool, str]] = [(path, result[0], result[1]) for path, result in sorted(results.items())]
message = '\n'.join(f'{"PASS" if result else "FAIL"}: {path} -> {message}' for path, result, message in messages)
display.info(f'>>> Container: {identity}\n{message.rstrip()}')
if args.dev_probe_cgroups:
write_text_file(os.path.join(args.dev_probe_cgroups, f'{identity}.log'), message)
def get_identity(args: EnvironmentConfig, config: DockerConfig, container_name: str) -> str:
"""Generate and return an identity string to use when logging test results."""
engine = require_docker().command
try:
loginuid = int(read_text_file('/proc/self/loginuid'))
except FileNotFoundError:
loginuid = LOGINUID_NOT_SET
user = pwd.getpwuid(os.getuid()).pw_name
login_user = user if loginuid == LOGINUID_NOT_SET else pwd.getpwuid(loginuid).pw_name
remote = engine == 'podman' and get_podman_remote()
tags = (
config.name,
engine,
f'cgroup={config.cgroup.value}@{get_docker_info(args).cgroup_version}',
f'remote={remote}',
f'user={user}',
f'loginuid={login_user}',
container_name,
)
return '|'.join(tags)

@ -1,9 +1,11 @@
"""Functions for accessing docker via the docker cli."""
from __future__ import annotations
import dataclasses
import enum
import json
import os
import random
import pathlib
import socket
import time
import urllib.parse
@ -30,7 +32,17 @@ from .util_common import (
from .config import (
CommonConfig,
EnvironmentConfig,
)
from .thread import (
mutex,
named_lock,
)
from .cgroup import (
CGroupEntry,
MountEntry,
MountType,
)
DOCKER_COMMANDS = [
@ -38,10 +50,373 @@ DOCKER_COMMANDS = [
'podman',
]
UTILITY_IMAGE = 'quay.io/ansible/ansible-test-utility-container:2.0.0'
# Max number of open files in a docker container.
# Passed with --ulimit option to the docker run command.
MAX_NUM_OPEN_FILES = 10240
# The value of /proc/*/loginuid when it is not set.
# It is a reserved UID, which is the maximum 32-bit unsigned integer value.
# See: https://access.redhat.com/solutions/25404
LOGINUID_NOT_SET = 4294967295
class DockerInfo:
"""The results of `docker info` and `docker version` for the container runtime."""
@classmethod
def init(cls, args: CommonConfig) -> DockerInfo:
"""Initialize and return a DockerInfo instance."""
command = require_docker().command
info_stdout = docker_command(args, ['info', '--format', '{{ json . }}'], capture=True, always=True)[0]
info = json.loads(info_stdout)
if server_errors := info.get('ServerErrors'):
# This can occur when a remote docker instance is in use and the instance is not responding, such as when the system is still starting up.
# In that case an error such as the following may be returned:
# error during connect: Get "http://{hostname}:2375/v1.24/info": dial tcp {ip_address}:2375: connect: no route to host
raise ApplicationError('Unable to get container host information: ' + '\n'.join(server_errors))
version_stdout = docker_command(args, ['version', '--format', '{{ json . }}'], capture=True, always=True)[0]
version = json.loads(version_stdout)
info = DockerInfo(args, command, info, version)
return info
def __init__(self, args: CommonConfig, engine: str, info: dict[str, t.Any], version: dict[str, t.Any]) -> None:
self.args = args
self.engine = engine
self.info = info
self.version = version
@property
def client(self) -> dict[str, t.Any]:
"""The client version details."""
client = self.version.get('Client')
if not client:
raise ApplicationError('Unable to get container host client information.')
return client
@property
def server(self) -> dict[str, t.Any]:
"""The server version details."""
server = self.version.get('Server')
if not server:
if self.engine == 'podman':
# Some Podman versions always report server version info (verified with 1.8.0 and 1.9.3).
# Others do not unless Podman remote is being used.
# To provide consistency, use the client version if the server version isn't provided.
# See: https://github.com/containers/podman/issues/2671#issuecomment-804382934
return self.client
raise ApplicationError('Unable to get container host server information.')
return server
@property
def client_version(self) -> str:
"""The client version."""
return self.client['Version']
@property
def server_version(self) -> str:
"""The server version."""
return self.server['Version']
@property
def client_major_minor_version(self) -> tuple[int, int]:
"""The client major and minor version."""
major, minor = self.client_version.split('.')[:2]
return int(major), int(minor)
@property
def server_major_minor_version(self) -> tuple[int, int]:
"""The server major and minor version."""
major, minor = self.server_version.split('.')[:2]
return int(major), int(minor)
@property
def cgroupns_option_supported(self) -> bool:
"""Return True if the `--cgroupns` option is supported, otherwise return False."""
if self.engine == 'docker':
# Docker added support for the `--cgroupns` option in version 20.10.
# Both the client and server must support the option to use it.
# See: https://docs.docker.com/engine/release-notes/#20100
return self.client_major_minor_version >= (20, 10) and self.server_major_minor_version >= (20, 10)
raise NotImplementedError(self.engine)
@property
def cgroup_version(self) -> int:
"""The cgroup version of the container host."""
info = self.info
host = info.get('host')
# When the container host reports cgroup v1 it is running either cgroup v1 legacy mode or cgroup v2 hybrid mode.
# When the container host reports cgroup v2 it is running under cgroup v2 unified mode.
# See: https://github.com/containers/podman/blob/8356621249e36ed62fc7f35f12d17db9027ff076/libpod/info_linux.go#L52-L56
# See: https://github.com/moby/moby/blob/d082bbcc0557ec667faca81b8b33bec380b75dac/daemon/info_unix.go#L24-L27
if host:
return int(host['cgroupVersion'].lstrip('v')) # podman
try:
return int(info['CgroupVersion']) # docker
except KeyError:
pass
# Docker 20.10 (API version 1.41) added support for cgroup v2.
# Unfortunately the client or server is too old to report the cgroup version.
# If the server is old, we can infer the cgroup version.
# Otherwise, we'll need to fall back to detection.
# See: https://docs.docker.com/engine/release-notes/#20100
# See: https://docs.docker.com/engine/api/version-history/#v141-api-changes
if self.server_major_minor_version < (20, 10):
return 1 # old docker server with only cgroup v1 support
# Tell the user what versions they have and recommend they upgrade the client.
# Downgrading the server should also work, but we won't mention that.
message = (
f'The Docker client version is {self.client_version}. '
f'The Docker server version is {self.server_version}. '
'Upgrade your Docker client to version 20.10 or later.'
)
if detect_host_properties(self.args).cgroup_v2:
# Unfortunately cgroup v2 was detected on the Docker server.
# A newer client is needed to support the `--cgroupns` option for use with cgroup v2.
raise ApplicationError(f'Unsupported Docker client and server combination using cgroup v2. {message}')
display.warning(f'Detected Docker server cgroup v1 using probing. {message}', unique=True)
return 1 # docker server is using cgroup v1 (or cgroup v2 hybrid)
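    # A sketch of the resolution order above, using hypothetical payloads:
    #
    #   podman info -> {'host': {'cgroupVersion': 'v2'}}        -> 2
    #   docker info -> {'CgroupVersion': '1'}                   -> 1
    #   docker info without 'CgroupVersion', server < 20.10     -> 1 (inferred)
    #   docker info without 'CgroupVersion', server >= 20.10    -> probe the host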
@property
def docker_desktop_wsl2(self) -> bool:
"""Return True if Docker Desktop integrated with WSL2 is detected, otherwise False."""
info = self.info
kernel_version = info.get('KernelVersion')
operating_system = info.get('OperatingSystem')
        dd_wsl2 = bool(kernel_version and kernel_version.endswith('-WSL2') and operating_system == 'Docker Desktop')
        return dd_wsl2
@property
def description(self) -> str:
"""Describe the container runtime."""
tags = dict(
client=self.client_version,
server=self.server_version,
cgroup=f'v{self.cgroup_version}',
)
labels = [self.engine] + [f'{key}={value}' for key, value in tags.items()]
if self.docker_desktop_wsl2:
labels.append('DD+WSL2')
return f'Container runtime: {" ".join(labels)}'
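    # Example output (values are hypothetical):
    #
    #   Container runtime: podman client=4.2.1 server=4.2.1 cgroup=v2
    #
    # with ' DD+WSL2' appended when Docker Desktop on WSL2 is detected.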
@mutex
def get_docker_info(args: CommonConfig) -> DockerInfo:
"""Return info for the current container runtime. The results are cached."""
try:
return get_docker_info.info # type: ignore[attr-defined]
except AttributeError:
pass
info = DockerInfo.init(args)
display.info(info.description, verbosity=1)
get_docker_info.info = info # type: ignore[attr-defined]
return info
class SystemdControlGroupV1Status(enum.Enum):
"""The state of the cgroup v1 systemd hierarchy on the container host."""
SUBSYSTEM_MISSING = 'The systemd cgroup subsystem was not found.'
FILESYSTEM_NOT_MOUNTED = 'The "/sys/fs/cgroup/systemd" filesystem is not mounted.'
MOUNT_TYPE_NOT_CORRECT = 'The "/sys/fs/cgroup/systemd" mount type is not correct.'
VALID = 'The "/sys/fs/cgroup/systemd" mount is valid.'
@dataclasses.dataclass(frozen=True)
class ContainerHostProperties:
"""Container host properties detected at run time."""
audit_code: str
max_open_files: int
loginuid: t.Optional[int]
cgroups: tuple[CGroupEntry, ...]
mounts: tuple[MountEntry, ...]
cgroup_v1: SystemdControlGroupV1Status
cgroup_v2: bool
@mutex
def detect_host_properties(args: CommonConfig) -> ContainerHostProperties:
"""
Detect and return properties of the container host.
The information collected is:
- The errno result from attempting to query the container host's audit status.
- The max number of open files supported by the container host to run containers.
This value may be capped to the maximum value used by ansible-test.
If the value is below the desired limit, a warning is displayed.
- The loginuid used by the container host to run containers, or None if the audit subsystem is unavailable.
- The cgroup subsystems registered with the Linux kernel.
- The mounts visible within a container.
- The status of the systemd cgroup v1 hierarchy.
This information is collected together to reduce the number of container runs to probe the container host.
"""
try:
return detect_host_properties.properties # type: ignore[attr-defined]
except AttributeError:
pass
single_line_commands = (
'audit-status',
'cat /proc/sys/fs/nr_open',
'ulimit -Hn',
'(cat /proc/1/loginuid; echo)',
)
multi_line_commands = (
' && '.join(single_line_commands),
'cat /proc/1/cgroup',
'cat /proc/1/mounts',
)
options = ['--volume', '/sys/fs/cgroup:/probe:ro']
cmd = ['sh', '-c', ' && echo "-" && '.join(multi_line_commands)]
stdout = run_utility_container(args, f'ansible-test-probe-{args.session_name}', cmd, options)[0]
blocks = stdout.split('\n-\n')
values = blocks[0].split('\n')
audit_parts = values[0].split(' ', 1)
audit_status = int(audit_parts[0])
audit_code = audit_parts[1]
system_limit = int(values[1])
hard_limit = int(values[2])
loginuid = int(values[3]) if values[3] else None
cgroups = CGroupEntry.loads(blocks[1])
mounts = MountEntry.loads(blocks[2])
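    # For reference, the probe stdout parsed above has the following shape
    # (values are hypothetical; the three blocks are separated by '-' lines):
    #
    #   1 EPERM        <- audit-status: errno and symbolic name
    #   1073741816     <- cat /proc/sys/fs/nr_open
    #   1048576        <- ulimit -Hn
    #   4294967295     <- cat /proc/1/loginuid (blank when unavailable)
    #   -
    #   1:name=systemd:/                  <- cat /proc/1/cgroup
    #   -
    #   overlay / overlay rw,relatime 0 0 <- cat /proc/1/mounts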
if hard_limit < MAX_NUM_OPEN_FILES and hard_limit < system_limit and require_docker().command == 'docker':
# Podman will use the highest possible limits, up to its default of 1M.
# See: https://github.com/containers/podman/blob/009afb50b308548eb129bc68e654db6c6ad82e7a/pkg/specgen/generate/oci.go#L39-L58
# Docker limits are less predictable. They could be the system limit or the user's soft limit.
# If Docker is running as root it should be able to use the system limit.
# When Docker reports a limit below the preferred value and the system limit, attempt to use the preferred value, up to the system limit.
options = ['--ulimit', f'nofile={min(system_limit, MAX_NUM_OPEN_FILES)}']
cmd = ['sh', '-c', 'ulimit -Hn']
try:
stdout = run_utility_container(args, f'ansible-test-ulimit-{args.session_name}', cmd, options)[0]
except SubprocessError as ex:
display.warning(str(ex))
else:
hard_limit = int(stdout)
# Check the audit error code from attempting to query the container host's audit status.
#
# The following error codes are known to occur:
#
# EPERM - Operation not permitted
# This occurs when the root user runs a container but lacks the AUDIT_WRITE capability.
# This will cause patched versions of OpenSSH to disconnect after a login succeeds.
# See: https://src.fedoraproject.org/rpms/openssh/blob/f36/f/openssh-7.6p1-audit.patch
#
# EBADF - Bad file number
# This occurs when the host doesn't support the audit system (the open_audit call fails).
# This allows SSH logins to succeed despite the failure.
# See: https://github.com/Distrotech/libaudit/blob/4fc64f79c2a7f36e3ab7b943ce33ab5b013a7782/lib/netlink.c#L204-L209
#
# ECONNREFUSED - Connection refused
# This occurs when a non-root user runs a container without the AUDIT_WRITE capability.
# When sending an audit message, libaudit ignores this error condition.
# This allows SSH logins to succeed despite the failure.
# See: https://github.com/Distrotech/libaudit/blob/4fc64f79c2a7f36e3ab7b943ce33ab5b013a7782/lib/deprecated.c#L48-L52
subsystems = set(cgroup.subsystem for cgroup in cgroups)
mount_types = {mount.path: mount.type for mount in mounts}
if 'systemd' not in subsystems:
cgroup_v1 = SystemdControlGroupV1Status.SUBSYSTEM_MISSING
elif not (mount_type := mount_types.get(pathlib.PurePosixPath('/probe/systemd'))):
cgroup_v1 = SystemdControlGroupV1Status.FILESYSTEM_NOT_MOUNTED
elif mount_type != MountType.CGROUP_V1:
cgroup_v1 = SystemdControlGroupV1Status.MOUNT_TYPE_NOT_CORRECT
else:
cgroup_v1 = SystemdControlGroupV1Status.VALID
cgroup_v2 = mount_types.get(pathlib.PurePosixPath('/probe')) == MountType.CGROUP_V2
display.info(f'Container host audit status: {audit_code} ({audit_status})', verbosity=1)
display.info(f'Container host max open files: {hard_limit}', verbosity=1)
display.info(f'Container loginuid: {loginuid if loginuid is not None else "unavailable"}'
f'{" (not set)" if loginuid == LOGINUID_NOT_SET else ""}', verbosity=1)
if hard_limit < MAX_NUM_OPEN_FILES:
display.warning(f'Unable to set container max open files to {MAX_NUM_OPEN_FILES}. Using container host limit of {hard_limit} instead.')
else:
hard_limit = MAX_NUM_OPEN_FILES
properties = ContainerHostProperties(
# The errno (audit_status) is intentionally not exposed here, as it can vary across systems and architectures.
# Instead, the symbolic name (audit_code) is used, which is resolved inside the container which generated the error.
# See: https://man7.org/linux/man-pages/man3/errno.3.html
audit_code=audit_code,
max_open_files=hard_limit,
loginuid=loginuid,
cgroups=cgroups,
mounts=mounts,
cgroup_v1=cgroup_v1,
cgroup_v2=cgroup_v2,
)
detect_host_properties.properties = properties # type: ignore[attr-defined]
return properties
def run_utility_container(
args: CommonConfig,
name: str,
cmd: list[str],
options: list[str],
data: t.Optional[str] = None,
) -> tuple[t.Optional[str], t.Optional[str]]:
"""Run the specified command using the ansible-test utility container, returning stdout and stderr."""
options = options + [
'--name', name,
'--rm',
]
if data:
options.append('-i')
docker_pull(args, UTILITY_IMAGE)
return docker_run(args, UTILITY_IMAGE, options, cmd, data)
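# Example usage (values are hypothetical), mirroring the host probe above:
#
#   stdout, _dummy = run_utility_container(
#       args,
#       f'ansible-test-probe-{args.session_name}',
#       ['sh', '-c', 'cat /proc/1/mounts'],
#       ['--volume', '/sys/fs/cgroup:/probe:ro'],
#   )
#
# The '--name' and '--rm' options are always appended, and '-i' is added only
# when data is to be provided on stdin.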
class DockerCommand:
"""Details about the available docker command."""
@ -62,7 +437,7 @@ class DockerCommand:
executable = find_executable(command, required=False)
if executable:
version = raw_command([command, '-v'], env=docker_environment(), capture=True)[0].strip()
if command == 'docker' and 'podman' in version:
continue # avoid detecting podman as docker
@ -141,7 +516,7 @@ def get_podman_default_hostname() -> t.Optional[str]:
"""
hostname: t.Optional[str] = None
try:
stdout = raw_command(['podman', 'system', 'connection', 'list', '--format=json'], env=docker_environment(), capture=True)[0]
except SubprocessError:
stdout = '[]'
@ -160,7 +535,8 @@ def get_podman_default_hostname() -> t.Optional[str]:
@cache
def get_podman_remote() -> t.Optional[str]:
"""Return the remote podman hostname, if any, otherwise return None."""
# URL value resolution precedence:
# - command line value
# - environment variable CONTAINER_HOST
@ -185,7 +561,7 @@ def _get_podman_remote() -> t.Optional[str]:
@cache
def get_podman_hostname() -> str:
"""Return the hostname of the Podman service."""
hostname = get_podman_remote()
if not hostname:
hostname = 'localhost'
@ -219,142 +595,96 @@ def get_docker_container_id() -> t.Optional[str]:
return container_id
def docker_pull(args: CommonConfig, image: str) -> None:
"""
Pull the specified image if it is not available.
Images without a tag or digest will not be pulled.
Retries up to 10 times if the pull fails.
A warning will be shown for any image with volumes defined.
Images will be pulled only once.
Concurrent pulls for the same image will block until the first completes.
"""
with named_lock(f'docker_pull:{image}') as first:
if first:
__docker_pull(args, image)
def __docker_pull(args: CommonConfig, image: str) -> None:
"""Internal implementation for docker_pull. Do not call directly."""
    if '@' not in image and ':' not in image:
        display.info('Skipping pull of image without tag or digest: %s' % image, verbosity=2)
        inspect = docker_image_inspect(args, image)
    elif inspect := docker_image_inspect(args, image, always=True):
        display.info('Skipping pull of existing image: %s' % image, verbosity=2)
    else:
        for _iteration in range(1, 10):
            try:
                docker_command(args, ['pull', image], capture=False)
                if (inspect := docker_image_inspect(args, image)) or args.explain:
                    break
                display.warning(f'Image "{image}" not found after pull completed. Waiting a few seconds before trying again.')
            except SubprocessError:
                display.warning(f'Failed to pull container image "{image}". Waiting a few seconds before trying again.')
            time.sleep(3)
        else:
            raise ApplicationError(f'Failed to pull container image "{image}".')
    if inspect and inspect.volumes:
        display.warning(f'Image "{image}" contains {len(inspect.volumes)} volume(s): {", ".join(sorted(inspect.volumes))}\n'
                        'This may result in leaking anonymous volumes. It may also prevent the image from working on some hosts or container engines.\n'
                        'The image should be rebuilt without the use of the VOLUME instruction.',
                        unique=True)
def docker_cp_to(args: CommonConfig, container_id: str, src: str, dst: str) -> None:
"""Copy a file to the specified container."""
docker_command(args, ['cp', src, '%s:%s' % (container_id, dst)], capture=True)
def docker_create(
    args: CommonConfig,
    image: str,
    options: list[str],
    cmd: list[str] = None,
) -> tuple[t.Optional[str], t.Optional[str]]:
    """Create a container using the given docker image."""
    return docker_command(args, ['create'] + options + [image] + cmd, capture=True)
def docker_run(
    args: CommonConfig,
    image: str,
    options: list[str],
    cmd: list[str] = None,
    data: t.Optional[str] = None,
) -> tuple[t.Optional[str], t.Optional[str]]:
    """Run a container using the given docker image."""
    return docker_command(args, ['run'] + options + [image] + cmd, data=data, capture=True)
def docker_start(
    args: CommonConfig,
    container_id: str,
    options: list[str],
) -> tuple[t.Optional[str], t.Optional[str]]:
    """Start a container by name or ID."""
    return docker_command(args, ['start'] + options + [container_id], capture=True)
def docker_rm(args: CommonConfig, container_id: str) -> None:
    """Remove the specified container."""
    try:
        # Stop the container with SIGKILL immediately, then remove the container.
        # Podman supports the `--time` option on `rm`, but only since version 4.0.0.
        # Docker does not support the `--time` option on `rm`.
        docker_command(args, ['stop', '--time', '0', container_id], capture=True)
        docker_command(args, ['rm', container_id], capture=True)
    except SubprocessError as ex:
        # Both Podman and Docker report an error if the container does not exist.
        # The error messages contain the same "no such container" string, differing only in capitalization.
        if 'no such container' not in ex.stderr.lower():
            raise ex
@ -372,7 +702,7 @@ class ContainerNotFoundError(DockerError):
class DockerInspect:
"""The results of `docker inspect` for a single container."""
def __init__(self, args: CommonConfig, inspection: dict[str, t.Any]) -> None:
self.args = args
self.inspection = inspection
@ -415,6 +745,11 @@ class DockerInspect:
"""Return True if the container is running, otherwise False."""
return self.state['Running']
@property
def pid(self) -> int:
"""Return the PID of the init process."""
return self.state['Pid']
@property
def env(self) -> list[str]:
"""Return a list of the environment variables used to create the container."""
@ -454,27 +789,8 @@ class DockerInspect:
return networks[0]
    def get_ip_address(self) -> t.Optional[str]:
        """Return the IP address of the container for the preferred docker network."""
        if self.networks:
            # Sort networks and use the first available.
            # This assumes all containers will have access to the same networks.
            network_name = sorted(self.networks.keys()).pop(0)
            ipaddress = self.networks[network_name]['IPAddress']
        else:
            ipaddress = self.network_settings['IPAddress']
        if not ipaddress:
            return None
        return ipaddress
def docker_inspect(args: CommonConfig, identifier: str, always: bool = False) -> DockerInspect:
"""
Return the results of `docker container inspect` for the specified container.
Raises a ContainerNotFoundError if the container was not found.
@ -495,23 +811,110 @@ def docker_inspect(args: EnvironmentConfig, identifier: str, always: bool = Fals
raise ContainerNotFoundError(identifier)
def docker_network_disconnect(args: CommonConfig, container_id: str, network: str) -> None:
"""Disconnect the specified docker container from the given network."""
docker_command(args, ['network', 'disconnect', network, container_id], capture=True)
class DockerImageInspect:
"""The results of `docker image inspect` for a single image."""
def __init__(self, args: CommonConfig, inspection: dict[str, t.Any]) -> None:
self.args = args
self.inspection = inspection
# primary properties
@property
def config(self) -> dict[str, t.Any]:
"""Return a dictionary of the image config."""
return self.inspection['Config']
# nested properties
@property
def volumes(self) -> dict[str, t.Any]:
"""Return a dictionary of the image volumes."""
return self.config.get('Volumes') or {}
@property
def cmd(self) -> list[str]:
"""The command to run when the container starts."""
return self.config['Cmd']
@mutex
def docker_image_inspect(args: CommonConfig, image: str, always: bool = False) -> t.Optional[DockerImageInspect]:
"""
Return the results of `docker image inspect` for the specified image or None if the image does not exist.
"""
inspect_cache: dict[str, DockerImageInspect]
try:
inspect_cache = docker_image_inspect.cache # type: ignore[attr-defined]
except AttributeError:
inspect_cache = docker_image_inspect.cache = {} # type: ignore[attr-defined]
if inspect_result := inspect_cache.get(image):
return inspect_result
try:
stdout = docker_command(args, ['image', 'inspect', image], capture=True, always=always)[0]
except SubprocessError:
stdout = '[]'
if args.explain and not always:
items = []
else:
items = json.loads(stdout)
if len(items) > 1:
raise ApplicationError(f'Inspection of image "{image}" resulted in {len(items)} items:\n{json.dumps(items, indent=4)}')
if len(items) == 1:
inspect_result = DockerImageInspect(args, items[0])
inspect_cache[image] = inspect_result
return inspect_result
return None
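# Example usage (image name is hypothetical):
#
#   if inspect := docker_image_inspect(args, 'example.org/test-container:1.0'):
#       if inspect.volumes:
#           display.warning(f'Image defines volumes: {", ".join(sorted(inspect.volumes))}')
#
# Successful inspections are cached per image, so repeated calls avoid
# re-running `docker image inspect`.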
class DockerNetworkInspect:
"""The results of `docker network inspect` for a single network."""
def __init__(self, args: CommonConfig, inspection: dict[str, t.Any]) -> None:
self.args = args
self.inspection = inspection
def docker_network_inspect(args: CommonConfig, network: str, always: bool = False) -> t.Optional[DockerNetworkInspect]:
"""
Return the results of `docker network inspect` for the specified network or None if the network does not exist.
"""
    try:
        stdout = docker_command(args, ['network', 'inspect', network], capture=True, always=always)[0]
    except SubprocessError:
        stdout = '[]'
    if args.explain and not always:
        items = []
    else:
        items = json.loads(stdout)
    if len(items) == 1:
        return DockerNetworkInspect(args, items[0])
    return None
def docker_logs(args: CommonConfig, container_id: str) -> None:
"""Display logs for the specified container. If an error occurs, it is displayed rather than raising an exception."""
try:
docker_command(args, ['logs', container_id], capture=False)
except SubprocessError as ex:
display.error(str(ex))
def docker_exec(
args: CommonConfig,
container_id: str,
cmd: list[str],
capture: bool,
@ -533,18 +936,6 @@ def docker_exec(
output_stream=output_stream, data=data)
def docker_command(
args: CommonConfig,
cmd: list[str],
@ -560,7 +951,7 @@ def docker_command(
env = docker_environment()
command = [require_docker().command]
if command[0] == 'podman' and get_podman_remote():
command.append('--remote')
return run_command(args, command + cmd, env=env, capture=capture, stdin=stdin, stdout=stdout, interactive=interactive, always=always,
@ -570,5 +961,16 @@ def docker_command(
def docker_environment() -> dict[str, str]:
"""Return a dictionary of docker related environment variables found in the current environment."""
env = common_environment()
var_names = {
'XDG_RUNTIME_DIR', # podman
}
var_prefixes = {
'CONTAINER_', # podman remote
'DOCKER_', # docker
}
env.update({name: value for name, value in os.environ.items() if name in var_names or any(name.startswith(prefix) for prefix in var_prefixes)})
return env
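# For illustration, given a hypothetical caller environment:
#
#   DOCKER_HOST=ssh://user@docker-host   -> passed through ('DOCKER_' prefix)
#   CONTAINER_HOST=ssh://user@podman     -> passed through ('CONTAINER_' prefix)
#   XDG_RUNTIME_DIR=/run/user/1000       -> passed through (exact name match)
#   EDITOR=vim                           -> dropped
#
# on top of whatever common_environment() already provides.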

@ -18,6 +18,8 @@ from .io import (
)
from .completion import (
AuditMode,
CGroupVersion,
CompletionConfig,
docker_completion,
DockerCompletionConfig,
@ -282,6 +284,8 @@ class DockerConfig(ControllerHostConfig, PosixConfig):
memory: t.Optional[int] = None
privileged: t.Optional[bool] = None
seccomp: t.Optional[str] = None
cgroup: t.Optional[CGroupVersion] = None
audit: t.Optional[AuditMode] = None
def get_defaults(self, context: HostContext) -> DockerCompletionConfig:
"""Return the default settings."""
@ -313,6 +317,12 @@ class DockerConfig(ControllerHostConfig, PosixConfig):
if self.seccomp is None:
self.seccomp = defaults.seccomp
if self.cgroup is None:
self.cgroup = defaults.cgroup_enum
if self.audit is None:
self.audit = defaults.audit_enum
if self.privileged is None:
self.privileged = False

@ -4,11 +4,13 @@ from __future__ import annotations
import abc
import dataclasses
import os
import shlex
import tempfile
import time
import typing as t
from .io import (
read_text_file,
write_text_file,
)
@ -52,16 +54,28 @@ from .util import (
sanitize_host_name,
sorted_versions,
InternalError,
HostConnectionError,
ANSIBLE_TEST_TARGET_ROOT,
)
from .util_common import (
get_docs_url,
intercept_python,
)
from .docker_util import (
docker_exec,
docker_image_inspect,
docker_logs,
docker_pull,
docker_rm,
get_docker_hostname,
require_docker,
get_docker_info,
detect_host_properties,
run_utility_container,
SystemdControlGroupV1Status,
LOGINUID_NOT_SET,
)
from .bootstrap import (
@ -103,12 +117,66 @@ from .become import (
Sudo,
)
from .completion import (
AuditMode,
CGroupVersion,
)
from .dev.container_probe import (
CGroupMount,
CGroupPath,
CGroupState,
MountType,
check_container_cgroup_status,
)
TControllerHostConfig = t.TypeVar('TControllerHostConfig', bound=ControllerHostConfig)
THostConfig = t.TypeVar('THostConfig', bound=HostConfig)
TPosixConfig = t.TypeVar('TPosixConfig', bound=PosixConfig)
TRemoteConfig = t.TypeVar('TRemoteConfig', bound=RemoteConfig)
class ControlGroupError(ApplicationError):
"""Raised when the container host does not have the necessary cgroup support to run a container."""
def __init__(self, args: CommonConfig, reason: str) -> None:
engine = require_docker().command
dd_wsl2 = get_docker_info(args).docker_desktop_wsl2
message = f'''
{reason}
Run the following commands as root on the container host to resolve this issue:
mkdir /sys/fs/cgroup/systemd
mount cgroup -t cgroup /sys/fs/cgroup/systemd -o none,name=systemd,xattr
chown -R {{user}}:{{group}} /sys/fs/cgroup/systemd # only when rootless
NOTE: These changes must be applied each time the container host is rebooted.
'''.strip()
podman_message = '''
If rootless Podman is already running [1], you may need to stop it before
containers are able to use the new mount point.
[1] Check for 'podman' and 'catatonit' processes.
'''
dd_wsl_message = f'''
When using Docker Desktop with WSL2, additional configuration [1] is required.
[1] {get_docs_url("https://docs.ansible.com/ansible-core/devel/dev_guide/testing_running_locally.html#docker-desktop-with-wsl2")}
'''
if engine == 'podman':
message += podman_message
elif dd_wsl2:
message += dd_wsl_message
message = message.strip()
super().__init__(message)
@dataclasses.dataclass(frozen=True)
class Inventory:
"""Simple representation of an Ansible inventory."""
@ -179,6 +247,9 @@ class HostProfile(t.Generic[THostConfig], metaclass=abc.ABCMeta):
def setup(self) -> None:
"""Perform out-of-band setup before delegation."""
def on_target_failure(self) -> None:
"""Executed during failure handling if this profile is a target."""
def deprovision(self) -> None:
"""Deprovision the host after delegation has completed."""
@ -331,6 +402,16 @@ class ControllerProfile(SshTargetHostProfile[ControllerConfig], PosixProfile[Con
class DockerProfile(ControllerHostProfile[DockerConfig], SshTargetHostProfile[DockerConfig]):
"""Host profile for a docker instance."""
MARKER = 'ansible-test-marker'
@dataclasses.dataclass(frozen=True)
class InitConfig:
"""Configuration details required to run the container init."""
options: list[str]
command: str
expected_mounts: tuple[CGroupMount, ...]
@property
def container_name(self) -> t.Optional[str]:
"""Return the stored container name, if any, otherwise None."""
@ -341,17 +422,36 @@ class DockerProfile(ControllerHostProfile[DockerConfig], SshTargetHostProfile[Do
"""Store the given container name."""
self.state['container_name'] = value
@property
def cgroup_path(self) -> t.Optional[str]:
"""Return the path to the cgroup v1 systemd hierarchy, if any, otherwise None."""
return self.state.get('cgroup_path')
@cgroup_path.setter
def cgroup_path(self, value: str) -> None:
"""Store the path to the cgroup v1 systemd hierarchy."""
self.state['cgroup_path'] = value
@property
def label(self) -> str:
"""Label to apply to resources related to this profile."""
return f'{"controller" if self.controller else "target"}-{self.args.session_name}'
def provision(self) -> None:
"""Provision the host before delegation."""
init_probe = self.args.dev_probe_cgroups is not None
init_config = self.get_init_config()
container = run_support_container(
args=self.args,
context='__test_hosts__',
image=self.config.image,
name=f'ansible-test-{"controller" if self.controller else "target"}-{self.args.session_name}',
name=f'ansible-test-{self.label}',
ports=[22],
publish_ports=not self.controller, # connections to the controller over SSH are not required
options=self.get_docker_run_options(),
options=init_config.options,
cleanup=CleanupMode.NO,
cmd=self.build_sleep_command() if init_config.command or init_probe else None,
)
if not container:
@ -359,6 +459,458 @@ class DockerProfile(ControllerHostProfile[DockerConfig], SshTargetHostProfile[Do
self.container_name = container.name
try:
options = ['--pid', 'host', '--privileged']
if init_config.command:
init_command = init_config.command
if not init_probe:
init_command += f' && {shlex.join(self.wake_command)}'
cmd = ['nsenter', '-t', str(container.details.container.pid), '-m', '-p', 'sh', '-c', init_command]
run_utility_container(self.args, f'ansible-test-init-{self.label}', cmd, options)
if init_probe:
check_container_cgroup_status(self.args, self.config, self.container_name, init_config.expected_mounts)
cmd = ['nsenter', '-t', str(container.details.container.pid), '-m', '-p'] + self.wake_command
run_utility_container(self.args, f'ansible-test-wake-{self.label}', cmd, options)
except SubprocessError:
display.info(f'Checking container "{self.container_name}" logs...')
docker_logs(self.args, self.container_name)
raise
def get_init_config(self) -> InitConfig:
"""Return init config for running under the current container engine."""
self.check_cgroup_requirements()
engine = require_docker().command
init_config = getattr(self, f'get_{engine}_init_config')()
return init_config
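    # The getattr() lookup above is a simple name-based dispatch:
    #
    #   require_docker().command == 'podman' -> self.get_podman_init_config()
    #   require_docker().command == 'docker' -> self.get_docker_init_config()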
def get_podman_init_config(self) -> InitConfig:
"""Return init config for running under Podman."""
options = self.get_common_run_options()
command: t.Optional[str] = None
expected_mounts: tuple[CGroupMount, ...]
cgroup_version = get_docker_info(self.args).cgroup_version
# Without AUDIT_WRITE the following errors may appear in the system logs of a container after attempting to log in using SSH:
#
# fatal: linux_audit_write_entry failed: Operation not permitted
#
# This occurs when running containers as root when the container host provides audit support, but the user lacks the AUDIT_WRITE capability.
# The AUDIT_WRITE capability is provided by docker by default, but not podman.
# See: https://github.com/moby/moby/pull/7179
#
# OpenSSH Portable requires AUDIT_WRITE when logging in with a TTY if the Linux audit feature was compiled in.
# Containers with the feature enabled will require the AUDIT_WRITE capability when EPERM is returned while accessing the audit system.
# See: https://github.com/openssh/openssh-portable/blob/2dc328023f60212cd29504fc05d849133ae47355/audit-linux.c#L90
# See: https://github.com/openssh/openssh-portable/blob/715c892f0a5295b391ae92c26ef4d6a86ea96e8e/loginrec.c#L476-L478
#
# Some containers will be running a patched version of OpenSSH which blocks logins when EPERM is received while using the audit system.
# These containers will require the AUDIT_WRITE capability when EPERM is returned while accessing the audit system.
# See: https://src.fedoraproject.org/rpms/openssh/blob/f36/f/openssh-7.6p1-audit.patch
#
# Since only some containers carry the patch or enable the Linux audit feature in OpenSSH, this capability is enabled on a per-container basis.
# No warning is provided when adding this capability, since there's not really anything the user can do about it.
if self.config.audit == AuditMode.REQUIRED and detect_host_properties(self.args).audit_code == 'EPERM':
options.extend(('--cap-add', 'AUDIT_WRITE'))
# Without AUDIT_CONTROL the following errors may appear in the system logs of a container after attempting to log in using SSH:
#
# pam_loginuid(sshd:session): Error writing /proc/self/loginuid: Operation not permitted
# pam_loginuid(sshd:session): set_loginuid failed
#
# Containers configured to use the pam_loginuid module will encounter this error. If the module is required, logins will fail.
# Since most containers will have this configuration, the code to handle this issue is applied to all containers.
#
# This occurs when the loginuid is set on the container host and doesn't match the user on the container host which is running the container.
# Container hosts which do not use systemd are likely to leave the loginuid unset and thus be unaffected.
# The most common source of a mismatch is the use of sudo to run ansible-test, which changes the uid but cannot change the loginuid.
# This condition typically occurs only under podman, since the loginuid is inherited from the current user.
# See: https://github.com/containers/podman/issues/13012#issuecomment-1034049725
#
# This condition is detected by querying the loginuid of a container running on the container host.
# When it occurs, a warning is displayed and the AUDIT_CONTROL capability is added to containers to work around the issue.
# The warning serves as notice to the user that their usage of ansible-test is responsible for the additional capability requirement.
if (loginuid := detect_host_properties(self.args).loginuid) not in (0, LOGINUID_NOT_SET, None):
display.warning(f'Running containers with capability AUDIT_CONTROL since the container loginuid ({loginuid}) is incorrect. '
'This is most likely due to use of sudo to run ansible-test when loginuid is already set.', unique=True)
options.extend(('--cap-add', 'AUDIT_CONTROL'))
if self.config.cgroup == CGroupVersion.NONE:
# Containers which do not require cgroup do not use systemd.
options.extend((
# Disabling systemd support in Podman will allow these containers to work on hosts without systemd.
# Without this, running a container on a host without systemd results in errors such as (from crun):
# Error: crun: error stat'ing file `/sys/fs/cgroup/systemd`: No such file or directory:
# A similar error occurs when using runc:
# OCI runtime attempted to invoke a command that was not found
'--systemd', 'false',
# A private cgroup namespace limits what is visible in /proc/*/cgroup.
'--cgroupns', 'private',
# Mounting a tmpfs overrides the cgroup mount(s) that would otherwise be provided by Podman.
# This helps provide a consistent container environment across various container host configurations.
'--tmpfs', '/sys/fs/cgroup',
))
expected_mounts = (
CGroupMount(path=CGroupPath.ROOT, type=MountType.TMPFS, writable=True, state=None),
)
elif self.config.cgroup in (CGroupVersion.V1_V2, CGroupVersion.V1_ONLY) and cgroup_version == 1:
# Podman hosts providing cgroup v1 will automatically bind mount the systemd hierarchy read-write in the container.
# They will also create a dedicated cgroup v1 systemd hierarchy for the container.
# On hosts with systemd this path is: /sys/fs/cgroup/systemd/libpod_parent/libpod-{container_id}/
# On hosts without systemd this path is: /sys/fs/cgroup/systemd/{container_id}/
options.extend((
# Force Podman to enable systemd support since a command may be used later (to support pre-init diagnostics).
'--systemd', 'always',
# The host namespace must be used to permit the container to access the cgroup v1 systemd hierarchy created by Podman.
'--cgroupns', 'host',
# Mask the host cgroup tmpfs mount to avoid exposing the host cgroup v1 hierarchies (or cgroup v2 hybrid) to the container.
                # Podman will provide a cgroup v1 systemd hierarchy on top of this.
'--tmpfs', '/sys/fs/cgroup',
))
self.check_systemd_cgroup_v1(options) # podman
expected_mounts = (
CGroupMount(path=CGroupPath.ROOT, type=MountType.TMPFS, writable=True, state=None),
# The mount point can be writable or not.
# The reason for the variation is not known.
CGroupMount(path=CGroupPath.SYSTEMD, type=MountType.CGROUP_V1, writable=None, state=CGroupState.HOST),
# The filesystem type can be tmpfs or devtmpfs.
# The reason for the variation is not known.
CGroupMount(path=CGroupPath.SYSTEMD_RELEASE_AGENT, type=None, writable=False, state=None),
)
elif self.config.cgroup in (CGroupVersion.V1_V2, CGroupVersion.V2_ONLY) and cgroup_version == 2:
# Podman hosts providing cgroup v2 will give each container a read-write cgroup mount.
options.extend((
# Force Podman to enable systemd support since a command may be used later (to support pre-init diagnostics).
'--systemd', 'always',
# A private cgroup namespace is used to avoid exposing the host cgroup to the container.
'--cgroupns', 'private',
))
expected_mounts = (
CGroupMount(path=CGroupPath.ROOT, type=MountType.CGROUP_V2, writable=True, state=CGroupState.PRIVATE),
)
elif self.config.cgroup == CGroupVersion.V1_ONLY and cgroup_version == 2:
# Containers which require cgroup v1 need explicit volume mounts on container hosts not providing that version.
# We must put the container PID 1 into the cgroup v1 systemd hierarchy we create.
cgroup_path = self.create_systemd_cgroup_v1() # podman
command = f'echo 1 > {cgroup_path}/cgroup.procs'
options.extend((
# Force Podman to enable systemd support since a command is being provided.
'--systemd', 'always',
# A private cgroup namespace is required. Using the host cgroup namespace results in errors such as the following (from crun):
# Error: OCI runtime error: mount `/sys/fs/cgroup` to '/sys/fs/cgroup': Invalid argument
# A similar error occurs when using runc:
# Error: OCI runtime error: runc create failed: unable to start container process: error during container init:
# error mounting "/sys/fs/cgroup" to rootfs at "/sys/fs/cgroup": mount /sys/fs/cgroup:/sys/fs/cgroup (via /proc/self/fd/7), flags: 0x1000:
# invalid argument
'--cgroupns', 'private',
# Unlike Docker, Podman ignores a /sys/fs/cgroup tmpfs mount, instead exposing a cgroup v2 mount.
# The exposed volume will be read-write, but the container will have its own private namespace.
# Provide a read-only cgroup v1 systemd hierarchy under which the dedicated ansible-test cgroup will be mounted read-write.
# Without this systemd will fail while attempting to mount the cgroup v1 systemd hierarchy.
# Podman doesn't support using a tmpfs for this. Attempting to do so results in an error (from crun):
# Error: OCI runtime error: read: Invalid argument
# A similar error occurs when using runc:
# Error: OCI runtime error: runc create failed: unable to start container process: error during container init:
# error mounting "tmpfs" to rootfs at "/sys/fs/cgroup/systemd": tmpcopyup: failed to copy /sys/fs/cgroup/systemd to /proc/self/fd/7
# (/tmp/runctop3876247619/runctmpdir1460907418): read /proc/self/fd/7/cgroup.kill: invalid argument
'--volume', '/sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd:ro',
# Provide the container access to the cgroup v1 systemd hierarchy created by ansible-test.
'--volume', f'{cgroup_path}:{cgroup_path}:rw',
))
expected_mounts = (
CGroupMount(path=CGroupPath.ROOT, type=MountType.CGROUP_V2, writable=True, state=CGroupState.PRIVATE),
CGroupMount(path=CGroupPath.SYSTEMD, type=MountType.CGROUP_V1, writable=False, state=CGroupState.SHADOWED),
CGroupMount(path=cgroup_path, type=MountType.CGROUP_V1, writable=True, state=CGroupState.HOST),
)
else:
raise InternalError(f'Unhandled cgroup configuration: {self.config.cgroup} on cgroup v{cgroup_version}.')
return self.InitConfig(
options=options,
command=command,
expected_mounts=expected_mounts,
)
def get_docker_init_config(self) -> InitConfig:
"""Return init config for running under Docker."""
options = self.get_common_run_options()
command: t.Optional[str] = None
expected_mounts: tuple[CGroupMount, ...]
cgroup_version = get_docker_info(self.args).cgroup_version
if self.config.cgroup == CGroupVersion.NONE:
# Containers which do not require cgroup do not use systemd.
if get_docker_info(self.args).cgroupns_option_supported:
# Use the `--cgroupns` option if it is supported.
                # Older servers which do not support the option use the host cgroup namespace.
# Older clients which do not support the option cause newer servers to use the host cgroup namespace (cgroup v1 only).
# See: https://github.com/moby/moby/blob/master/api/server/router/container/container_routes.go#L512-L517
# If the host cgroup namespace is used, cgroup information will be visible, but the cgroup mounts will be unavailable due to the tmpfs below.
options.extend((
# A private cgroup namespace limits what is visible in /proc/*/cgroup.
'--cgroupns', 'private',
))
options.extend((
# Mounting a tmpfs overrides the cgroup mount(s) that would otherwise be provided by Docker.
# This helps provide a consistent container environment across various container host configurations.
'--tmpfs', '/sys/fs/cgroup',
))
expected_mounts = (
CGroupMount(path=CGroupPath.ROOT, type=MountType.TMPFS, writable=True, state=None),
)
elif self.config.cgroup in (CGroupVersion.V1_V2, CGroupVersion.V1_ONLY) and cgroup_version == 1:
# Docker hosts providing cgroup v1 will automatically bind mount the systemd hierarchy read-only in the container.
# They will also create a dedicated cgroup v1 systemd hierarchy for the container.
            # The cgroup v1 systemd hierarchy path is: /sys/fs/cgroup/systemd/{container_id}/
if get_docker_info(self.args).cgroupns_option_supported:
# Use the `--cgroupns` option if it is supported.
                # Older servers which do not support the option use the host cgroup namespace.
# Older clients which do not support the option cause newer servers to use the host cgroup namespace (cgroup v1 only).
# See: https://github.com/moby/moby/blob/master/api/server/router/container/container_routes.go#L512-L517
options.extend((
# The host cgroup namespace must be used.
# Otherwise, /proc/1/cgroup will report "/" for the cgroup path, which is incorrect.
# See: https://github.com/systemd/systemd/issues/19245#issuecomment-815954506
# It is set here to avoid relying on the current Docker configuration.
'--cgroupns', 'host',
))
options.extend((
# Mask the host cgroup tmpfs mount to avoid exposing the host cgroup v1 hierarchies (or cgroup v2 hybrid) to the container.
'--tmpfs', '/sys/fs/cgroup',
# A cgroup v1 systemd hierarchy needs to be mounted read-write over the read-only one provided by Docker.
# Alternatives were tested, but were unusable due to various issues:
# - Attempting to remount the existing mount point read-write will result in a "mount point is busy" error.
# - Adding the entire "/sys/fs/cgroup" mount will expose hierarchies other than systemd.
# If the host is a cgroup v2 hybrid host it would also expose the /sys/fs/cgroup/unified/ hierarchy read-write.
# On older systems, such as an Ubuntu 18.04 host, a dedicated v2 cgroup would not be used, exposing the host cgroups to the container.
'--volume', '/sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd:rw',
))
self.check_systemd_cgroup_v1(options) # docker
expected_mounts = (
CGroupMount(path=CGroupPath.ROOT, type=MountType.TMPFS, writable=True, state=None),
CGroupMount(path=CGroupPath.SYSTEMD, type=MountType.CGROUP_V1, writable=True, state=CGroupState.HOST),
)
elif self.config.cgroup in (CGroupVersion.V1_V2, CGroupVersion.V2_ONLY) and cgroup_version == 2:
# Docker hosts providing cgroup v2 will give each container a read-only cgroup mount.
# It must be remounted read-write before systemd starts.
command = 'mount -o remount,rw /sys/fs/cgroup/'
options.extend((
# A private cgroup namespace is used to avoid exposing the host cgroup to the container.
# This matches the behavior in Podman 1.7.0 and later, which select cgroupns 'host' mode for cgroup v1 and 'private' mode for cgroup v2.
# See: https://github.com/containers/podman/pull/4374
# See: https://github.com/containers/podman/blob/main/RELEASE_NOTES.md#170
'--cgroupns', 'private',
))
expected_mounts = (
CGroupMount(path=CGroupPath.ROOT, type=MountType.CGROUP_V2, writable=True, state=CGroupState.PRIVATE),
)
elif self.config.cgroup == CGroupVersion.V1_ONLY and cgroup_version == 2:
# Containers which require cgroup v1 need explicit volume mounts on container hosts not providing that version.
# We must put the container PID 1 into the cgroup v1 systemd hierarchy we create.
cgroup_path = self.create_systemd_cgroup_v1() # docker
command = f'echo 1 > {cgroup_path}/cgroup.procs'
options.extend((
# A private cgroup namespace is used since no access to the host cgroup namespace is required.
# This matches the configuration used for running cgroup v1 containers under Podman.
'--cgroupns', 'private',
# Provide a read-write tmpfs filesystem to support additional cgroup mount points.
# Without this Docker will provide a read-only cgroup2 mount instead.
'--tmpfs', '/sys/fs/cgroup',
# Provide a read-write tmpfs filesystem to simulate a systemd cgroup v1 hierarchy.
# Without this systemd will fail while attempting to mount the cgroup v1 systemd hierarchy.
'--tmpfs', '/sys/fs/cgroup/systemd',
# Provide the container access to the cgroup v1 systemd hierarchy created by ansible-test.
'--volume', f'{cgroup_path}:{cgroup_path}:rw',
))
expected_mounts = (
CGroupMount(path=CGroupPath.ROOT, type=MountType.TMPFS, writable=True, state=None),
CGroupMount(path=CGroupPath.SYSTEMD, type=MountType.TMPFS, writable=True, state=None),
CGroupMount(path=cgroup_path, type=MountType.CGROUP_V1, writable=True, state=CGroupState.HOST),
)
else:
raise InternalError(f'Unhandled cgroup configuration: {self.config.cgroup} on cgroup v{cgroup_version}.')
return self.InitConfig(
options=options,
command=command,
expected_mounts=expected_mounts,
)
def build_sleep_command(self) -> list[str]:
"""
Build and return the command to put the container to sleep.
The sleep duration below was selected to:
- Allow enough time to perform necessary operations in the container before waking it.
- Make the delay obvious if the wake command doesn't run or succeed.
- Avoid hanging indefinitely or for an unreasonably long time.
NOTE: The container must have a POSIX-compliant default shell "sh" with a non-builtin "sleep" command.
"""
docker_pull(self.args, self.config.image)
inspect = docker_image_inspect(self.args, self.config.image)
return ['sh', '-c', f'sleep 60; exec {shlex.join(inspect.cmd)}']
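    # For example, an image whose Cmd is ['/usr/sbin/init'] (hypothetical) is
    # started with:
    #
    #   ['sh', '-c', 'sleep 60; exec /usr/sbin/init']
    #
    # so waking the container only requires killing the 'sleep' process.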
@property
def wake_command(self) -> list[str]:
"""
The command used to wake the container from sleep.
This will be run inside our utility container, so the command used does not need to be present in the container being woken up.
"""
return ['pkill', 'sleep']
def check_systemd_cgroup_v1(self, options: list[str]) -> None:
"""Check the cgroup v1 systemd hierarchy to verify it is writeable for our container."""
probe_script = (read_text_file(os.path.join(ANSIBLE_TEST_TARGET_ROOT, 'setup', 'check_systemd_cgroup_v1.sh'))
.replace('@MARKER@', self.MARKER)
.replace('@LABEL@', self.label))
cmd = ['sh']
try:
run_utility_container(self.args, f'ansible-test-cgroup-check-{self.label}', cmd, options, data=probe_script)
except SubprocessError as ex:
if error := self.extract_error(ex.stderr):
raise ControlGroupError(self.args, 'Unable to create a v1 cgroup within the systemd hierarchy.\n'
f'Reason: {error}') from ex # cgroup probe failed
raise
def create_systemd_cgroup_v1(self) -> str:
"""Create a unique ansible-test cgroup in the v1 systemd hierarchy and return its path."""
self.cgroup_path = f'/sys/fs/cgroup/systemd/ansible-test-{self.label}'
# Privileged mode is required to create the cgroup directories on some hosts, such as Fedora 36 and RHEL 9.0.
# The mkdir command will fail with "Permission denied" otherwise.
options = ['--volume', '/sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd:rw', '--privileged']
cmd = ['sh', '-c', f'>&2 echo {shlex.quote(self.MARKER)} && mkdir {shlex.quote(self.cgroup_path)}']
try:
run_utility_container(self.args, f'ansible-test-cgroup-create-{self.label}', cmd, options)
except SubprocessError as ex:
if error := self.extract_error(ex.stderr):
raise ControlGroupError(self.args, f'Unable to create a v1 cgroup within the systemd hierarchy.\n'
f'Reason: {error}') from ex # cgroup create permission denied
raise
return self.cgroup_path
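    # For example, a controller profile in session 'abc123' (hypothetical) creates
    # and returns '/sys/fs/cgroup/systemd/ansible-test-controller-abc123'.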
@property
def delete_systemd_cgroup_v1_command(self) -> list[str]:
"""The command used to remove the previously created ansible-test cgroup in the v1 systemd hierarchy."""
return ['find', self.cgroup_path, '-type', 'd', '-delete']
def delete_systemd_cgroup_v1(self) -> None:
"""Delete a previously created ansible-test cgroup in the v1 systemd hierarchy."""
# Privileged mode is required to remove the cgroup directories on some hosts, such as Fedora 36 and RHEL 9.0.
# The BusyBox find utility will report "Permission denied" otherwise, although it still exits with a status code of 0.
options = ['--volume', '/sys/fs/cgroup/systemd:/sys/fs/cgroup/systemd:rw', '--privileged']
cmd = ['sh', '-c', f'>&2 echo {shlex.quote(self.MARKER)} && {shlex.join(self.delete_systemd_cgroup_v1_command)}']
try:
run_utility_container(self.args, f'ansible-test-cgroup-delete-{self.label}', cmd, options)
except SubprocessError as ex:
if error := self.extract_error(ex.stderr):
if error.endswith(': No such file or directory'):
return
display.error(str(ex))
def extract_error(self, value: str) -> t.Optional[str]:
"""
Extract the ansible-test portion of the error message from the given value and return it.
Returns None if no ansible-test marker was found.
"""
lines = value.strip().splitlines()
try:
idx = lines.index(self.MARKER)
except ValueError:
return None
lines = lines[idx + 1:]
message = '\n'.join(lines)
return message
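    # Example (stderr content is hypothetical): with MARKER = 'ansible-test-marker',
    # stderr of:
    #
    #   ansible-test-marker
    #   mkdir: can't create directory: Permission denied
    #
    # yields "mkdir: can't create directory: Permission denied", while stderr
    # without the marker yields None.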
    def check_cgroup_requirements(self) -> None:
"""Check cgroup requirements for the container."""
cgroup_version = get_docker_info(self.args).cgroup_version
if cgroup_version not in (1, 2):
            raise ApplicationError(f'The container host provides cgroup v{cgroup_version}, but only versions v1 and v2 are supported.')
# Stop early for containers which require cgroup v2 when the container host does not provide it.
# None of the containers included with ansible-test currently use this configuration.
# Support for v2-only was added in preparation for the eventual removal of cgroup v1 support from systemd after EOY 2023.
# See: https://github.com/systemd/systemd/pull/24086
if self.config.cgroup == CGroupVersion.V2_ONLY and cgroup_version != 2:
raise ApplicationError(f'Container {self.config.name} requires cgroup v2 but the container host provides cgroup v{cgroup_version}.')
# Containers which use old versions of systemd (earlier than version 226) require cgroup v1 support.
# If the host is a cgroup v2 (unified) host, changes must be made to how the container is run.
#
# See: https://github.com/systemd/systemd/blob/main/NEWS
# Under the "CHANGES WITH 226" section:
# > systemd now optionally supports the new Linux kernel "unified" control group hierarchy.
#
# NOTE: The container host must have the cgroup v1 mount already present.
# If the container is run rootless, the user it runs under must have permissions to the mount.
#
# The following commands can be used to make the mount available:
#
# mkdir /sys/fs/cgroup/systemd
# mount cgroup -t cgroup /sys/fs/cgroup/systemd -o none,name=systemd,xattr
# chown -R {user}:{group} /sys/fs/cgroup/systemd # only when rootless
#
# See: https://github.com/containers/crun/blob/main/crun.1.md#runocisystemdforce_cgroup_v1path
if self.config.cgroup == CGroupVersion.V1_ONLY or (self.config.cgroup != CGroupVersion.NONE and get_docker_info(self.args).cgroup_version == 1):
if (cgroup_v1 := detect_host_properties(self.args).cgroup_v1) != SystemdControlGroupV1Status.VALID:
if self.config.cgroup == CGroupVersion.V1_ONLY:
if get_docker_info(self.args).cgroup_version == 2:
reason = f'Container {self.config.name} requires cgroup v1, but the container host only provides cgroup v2.'
else:
reason = f'Container {self.config.name} requires cgroup v1, but the container host does not appear to be running systemd.'
else:
reason = 'The container host provides cgroup v1, but does not appear to be running systemd.'
reason += f'\n{cgroup_v1.value}'
raise ControlGroupError(self.args, reason) # cgroup probe reported invalid state
def setup(self) -> None:
"""Perform out-of-band setup before delegation."""
bootstrapper = BootstrapDocker(
@ -370,32 +922,62 @@ class DockerProfile(ControllerHostProfile[DockerConfig], SshTargetHostProfile[Do
setup_sh = bootstrapper.get_script()
shell = setup_sh.splitlines()[0][2:]
try:
docker_exec(self.args, self.container_name, [shell], data=setup_sh, capture=False)
except SubprocessError:
display.info(f'Checking container "{self.container_name}" logs...')
docker_logs(self.args, self.container_name)
raise
def deprovision(self) -> None:
"""Deprovision the host after delegation has completed."""
container_exists = False
if self.container_name:
if self.args.docker_terminate == TerminateMode.ALWAYS or (self.args.docker_terminate == TerminateMode.SUCCESS and self.args.success):
docker_rm(self.args, self.container_name)
else:
container_exists = True
if self.cgroup_path:
if container_exists:
display.notice(f'Remember to run `{require_docker().command} rm -f {self.container_name}` when finished testing. '
f'Then run `{shlex.join(self.delete_systemd_cgroup_v1_command)}` on the container host.')
else:
self.delete_systemd_cgroup_v1()
elif container_exists:
display.notice(f'Remember to run `{require_docker().command} rm -f {self.container_name}` when finished testing.')
def wait(self) -> None:
"""Wait for the instance to be ready. Executed before delegation for the controller and after delegation for targets."""
if not self.controller:
con = self.get_controller_target_connections()[0]
last_error = ''
for dummy in range(1, 10):
try:
con.run(['id'], capture=True)
except SubprocessError as ex:
if 'Permission denied' in ex.message:
raise
last_error = str(ex)
time.sleep(1)
else:
return
display.info('Checking SSH debug output...')
display.info(last_error)
if not self.args.delegate and not self.args.host_path:
def callback() -> None:
"""Callback to run during error display."""
self.on_target_failure() # when the controller is not delegated, report failures immediately
else:
callback = None
raise HostConnectionError(f'Timeout waiting for {self.config.name} container {self.container_name}.', callback)
def get_controller_target_connections(self) -> list[SshConnection]:
"""Return SSH connection(s) for accessing the host as a target from the controller."""
containers = get_container_database(self.args)
@ -423,12 +1005,33 @@ class DockerProfile(ControllerHostProfile[DockerConfig], SshTargetHostProfile[Do
"""Return the working directory for the host."""
return '/root'
def on_target_failure(self) -> None:
"""Executed during failure handling if this profile is a target."""
display.info(f'Checking container "{self.container_name}" logs...')
try:
docker_logs(self.args, self.container_name)
except SubprocessError as ex:
display.error(str(ex))
if self.config.cgroup != CGroupVersion.NONE:
# Containers with cgroup support are assumed to be running systemd.
display.info(f'Checking container "{self.container_name}" systemd logs...')
try:
docker_exec(self.args, self.container_name, ['journalctl'], capture=False)
except SubprocessError as ex:
display.error(str(ex))
display.error(f'Connection to container "{self.container_name}" failed. See logs and original error above.')
def get_common_run_options(self) -> list[str]:
"""Return a list of options needed to run the container."""
options = [
# These temporary mount points need to be created at run time when using Docker.
# They are automatically provided by Podman, but will be overridden by VOLUME instructions for the container, if they exist.
# If supporting containers with VOLUME instructions is not desired, these options could be limited to use with Docker.
# See: https://github.com/containers/podman/pull/1318
# Previously they were handled by the VOLUME instruction during container image creation.
# However, that approach creates anonymous volumes when running the container, which are then left behind after the container is deleted.
# These options eliminate the need for the VOLUME instruction, and override it if they are present.
@ -439,6 +1042,9 @@ class DockerProfile(ControllerHostProfile[DockerConfig], SshTargetHostProfile[Do
'--tmpfs', '/run/lock', # some systemd containers require a separate tmpfs here, such as Ubuntu 20.04 and Ubuntu 22.04
]
if self.config.privileged:
options.append('--privileged')
if self.config.memory:
options.extend([
f'--memory={self.config.memory}',
@ -509,7 +1115,7 @@ class NetworkRemoteProfile(RemoteProfile[NetworkRemoteConfig]):
else:
return
raise HostConnectionError(f'Timeout waiting for {self.config.name} instance {core_ci.instance_id}.')
def get_controller_target_connections(self) -> list[SshConnection]:
"""Return SSH connection(s) for accessing the host as a target from the controller."""
@ -599,12 +1205,12 @@ class PosixRemoteProfile(ControllerHostProfile[PosixRemoteConfig], RemoteProfile
try:
return self.get_working_directory()
except SubprocessError as ex:
# No "Permission denied" check is performed here.
# Unlike containers, with remote instances, user configuration isn't guaranteed to have been completed before SSH connections are attempted.
display.warning(str(ex))
time.sleep(10)
raise HostConnectionError(f'Timeout waiting for {self.config.name} instance {core_ci.instance_id}.')
def get_controller_target_connections(self) -> list[SshConnection]:
"""Return SSH connection(s) for accessing the host as a target from the controller."""
@ -740,7 +1346,7 @@ class WindowsRemoteProfile(RemoteProfile[WindowsRemoteConfig]):
else:
return
raise HostConnectionError(f'Timeout waiting for {self.config.name} instance {core_ci.instance_id}.')
def get_controller_target_connections(self) -> list[SshConnection]:
"""Return SSH connection(s) for accessing the host as a target from the controller."""

@ -19,6 +19,7 @@ from .config import (
from .util import (
ApplicationError,
HostConnectionError,
display,
open_binary_file,
verify_sys_executable,
@ -185,13 +186,26 @@ def dispatch_jobs(jobs: list[tuple[HostProfile, WrappedThread]]) -> None:
time.sleep(1)
failed = False
connection_failures = 0
for profile, thread in jobs:
try:
thread.wait_for_result()
except HostConnectionError as ex:
display.error(f'Host {profile.config} connection failed:\n{ex}')
failed = True
connection_failures += 1
except ApplicationError as ex:
display.error(f'Host {profile.config} job failed:\n{ex}')
failed = True
except Exception as ex: # pylint: disable=broad-except
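            # Build a fully qualified exception name; the module prefix is omitted for builtin exceptions.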
name = f'{"" if ex.__class__.__module__ == "builtins" else ex.__class__.__module__ + "."}{ex.__class__.__qualname__}'
display.error(f'Host {profile.config} job failed:\nTraceback (most recent call last):\n'
f'{"".join(traceback.format_tb(ex.__traceback__)).rstrip()}\n{name}: {ex}')
failed = True
if connection_failures:
raise HostConnectionError(f'Host job(s) failed, including {connection_failures} connection failure(s). See previous error(s) for details.')
if failed:
raise ApplicationError('Host job(s) failed. See previous error(s) for details.')

@ -703,6 +703,8 @@ class IntegrationTarget(CompletionTarget):
# configuration
self.retry_never = 'retry/never/' in self.aliases
self.setup_once = tuple(sorted(set(g.split('/')[2] for g in groups if g.startswith('setup/once/'))))
self.setup_always = tuple(sorted(set(g.split('/')[2] for g in groups if g.startswith('setup/always/'))))
self.needs_target = tuple(sorted(set(g.split('/')[2] for g in groups if g.startswith('needs/target/'))))
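        # These aliases have the form "setup/once/<target>", "setup/always/<target>" and
        # "needs/target/<target>"; the third path component names the referenced target.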

@ -2,6 +2,7 @@
from __future__ import annotations
import collections.abc as c
import contextlib
import functools
import sys
import threading
@ -60,3 +61,25 @@ def mutex(func: TCallable) -> TCallable:
return func(*args, **kwargs)
return wrapper # type: ignore[return-value] # requires https://www.python.org/dev/peps/pep-0612/ support
__named_lock = threading.Lock()
__named_locks: dict[str, threading.Lock] = {}
@contextlib.contextmanager
def named_lock(name: str) -> c.Iterator[bool]:
"""
Context manager that provides named locks using threading.Lock instances.
Once named lock instances are created they are not deleted.
Yields True if this is the first use of the named lock, otherwise False.
"""
with __named_lock:
if lock_instance := __named_locks.get(name):
first = False
else:
first = True
lock_instance = __named_locks[name] = threading.Lock()
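    # The registry lock is released here, so waiting on one named lock does not block access to other names.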
with lock_instance:
yield first
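A minimal usage sketch (illustrative only, not part of this change; the lock name is hypothetical): the first caller for a given name can perform one-time setup while later callers block until the lock is released.
with named_lock('example-setup') as first:
    if first:
        pass  # only the first caller for this name sees first=True; do one-time setup here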

@ -946,6 +946,23 @@ class MissingEnvironmentVariable(ApplicationError):
self.name = name
class HostConnectionError(ApplicationError):
"""
Raised when the initial connection during host profile setup has failed and all retries have been exhausted.
Raised by provisioning code when one or more provisioning threads raise this exception.
Also raised when an SSH connection fails for the shell command.
"""
def __init__(self, message: str, callback: t.Callable[[], None] = None) -> None:
super().__init__(message)
self._callback = callback
def run_callback(self) -> None:
"""Run the error callback, if any."""
if self._callback:
self._callback()
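A hedged sketch of how a caller might consume this error (the profile variable and call site are illustrative, not this commit's actual code):
try:
    profile.wait()  # may raise HostConnectionError once connection retries are exhausted
except HostConnectionError as ex:
    ex.run_callback()  # surface deferred diagnostics, such as container logs, before re-raising
    raise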
def retry(func, ex_type=SubprocessError, sleep=10, attempts=10, warn=True):
"""Retry the specified function on failure."""
for dummy in range(1, attempts):

@ -9,6 +9,7 @@ disable=
import-outside-toplevel, # common pattern in ansible related code
raise-missing-from, # Python 2.x does not support raise from
too-few-public-methods,
too-many-public-methods,
too-many-arguments,
too-many-branches,
too-many-instance-attributes,

@ -427,6 +427,9 @@ bootstrap()
install_ssh_keys
customize_bashrc
# allow tests to detect ansible-test bootstrapped instances, as well as the bootstrap type
echo "${bootstrap_type}" > /etc/ansible-test.bootstrap
case "${bootstrap_type}" in
"docker") bootstrap_docker ;;
"remote") bootstrap_remote ;;

@ -0,0 +1,17 @@
# shellcheck shell=sh
set -eu
>&2 echo "@MARKER@"
cgroup_path="$(awk -F: '$2 ~ /^name=systemd$/ { print "/sys/fs/cgroup/systemd"$3 }' /proc/1/cgroup)"
if [ "${cgroup_path}" ] && [ -d "${cgroup_path}" ]; then
probe_path="${cgroup_path%/}/ansible-test-probe-@LABEL@"
mkdir "${probe_path}"
rmdir "${probe_path}"
exit 0
fi
>&2 echo "No systemd cgroup v1 hierarchy found"
exit 1

@ -0,0 +1,31 @@
"""A tool for probing cgroups to determine write access."""
from __future__ import (absolute_import, division, print_function)
__metaclass__ = type
import json
import os
import sys
def main(): # type: () -> None
"""Main program entry point."""
probe_dir = sys.argv[1]
paths = sys.argv[2:]
results = {}
for path in paths:
probe_path = os.path.join(path, probe_dir)
try:
os.mkdir(probe_path)
os.rmdir(probe_path)
except Exception as ex: # pylint: disable=broad-except
results[path] = str(ex)
else:
results[path] = None
print(json.dumps(results, sort_keys=True))
if __name__ == '__main__':
main()
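An illustrative invocation (script name, probe directory and cgroup paths are hypothetical); the output maps each path to null on success or the failure message:
# python probe_cgroups.py ansible-test-probe-example /sys/fs/cgroup/systemd /sys/fs/cgroup/unified
# {"/sys/fs/cgroup/systemd": null, "/sys/fs/cgroup/unified": "[Errno 13] Permission denied: '/sys/fs/cgroup/unified/ansible-test-probe-example'"}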