=========================== Porting Modules to Python 3 =========================== Ansible modules are not the usual Python-3 porting exercise. There are two factors that make it harder to port them than most code: 1. Many modules need to run on Python-2.4 in addition to Python-3. 2. A lot of mocking has to go into unittesting a Python-3 module. So it's harder to test that your porting has fixed everything or to make sure that later commits haven't regressed. Which version of Python-3.x and which version of Python-2.x are our minimums? ============================================================================= The short answer is Python-3.5 and Python-2.4 but please read on for more information. For Python-3 we are currently using Python-3.5 as a minimum on both the controller and the managed nodes. This was chosen as it's the version of Python3 in Ubuntu-16.04, the first long-term support (LTS) distribution to ship with Python3 and not Python2. Much of our code would still work with Python-3.4 but there are always bugfixes and new features in any new upstream release. Taking advantage of this relatively new version allows us not to worry about workarounds for problems and missing features in that older version. For Python-2, the default is for the controller to run on Python-2.6 and modules to run on Python-2.4. This allows users with older distributions that are stuck on Python-2.4 to manage their machines. Modules are allowed to drop support for Python-2.4 when one of their dependent libraries require a higher version of python. This is not an invitation to add unnecessary dependent libraries in order to force your module to be usable only with a newer version of Python. Instead it is an acknowledgment that some libraries (for instance, boto3 and docker-py) will only function with newer Python. .. note:: When will we drop support for Python-2.4? The only long term supported distro that we know of with Python-2.4 is RHEL5 (and its rebuilds like CentOS5) which is supported until April of 2017. Whatever major release we make in or after April of 2017 (probably 2.4.0) will no longer have support for Python-2.4 on the managed machines. Previous major release series's that we support (2.3.x) will continue to support Python-2.4 on the managed nodes. We know of no long term supported distributions with Python-2.5 so the new minimum Python-2 version will be Python-2.6. This will let us take advantage of the forwards-compat features of Python-2.6 so porting and maintainance of Python-2/Python-3 code will be easier after that. Supporting only Python-2 or only Python-3 ========================================= Sometimes a module's dependent libraries only run on Python-2 or only run on Python-3. We do not yet have a strategy for these modules but we'll need to come up with one. I see three possibilities: 1. We treat these libraries like any other libraries that may not be installed on the system. When we import them we check if the import was successful. If so, then we continue. If not we return an error about the library being missing. Users will have to find out that the library is unavailable on their version of Python either by searching for the library on their own or reading the requirements section in :command:`ansible-doc`. 2. The shebang line is the only metadata that Ansible extracts from a module so we may end up using that to specify what we mean. Something like ``#!/usr/bin/python`` means the module will run on both Python-2 and Python-3, ``#!/usr/bin/python2`` means the module will only run on Python-2, and ``#!/usr/bin/python3`` means the module will only run on Python-3. Ansible's code will need to be modified to accommodate this. For :command:`python2`, if ``ansible_python2_interpreter`` is not set, it will have to fallback to `` ansible_python_interpreter`` and if that's not set, fallback to ``/usr/bin/python``. For :command:`python3`, Ansible will have to first try ``ansible_python3_interpreter`` and then fallback to ``/usr/bin/python3`` as normal. 3. We add a way for Ansible to retrieve metadata about modules. The metadata will include the version of Python that is required. Methods 2 and 3 will both require that we modify modules or otherwise add this additional information somewhere. 2 needs only a little code changes in executor/module_common.py to parse. 3 will require a lot of work. This is probably not worthwhile if this is the only change but could be worthwhile if there's other things as well. 1 requires that we port all modules to work with python3 syntax but only the code path to get to the library import being attempted and then a fail_json() being called because the libraries are unavailable needs to actually work. .. note:: Metadata proposal in progress A metadata specification is being created to address module maintainership. In the future we will likely extend this to record that a module works with Python2 and 3, Python2 only, or Python3 only. Tips, tricks, and idioms to adopt ================================= Exceptions ---------- In code which already needs Python-2.6+ (For instance, because a library it depends on only runs on Python >= 2.6) it is okay to port directly to the new exception catching syntax:: try: a = 2/0 except ValueError as e: module.fail_json(msg="Tried to divide by zero!") For modules which also run on Python-2.4, we have to use an uglier construction to make this work under both Python-2.4 and Python-3:: from ansible.module_utils.pycompat24 import get_exception [...] try: a = 2/0 except ValueError: e = get_exception() module.fail_json(msg="Tried to divide by zero!") Octal numbers ------------- In Python-2.4, octal literals are specified as ``0755``. In Python-3, that is invalid and octals must be specified as ``0o755``. To bridge this gap, modules should create their octals like this:: # Can't use 0755 on Python-3 and can't use 0o755 on Python-2.4 EXECUTABLE_PERMS = int('0755', 8) Outputting octal numbers may also need to be changed. In python2 we often did this to return file permissions:: mode = int('0775', 8) result['mode'] = oct(mode) This would give the user ``result['mode'] == '0755'`` in their playbook. In python3, :func:`oct` returns the format with the lowercase ``o`` in it like: ``result['mode'] == '0o755'``. If a user had a conditional in their playbook or was using the mode in a template the new format might break things. We need to return the old form of mode for backwards compatibility. You can do it like this:: mode = int('0775', 8) result['mode'] = '0%03o' % mode You should use this wherever backwards compatibility is a concern or you are dealing with file permissions. (With file permissions a user may be feeding the mode into another program or to another module which doesn't understand the python syntax for octal numbers. ``[zero][digit][digit][digit]`` is understood by most everything and therefore the right way to express octals in these circumstances. Bundled six ----------- The third-party python-six library exists to help projects create code that runs on both Python-2 and Python-3. Ansible includes version 1.4.1 in module_utils so that other modules can use it without requiring that it is installed on the remote system. To make use of it, import it like this:: from ansible.module_utils import six .. note:: Why version 1.4.1? six-1.4.1 is the last version of python-six to support Python-2.4. As long as Ansible modules need to run on Python-2.4 we won't be able to update the bundled copy of six. Compile Test ------------ We have Shippable compiling all modules with various versions of Python to check that the modules conform to the syntax at those versions. When you've ported a module so that its syntax works with Python-3, we need to remove it from the blacklist so that the module is included in the syntax check. The file `test/utils/shippable/sanity-skip-python3.txt` contains the list of modules which should not be tested (because we know that they are older modules which have not yet been ported to pass the Python-3 syntax checks. To get another old module to compile with Python-3, remove the entry for it from the list. The goal is to have the LIST be empty. String Model ------------ One of the big differences between Python2 and Python3 is the string model. In Python2, most APIs take byte strings (the Python2 ``str`` type). Using the text type (in Python2, this is the ``unicode`` type) often leads to tracebacks because the strings need to be converted to bytes and Python fails to do that correctly. In Python3, the situation is somewhat reversed. Most APIs take text strings (this is **Python3's** ``str`` type). When you have byte strings (the Python3 ``bytes`` type) you sometimes get errors when attempting to combine those with text strings. Note, however, that under the hood, Python still has to convert text to bytes to interface operating system libraries and system calls. This means that you can still get tracebacks when passing text to APIs which call those OS level facilities. For module_utils, code we've decided to make the environment work with "native strings". This means that on Python2, things should work if you use the byte string type. In Python3, code should work if you give it text strings. The reason for this is so that third party modules written for Python2 don't start issuing UnicodeError exceptions once we've ported module_utils to work under Python3. We'll need to gather experience to see if this is going to work out well for modules as well or if we should give the module_utils API explicit switches so that modules can choose to operate with text type all of the time. Helpers ~~~~~~~ For converting between bytes, text, and native strings we have three helper functions. These are :func:`ansible.module_utils._text.to_bytes`, :func:`ansible.module_utils._text.to_native`, and :func:`ansible.module_utils._text.to_text`. These are similar to using ``bytes.decode()`` and ``unicode.encode()`` with a few differences. * By default they try very hard not to traceback. * The default encoding is "utf-8" * There are two error strategies that don't correspond one-to-one with a python codec error handler. These are ``surrogate_or_strict`` and ``surrogate_or_replace``. ``surrogate_or_strict`` will use the ``surrogateescape`` error handler if available (mostly on python3) or strict if not. It is most appropriate to use when dealing with something that needs to round trip its value like file paths database keys, etc. Without ``surrogateescape`` the best thing these values can do is generate a traceback that our code can catch and decide how to show an error message. ``surrogate_or_replace`` is for when a value is going to be displayed to the user. If the ``surrogateescape`` error handler is not present, it will replace undecodable byte sequences with a replacement character. ================================ Porting Core Ansible to Python 3 ================================ The Ansible code which runs controller-side is easier to port to Python3 in one important way: We do not have to support Python-2.4 on the controller. We only have to support Python-2.6 and above. However, this doesn't eliminate the work that has to be done. The controller is a much more complicated piece of code than any individual module. Making it Python2 and Python3 compatible is a much more complex task. String Model ------------ By and large, the controller uses the standard best practice of storing everything internally as text type and converting to and from bytes at the borders. In many places we hardcode these byte values as utf-8. Thus yaml and inventory files are encoded in utf-8. Filenames are also utf-8. This may not be the right answer forever but it is sufficient for now. If there's demand from users to handle encodings other than utf-8 after the code works on Python3 we can look into what strategy to take for supporting other encodings. In some cases, storing values as a byte string is not necessarily a choice without drawbacks. For instance, filenames and environment variables on POSIX systems are a sequence of bytes. By using text to represent filenames we prevent filenames that are undecodable in utf-8 and filenames that are not text at all from working. We made the choice to represent these as text for now due to code paths that handle filenames not being able to handle bytes end-to-end. PyYAML on Python3 and jinja2 on both Python2 and Python3, for instance, are meant to work with text. Any decision to allow filenames to be byte values will have to address how we deal with those pieces of the code as well.