Custom Marks With Pytest in Plain English

Explanation
pytest
Unit tests
marks
custom marks
markers
pytest-in-plain-english
Selectively running tests with marks
Author

Rich Leyshon

Published

July 22, 2024

Planet composed of croissants.

“The only flake I want is that of a croissant.” Mariel Feldman.

Introduction

pytest is a testing framework for the python language. It is widely used to quality assure code logic. This article discusses custom marks, their use cases, and patterns for selecting and deselecting marked tests. This blog is the fifth and final post in a series called pytest in plain English, favouring accessible language and simple examples to explain the more intricate features of the pytest package.

For a wealth of documentation, guides and how-tos, please consult the pytest documentation.

What are pytest Custom Marks?

Marks are a way to conditionally run specific tests. There are a few marks that come with the pytest package. To view these, run pytest --markers from the command line. This will print a list of the pre-registered marks available within the pytest package. However, it is extremely easy to register your own markers, allowing greater control over which tests get executed.
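
For orientation, three of the built-in marks you will see in that list are skip, skipif and xfail. Below is a minimal sketch of how they are applied; the test names and reasons are invented purely for illustration.

import sys

import pytest


@pytest.mark.skip(reason="not ready yet")
def test_work_in_progress():
    pass


@pytest.mark.skipif(sys.platform == "win32", reason="relies on POSIX behaviour")
def test_posix_only():
    pass


@pytest.mark.xfail(reason="known bug, reported upstream")
def test_known_bug():
    assert 0.1 + 0.2 == 0.3  # fails, but is reported as xfailed rather than failed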

This article will cover:

  • Reasons for marking tests
  • Registering marks with pytest
  • Marking tests
  • Including or excluding markers from the command line

This article intends to discuss the topic clearly, without trying to be clever or impressive. Its aim is to extend understanding without overwhelming the reader. The code may not always be optimal, favouring a simplistic approach wherever possible.

Intended Audience

Programmers with a working knowledge of python and some familiarity with pytest and packaging. The type of programmer who has wondered about how to follow best practice in testing python code.

What You’ll Need:

Preparation

This blog is accompanied by code in this repository. The main branch provides a template with the minimum structure and requirements expected to run a pytest suite. The repo branches contain the code used in the examples of the following sections.

Feel free to fork or clone the repo and checkout to the example branches as needed.

The example code that accompanies this article is available in the marks branch of the repo.

Overview

Occasionally, we write tests that are a bit distinct from the rest of our test suite. They could be integration tests, calling on elements of our code from multiple modules. They could be end-to-end tests, executing a pipeline from start to finish. Or they could be flaky or brittle tests, prone to failure on specific operating systems or architectures, or when external dependencies do not provide reliable inputs.

There are multiple ways to handle these kinds of tests, including mocking, as discussed in my previous blog. Mocking can often take a bit of time, and developers don’t always have that precious commodity. So instead, they may mark the test, ensuring that it doesn’t get run on continuous integration (CI) checks. This may involve flagging any flaky test as “technical debt” to be investigated and fixed later.

In fact, there are a number of reasons that we may want to selectively run elements of a test suite. Here is a selection of scenarios that could benefit from marking.

| Category | Cause | Explanation |
| --- | --- | --- |
| External Dependencies | Network | Network latency, outages, or Domain Name System (DNS) issues. |
| External Dependencies | Web APIs | Breaking changes, rate limits, or outages. |
| External Dependencies | Databases | Concurrency issues, data changes, or connection problems. |
| External Dependencies | Timeouts | Hardcoded or too-short timeouts cause failures. |
| Environment Dependencies | Environment Variables | Incorrectly set environment variables. |
| Environment Dependencies | File System | File locks, permissions, or missing files. |
| Environment Dependencies | Resource Limits | Insufficient CPU, memory, or disk space. |
| State Dependencies | Shared State | Interference between tests sharing state. |
| State Dependencies | Order Dependency | Tests relying on execution order. |
| Test Data | Random Data | Different results on each run due to random data with no seed set. |
| Concurrency Issues | Parallel Execution | Tests not designed for parallel execution. |
| Concurrency Issues | Locks | Deadlocks or timeouts involving locks or semaphores. |
| Concurrency Issues | Race Conditions | Tests depend on the order of execution of threads or processes. |
| Concurrency Issues | Async Operations | Improperly awaited asynchronous code. |
| Hardware and System Issues | Differences in Hardware | Variations in performance across hardware or operating systems. |
| Hardware and System Issues | System Load | Failures under high system load due to resource contention. |
| Non-deterministic Inputs | Time | Variations in current time affecting test results. |
| Non-deterministic Inputs | User Input | Non-deterministic user input causing flaky behaviour. |
| Non-deterministic Inputs | Filepaths | CI runner filepaths may be hard to predict. |
| Test Implementation Issues | Assertions | Incorrect or overly strict assertions. |
| Test Implementation Issues | Setup and Teardown | Inconsistent state due to improper setup or teardown. |

In the case of integration tests, one approach is to group them all together and have them execute within a dedicated CI workflow. This is common practice, as developers may want to stay alert to problems with the external resources their code depends upon, while not failing ‘core’ CI checks about changes to the source code. If your code relies on a web API, for instance, you’re probably less concerned about temporary outages in that service. However, a breaking change to that service would require your source code to be adapted. Once more, life is a compromise.

“Le mieux est l’ennemi du bien.” (The best is the enemy of the good), Voltaire

Custom Marks in pytest

Marking allows us to have greater control over which of our tests are executed when we invoke pytest. Marking is conveniently implemented in the following way (presuming you have already written your source and test code):

  1. Register a custom marker
  2. Assign the new marker name to the target test
  3. Invoke pytest with the -m (MARKEXPR) flag.

This section uses code available in the marks branch of the GitHub repository.

Define the Source Code

I have a motley crew of functions to consider. A sort of homage to Sergio Leone’s ‘The Good, The Bad & the Ugly’, although I’ll let you figure out which is which.

The Flaky, The Slow & The Needy

The Flaky Function

Here we define a function that will fail half the time. What a terrible test to have. The root of this unpredictable behaviour should be diagnosed as a priority, if only for the sake of our sanity.

import random


def croissant():
    """A very flaky function."""
    if round(random.uniform(0, 1)) == 1:
        return True
    else:
        raise Exception("Flaky test detected!")

The Slow Function

This function is going to be pretty slow. Slow test suites throttle our productivity. Once it finishes waiting for a specified number of seconds, it will return a string.

import time
from typing import Union


def take_a_nap(how_many_seconds: Union[int, float]) -> str:
    """Mimic a costly function by just doing nothing for a specified time."""
    time.sleep(float(how_many_seconds))
    return "Rise and shine!"

The Needy Function

Finally, the needy function will have an external dependency on a website. Its test will simply check whether we get an HTTP status code of 200 (OK) when requesting a URL.

import requests


def check_site_available(url: str, timeout: int = 5) -> bool:
    """Checks if a site is available."""
    try:
        response = requests.get(url, timeout=timeout)
        return True if response.status_code == 200 else False
    except requests.RequestException:
        return False

The Wrapper

Lastly, I’ll introduce a wrapper that provides an opportunity for an integration test. This is a bit awkward, as none of the above functions are particularly related to each other.

This function will execute check_site_available() and take_a_nap() together. A pretty goofy example, I admit. Based on the status of the URL request, a string will be returned.

import time
from typing import Union

import requests

def goofy_wrapper(url: str, timeout: int = 5) -> str:
    """Check a site is available, pause for no good reason before summarising
    outcome with a string."""
    msg = f"Napping for {timeout} seconds.\n"
    msg = msg + take_a_nap(timeout)
    if check_site_available(url):
        msg = msg + "\nYour site is up!"
    else:
        msg = msg + "\nYour site is down!"

    return msg

Let’s Get Testing

Initially, I will define a test that does nothing other than pass. This will be a placeholder, unmarked test.

def test_nothing():
    pass

Next, I import croissant() and assert that it returns True. As you may recall from above, croissant() will only do so ~50% of the time.

from example_pkg.do_something import (
    croissant,
    )


def test_nothing():
    pass


def test_croissant():
    assert croissant()

Now running pytest -v will print the test results, reporting test outcomes for each test separately (-v means verbose).

...% pytest -v
============================= test session starts =============================
platform darwin -- Python 3.12.3, pytest-8.1.1, pluggy-1.5.0 -- /...
cachedir: .pytest_cache
rootdir: /...
configfile: pyproject.toml
testpaths: ./tests
collected 2 items                                                             
tests/test_do_something.py::test_nothing PASSED                          [ 50%]
tests/test_do_something.py::test_croissant PASSED                        [100%]
============================== 2 passed in 0.05s ==============================

But note that half the time, I will also get the following output:

...% pytest -v
============================= test session starts =============================
platform darwin -- Python 3.12.3, pytest-8.1.1, pluggy-1.5.0 -- /...
cachedir: .pytest_cache
rootdir: /...
configfile: pyproject.toml
testpaths: ./tests
collected 2 items                         

tests/test_do_something.py::test_nothing PASSED                          [ 50%]
tests/test_do_something.py::test_croissant FAILED                        [100%]

================================== FAILURES ==================================
_______________________________ test_croissant ________________________________
    @pytest.mark.flaky
    def test_croissant():
>       assert croissant()
tests/test_do_something.py:17: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    def croissant():
        """A very flaky function."""
        if round(random.uniform(0, 1)) == 1:
            return True
        else:
>           raise Exception("Flaky test detected!")
E           Exception: Flaky test detected!

src/example_pkg/do_something.py:13: Exception
=========================== short test summary info ============================
FAILED ...::test_croissant - Exception: Flaky test detected!
========================= 1 failed, 1 passed in 0.07s =========================

To prevent this flaky test from failing the test suite, we can mark it as flaky and optionally deselect it when invoking pytest. To do that, we first need to register a new marker, so let’s update our project’s pyproject.toml to include additional options for a flaky mark:

# `pytest` configurations
[tool.pytest.ini_options]
markers = [
    "flaky: tests that can randomly fail through no change to the code",
]

Note that when registering a marker in this way, the text after the colon is an optional mark description. Saving the file and running pytest --markers should show that the new custom marker is available to our project:

... % pytest --markers
@pytest.mark.flaky: tests that can randomly fail through no change to the code
...
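
As an aside, if you prefer to keep this configuration out of pyproject.toml, markers can also be registered from a conftest.py via the pytest_configure hook. A minimal sketch, intended as an alternative to the toml approach rather than something you need in addition:

# conftest.py: an alternative place to register the custom marker
def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "flaky: tests that can randomly fail through no change to the code",
    )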

Now that we have confirmed our marker is available for use, we can use it to mark test_croissant() as flaky:

import pytest

from example_pkg.do_something import (
    croissant,
    )


def test_nothing():
    pass


@pytest.mark.flaky
def test_croissant():
    assert croissant()

Note that we need to import pytest into our test module in order to use the pytest.mark.<MARK_NAME> decorator.
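
One optional safeguard worth mentioning: by default, applying an unregistered mark (say, a typo such as @pytest.mark.flakey) only raises a warning. If you would rather that fail loudly, pytest’s --strict-markers option can be added to the same configuration. A sketch of how that might look:

# `pytest` configurations
[tool.pytest.ini_options]
addopts = "--strict-markers"  # unregistered marks become errors rather than warnings
markers = [
    "flaky: tests that can randomly fail through no change to the code",
]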

Selecting a Single Mark

Now that we have registered and marked a test as flaky, we can adapt our pytest call to execute tests with that mark only. The pattern we will use is:

pytest -v -m "<INSERT_MARK_NAME>"

... % pytest -v -m "flaky"
============================= test session starts =============================
platform darwin -- Python 3.12.3, pytest-8.1.1, pluggy-1.5.0 -- /...
cachedir: .pytest_cache
rootdir: /...
configfile: pyproject.toml
testpaths: ./tests
collected 2 items / 1 deselected / 1 selected                                 

tests/test_do_something.py::test_croissant PASSED                        [100%]

======================= 1 passed, 1 deselected in 0.05s =======================

Now we see that test_croissant() was executed, while the unmarked test_nothing() was not.

Deselecting a Single Mark

More useful than selectively running a flaky test is to deselect it. In this way, it cannot fail our test suite. This is achieved with the following pattern:

pytest -v -m "not <INSERT_MARK_NAME>"

... % pytest -v -m "not flaky"
============================= test session starts =============================
platform darwin -- Python 3.12.3, pytest-8.1.1, pluggy-1.5.0 -- /...
cachedir: .pytest_cache
rootdir: /...
configfile: pyproject.toml
testpaths: ./tests
collected 2 items / 1 deselected / 1 selected                                  

tests/test_do_something.py::test_nothing PASSED                          [100%]

======================= 1 passed, 1 deselected in 0.05s =======================

Note that this time, test_croissant() was not executed.
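
Of course, deselecting a flaky test only hides the problem. As discussed in the previous post on mocking, the longer-term fix is to make the behaviour deterministic. A minimal sketch using pytest’s built-in monkeypatch fixture, with an invented test name and reusing the croissant() import from above:

import random


def test_croissant_with_randomness_pinned(monkeypatch):
    # pin random.uniform so that round(random.uniform(0, 1)) == 1 and croissant() returns True
    monkeypatch.setattr(random, "uniform", lambda a, b: 1.0)
    assert croissant()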

Selecting Multiple Marks

In this section, we will introduce another, differently marked test to illustrate the syntax for running multiple marks. For this example, we’ll test take_a_nap():

import pytest

from example_pkg.do_something import (
    croissant,
    take_a_nap,
    )


def test_nothing():
    pass


@pytest.mark.flaky
def test_croissant():
    assert croissant()


@pytest.mark.slow
def test_take_a_nap():
    out = take_a_nap(how_many_seconds=3)
    assert isinstance(out, str), f"a string was not returned: {type(out)}"
    assert out == "Rise and shine!", f"unexpected string pattern: {out}"

Our new test just makes some simple assertions about the string take_a_nap() returns after snoozing. But notice what happens when running pytest -v now:

... % pytest -v
============================= test session starts =============================
platform darwin -- Python 3.12.3, pytest-8.1.1, pluggy-1.5.0 -- /...
cachedir: .pytest_cache
rootdir: /...
configfile: pyproject.toml
testpaths: ./tests
collected 3 items                                                             

tests/test_do_something.py::test_nothing PASSED                          [ 33%]
tests/test_do_something.py::test_croissant PASSED                        [ 66%]
tests/test_do_something.py::test_take_a_nap PASSED                       [100%]

============================== 3 passed in 3.07s ==============================

The test suite now takes over 3 seconds to execute, as the test instructs take_a_nap() to sleep for that period. Let’s update our pyproject.toml and register a new mark:

# `pytest` configurations
[tool.pytest.ini_options]
markers = [
    "flaky: tests that can randomly fail through no change to the code",
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
]

Note that the nested quotation marks within the description of the slow mark are escaped; without this, the file would not be valid toml and pytest would complain.

In order to run tests marked with either flaky or slow, we can use or:

pytest -v -m "<INSERT_MARK_1> or <INSERT_MARK_2>"

... % pytest -v -m "flaky or slow"
============================= test session starts =============================
platform darwin -- Python 3.12.3, pytest-8.1.1, pluggy-1.5.0 -- /...
cachedir: .pytest_cache
rootdir: /...
configfile: pyproject.toml
testpaths: ./tests
collected 3 items / 1 deselected / 2 selected                                 

tests/test_do_something.py::test_croissant PASSED                        [ 50%]
tests/test_do_something.py::test_take_a_nap PASSED                       [100%]

======================= 2 passed, 1 deselected in 3.06s =======================

Note that anything not marked with flaky or slow (e.g. test_nothing()) was not run. Also, test_croissant() failed 3 times in a row while I tried to capture a passing run; I didn’t want the flaky exception to keep presenting itself. While I may be sprinkling a little glitter, I do not want to misrepresent how frustrating flaky tests can be!

Complex Selection Rules

By adding an additional mark, we can illustrate more complex selection and deselection rules for invoking pytest. Let’s write an integration test that checks whether the domain for this blog site can be reached.

import pytest

from example_pkg.do_something import (
    croissant,
    take_a_nap,
    check_site_available,
    )


def test_nothing():
    pass


@pytest.mark.flaky
def test_croissant():
    assert croissant()


@pytest.mark.slow
def test_take_a_nap():
    out = take_a_nap(how_many_seconds=3)
    assert isinstance(out, str), f"a string was not returned: {type(out)}"
    assert out == "Rise and shine!", f"unexpected string pattern: {out}"


@pytest.mark.integration
def test_check_site_available():
    url = "https://thedatasavvycorner.com/"
    assert check_site_available(url), f"site {url} is down..."

Now we update our pyproject.toml like so:

# `pytest` configurations
[tool.pytest.ini_options]
markers = [
    "flaky: tests that can randomly fail through no change to the code",
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
    "integration: tests that require external resources",
]

Now we can combine and and not statements when calling pytest to execute just the tests we need. In the example below, I choose to run the slow and integration tests while excluding that pesky flaky test. Note that and binds more tightly than or, so the expression is evaluated as “slow or (integration and not flaky)”; the result is the same for this suite, but adding parentheses is a good habit as mark expressions grow.

... % pytest -v -m "slow or integration and not flaky"
============================= test session starts =============================
platform darwin -- Python 3.12.3, pytest-8.1.1, pluggy-1.5.0 -- /...
cachedir: .pytest_cache
rootdir: /...
configfile: pyproject.toml
testpaths: ./tests
collected 4 items / 2 deselected / 2 selected                                 

tests/test_do_something.py::test_take_a_nap PASSED                       [ 50%]
tests/test_do_something.py::test_check_site_available PASSED             [100%]

======================= 2 passed, 2 deselected in 3.29s =======================

Note that both test_nothing() (unmarked) and test_croissant() (deselected) were not run.

Marks and Test Classes

Note that so far, we have applied marks to test functions only. But we can also apply marks to an entire test class, or even to a whole test module (as sketched at the end of this section). Here, I will take the wrapper function defined earlier and use a test class to group its tests together. I will mark those tests with 2 new marks, classy and subclassy.

# `pytest` configurations
[tool.pytest.ini_options]
markers = [
    "flaky: tests that can randomly fail through no change to the code",
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
    "integration: tests that require external resources",
    "classy: tests arranged in a class",
    "subclassy: test methods",
]

Updating our test module to include TestGoofyWrapper:

import pytest

from example_pkg.do_something import (
    croissant,
    take_a_nap,
    check_site_available,
    goofy_wrapper
    )


def test_nothing():
    pass


@pytest.mark.flaky
def test_croissant():
    assert croissant()


@pytest.mark.slow
def test_take_a_nap():
    out = take_a_nap(how_many_seconds=3)
    assert isinstance(out, str), f"a string was not returned: {type(out)}"
    assert out == "Rise and shine!", f"unexpected string pattern: {out}"


@pytest.mark.integration
def test_check_site_available():
    url = "https://thedatasavvycorner.com/"
    assert check_site_available(url), f"site {url} is down..."


@pytest.mark.classy
class TestGoofyWrapper:
    @pytest.mark.subclassy
    def test_goofy_wrapper_url_exists(self):
        assert goofy_wrapper(
            "https://thedatasavvycorner.com/", 1
            ).endswith("Your site is up!"), "The site wasn't up."

    @pytest.mark.subclassy
    def test_goofy_wrapper_url_does_not_exist(self):
        assert goofy_wrapper(
            "https://thegoofycorner.com/", 1
            ).endswith("Your site is down!"), "The site wasn't down."

Note that targeting either the classy or subclassy mark results in the same output - all tests within this test class are executed:

... % pytest -v -m "classy"
============================= test session starts =============================
platform darwin -- Python 3.12.3, pytest-8.1.1, pluggy-1.5.0 -- /...
cachedir: .pytest_cache
rootdir: /...
configfile: pyproject.toml
testpaths: ./tests
collected 6 items / 4 deselected / 2 selected                                 

TestGoofyWrapper::test_goofy_wrapper_url_exists PASSED                   [ 50%]
TestGoofyWrapper::test_goofy_wrapper_url_does_not_exist PASSED           [100%]

======================= 2 passed, 4 deselected in 2.30s =======================

Nobody has created the domain https://thegoofycorner.com/ yet; such a shame.
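
As mentioned at the start of this section, marks can also target a whole test module. pytest looks for a module-level variable named pytestmark; a minimal sketch, with the choice of marks purely illustrative:

import pytest

# placed at the top of a test module, every test collected from this file
# carries these marks
pytestmark = [pytest.mark.classy, pytest.mark.integration]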

Tests with Multiple Marks

Note that we can use multiple marks with any test or test class. Let’s update TestGoofyWrapper to be marked as integration & slow:

@pytest.mark.slow
@pytest.mark.integration
@pytest.mark.classy
class TestGoofyWrapper:
    @pytest.mark.subclassy
    def test_goofy_wrapper_url_exists(self):
        assert goofy_wrapper(
            "https://thedatasavvycorner.com/", 1
            ).endswith("Your site is up!"), "The site wasn't up."

    @pytest.mark.subclassy
    def test_goofy_wrapper_url_does_not_exist(self):
        assert goofy_wrapper(
            "https://thegoofycorner.com/", 1
            ).endswith("Your site is down!"), "The site wasn't down."

This test class can now be exclusively targeted by specifying multiple marks with and:

pytest -v -m "<INSERT_MARK_1> and <INSERT_MARK_2> ... and <INSERT_MARK_N>"


... % pytest -v -m "integration and slow"   
============================= test session starts =============================
platform darwin -- Python 3.12.3, pytest-8.1.1, pluggy-1.5.0 -- /...
cachedir: .pytest_cache
rootdir: /...
configfile: pyproject.toml
testpaths: ./tests
collected 6 items / 4 deselected / 2 selected                                 

TestGoofyWrapper::test_goofy_wrapper_url_exists PASSED                   [ 50%]
TestGoofyWrapper::test_goofy_wrapper_url_does_not_exist PASSED           [100%]

======================= 2 passed, 4 deselected in 2.30s =======================

Note that even though other tests carry the integration and slow marks separately, they are excluded because and requires a test to carry both marks.

Deselecting All Marks

Now that we have introduced multiple custom markers to our test suite, what if we want to exclude all of the marked tests and just run the ‘core’ test suite? There is no built-in way to select only ‘unmarked’ tests. An old pytest plugin called pytest-unmarked used to provide this functionality, but it is no longer actively maintained and is not compatible with pytest v8.0.0+. You could introduce a ‘standard’ or ‘core’ marker, but you’d need to remember to apply it to every otherwise-unmarked test within your test suite.
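
One way to automate that is a conftest.py hook that adds a core mark to any test collected without marks. The sketch below assumes a core marker has also been added to the markers list in pyproject.toml; treat it as an illustration rather than a recommendation.

# conftest.py: tag any test collected without marks as "core"
import pytest


def pytest_collection_modifyitems(config, items):
    for item in items:
        if not list(item.iter_markers()):
            item.add_marker(pytest.mark.core)

In principle, pytest -m "core" should then select just the previously unmarked tests.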

Alternatively, what we can do is exclude each of the marks that have been registered. There are 2 equivalent patterns for achieving this (the second is simply the first rewritten using De Morgan’s law):

  1. pytest -v -m "not <INSERT_MARK_1> and ... and not <INSERT_MARK_N>"
  2. pytest -v -m "not (<INSERT_MARK_1> or ... or <INSERT_MARK_N>)"

... % pytest -v -m "not (flaky or slow or integration)"
============================= test session starts =============================
platform darwin -- Python 3.12.3, pytest-8.1.1, pluggy-1.5.0 -- ...
cachedir: .pytest_cache
rootdir: /...
configfile: pyproject.toml
testpaths: ./tests
collected 6 items / 5 deselected / 1 selected                                 

tests/test_do_something.py::test_nothing PASSED                          [100%]

======================= 1 passed, 5 deselected in 0.05s =======================

Note that negating the or expression has excluded any test marked with at least one of the specified marks.

Summary

Registering marks with pytest is very easy and is useful for controlling which tests are executed. We have illustrated:

  • registering marks
  • marking tests and test classes
  • the use of the pytest -m flag
  • selection of multiple marks
  • deselection of multiple marks

Overall, this feature of pytest is simple and intuitive. There are more options for marking tests. I recommend reading the pytest custom markers examples for more information.
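
As a small taste of those options, individual parametrized cases can carry marks too, via pytest.param. A minimal sketch reusing take_a_nap() from earlier, with an invented test name:

import pytest

from example_pkg.do_something import take_a_nap


@pytest.mark.parametrize(
    "seconds",
    [
        0.1,  # a quick case that always runs
        pytest.param(3, marks=pytest.mark.slow),  # only this case is deselected by -m "not slow"
    ],
)
def test_take_a_nap_parametrised(seconds):
    assert take_a_nap(seconds) == "Rise and shine!"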

As mentioned earlier, this is the final post in the pytest in plain English series. I will be taking a break from blogging about testing for a while. But colleagues have asked about articles on property-based testing and some of the more useful pytest plug-ins. I plan to cover these topics at a later date.

If you spot an error with this article, or have a suggested improvement then feel free to raise an issue on GitHub.

Happy testing!

Acknowledgements

To past and present colleagues who have helped to discuss pros and cons, establishing practice and firming-up some opinions. Particularly:

  • Ethan
  • Sergio

fin!