Pytest With `tmp_path` in Plain English

Plain English Discussion of Pytest Temporary Fixtures.

Author

Rich Leyshon

Published

April 25, 2024

Introduction

pytest is a testing package for the Python programming language. It is widely used to quality assure code logic. This article discusses why and how we use pytest’s temporary fixtures tmp_path and tmp_path_factory. This blog is the second in a series called pytest in plain English, favouring accessible language and simple examples to explain the more intricate features of the pytest package.

For a wealth of documentation, guides and how-tos, please consult the pytest documentation.

A Note on the Purpose

This article intends to discuss clearly. It doesn’t aim to be clever or impressive. Its aim is to extend understanding without overwhelming the reader.

Intended Audience

Programmers with a working knowledge of python and some familiarity with pytest and packaging. The type of programmer who has wondered about how to follow best practice in testing python code.

Preparation

This blog is accompanied by code in this repository. The main branch provides a template with the minimum structure and requirements expected to run a pytest suite. The repo branches contain the code used in the examples of the following sections.

Feel free to fork or clone the repo and check out the example branches as needed.

The example code that accompanies this article is available in the temp-fixtures branch of the repo.

What Are Temporary Fixtures?

In the previous pytest in plain English article, we discussed how to write our own fixtures to serve data to our tests. But pytest comes with its own set of fixtures that are really useful in certain situations. In this article, we will consider those fixtures that are used to create temporary directories and files.

Why Do We Need Temporary Fixtures?

If the code you need to test carries out file operations, then there are a few things to consider when writing your tests. It is best practice in testing to ensure the system state is unaffected by running the test suite. In the very worst cases I have encountered, running the test suite has resulted in timestamped csvs being written to disk every time pytest was run. As developers potentially run these tests hundreds of times while working on a code base, this thoughtless little side-effect quickly results in a messy file system.

Just to clarify - I’m not saying it’s a bad idea to use timestamped file names. Or to have functions with these kinds of side effects - these features can be really useful. The problem is when the test suite creates junk on your disk that you weren’t aware of…

By using temporary fixtures, we are ensuring the tests are isolated from each other and behave in dependable ways. If you ever encounter a test suite that behaves differently on subsequent runs, then be suspicious of a messy test suite with file operations that have changed the state of the system. In order for us to reason about the state of the code, we need to be able to rely on the answers we get from the tests, known in test engineering speak as determinism.
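To make the problem concrete, here is a minimal sketch. The function write_timestamped_csv() and its file naming are hypothetical, invented purely for illustration. The point is only that handing the function a temporary directory keeps its side effect away from your working directory.

```python
"""Sketch: keeping a file-writing side effect inside tmp_path."""
from datetime import datetime
from pathlib import Path


def write_timestamped_csv(out_dir: Path) -> Path:
    """Hypothetical function under test: writes a timestamped csv."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    out_pth = Path(out_dir) / f"results-{stamp}.csv"
    out_pth.write_text("id,value\n1,42\n")
    return out_pth


def test_csv_is_written_to_temporary_directory(tmp_path):
    """The csv lands in pytest's temporary directory, not in the repo."""
    written = write_timestamped_csv(tmp_path)
    assert written.parent == tmp_path
    assert written.read_text().startswith("id,value")
```

Had the test passed a folder inside the repository instead, every run would have left another csv behind.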

Let’s Compare the Available Temporary Fixtures

The 2 fixtures that we should be working with as of 2024 are tmp_path and tmp_path_factory. Both of these newer temporary fixtures work with pathlib.Path objects (tmp_path is one, and tmp_path_factory creates them with its mktemp() method) and are included with the pytest package in order to encourage developers to use them. There is no need to import tempfile or any other dependency to get what you need: it’s all bundled up with your pytest installation.

tmp_path is a function-scoped fixture. Meaning that if we use tmp_path in 2 unit tests, then we will be served with 2 separate temporary directories to work with. This should meet most developers’ needs. But if you’re doing something more complex with files, there are occasions where you may need a more persistent temporary directory. Perhaps a bunch of your functions need to work sequentially using files on disk and you need to test how all these units work together. This kind of scenario can arise if you are working on really large files where in-memory operations become too costly. This is where tmp_path_factory can be useful, as it is a session-scoped temporary structure. A tmp_path_factory structure will be created at the start of a test suite and will persist until teardown happens once the last test has been executed.

Fixture Name	Scope	Teardown after each
`tmp_path`	function	test function
`tmp_path_factory`	session	`pytest` session
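As a rough sketch of the difference (the test names are made up for illustration), a test can request either fixture simply by naming it as an argument. tmp_path arrives as a ready-made directory, whereas tmp_path_factory is an object whose mktemp() method creates directories on request.

```python
"""Sketch: requesting tmp_path and tmp_path_factory in tests."""


def test_tmp_path_is_an_empty_directory(tmp_path):
    """tmp_path is a fresh, empty directory for this test only."""
    assert tmp_path.is_dir()
    assert list(tmp_path.iterdir()) == []


def test_factory_makes_directories_on_request(tmp_path_factory):
    """mktemp() returns a new directory each time it is called."""
    first = tmp_path_factory.mktemp("data")
    second = tmp_path_factory.mktemp("data")
    # By default pytest appends a number to keep the names unique.
    assert first.is_dir() and second.is_dir()
    assert first != second
```

No import is needed for either fixture, as pytest injects them when it sees the argument names.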

What About `tmpdir`?

Ah, the eagle-eyed among you may have noticed that the pytest package contains other fixtures that are relevant to temporary structures, namely tmpdir and tmpdir_factory. These fixtures are older equivalents of the fixtures we discussed above. The main difference is that instead of returning pathlib.Path objects, they return py.path.local objects. These fixtures were written before pathlib had been adopted as the standardised approach to handling paths across multiple operating systems. The deprecation of tmpdir and tmpdir_factory has been discussed. These fixtures are being sunsetted and it is advised to port old test suites over to the new tmp_path fixture instead. The pytest team has provided a utility to help developers identify these issues in their old test suites.

In summary, don’t use tmpdir any more and consider converting old code if you used it in the past…
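Porting is usually a small change. The sketch below shows the same made-up test written in both styles. The test names and the notes.txt file are assumptions for illustration only.

```python
"""Sketch: the same test written for tmpdir (legacy) and tmp_path."""


def test_write_with_legacy_tmpdir(tmpdir):
    """Legacy style: tmpdir is a py.path.local object."""
    target = tmpdir.join("notes.txt")
    target.write("hello")
    assert target.read() == "hello"


def test_write_with_tmp_path(tmp_path):
    """Preferred style: tmp_path is a pathlib.Path object."""
    target = tmp_path / "notes.txt"
    target.write_text("hello")
    assert target.read_text() == "hello"
```

As far as I am aware, running pytest with the -p no:legacypath option disables the legacy fixtures, so any test still requesting tmpdir will error and show you what is left to port.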

How to Use Temporary Fixtures

Writing Source Code

As a reminder, the code for this section is located here.

In this deliberately silly example, let’s say we have a poem sitting on our disk in a text file. Thanks to chatGPT for the poem and MSFT Bing Copilot for the image, making this a trivial consideration. Or should I really thank the millions of people who wrote the content that these services trained on?

Save the text file in the chunk below to the ./tests/data/ folder, which is where you would typically keep data for your tests.

A modern take on Jack and Jill sees the pair fending off bugs in a future technological dystopia.

tests/data/jack-jill-2024.txt

In the realm of data, where Jack and Jill dwell,
They ventured forth, their tale to tell.
But amidst the bytes, a glitch they found,
A challenge profound, in algorithms bound.

Their circuits whirred, their processors spun,
As they analyzed the glitch, one by one.
Yet despite their prowess, misfortune struck,
A bug so elusive, like lightning struck.

Their systems faltered, errors abound,
As frustration grew with each rebound.
But Jack and Jill, with minds so keen,
Refused to let the glitch remain unseen.

With perseverance strong and logic clear,
They traced the bug to its hidden sphere.
And with precision fine and code refined,
They patched the glitch, their brilliance defined.

In the end, though misfortune came their way,
Jack and Jill triumphed, without delay.
For in the realm of AI, where challenges frown,
Their intellect prevailed, wearing victory's crown.

So let their tale inspire, in bytes and code,
Where challenges rise on the digital road.
For Jack and Jill, with their AI might,
Showed that even in darkness, there's always light.

Let’s imagine we need a program that can edit the text and write new versions of the poem to disk. Let’s go ahead and create a function that will read the poem from disk and replace any word that you’d like to change.

"""Demonstrating tmp_path & tmp_path_factory with a simple txt file."""
from pathlib import Path
from typing import Union

def _update_a_term(
    txt_pth: Union[Path, str], target_pattern:str, replacement:str) -> str:
    """Replace the target pattern in a body of text.

    Parameters
    ----------
    txt_pth : Union[Path, str]
        Path to a txt file.
    target_pattern : str
        The pattern to replace.
    replacement : str
        The replacement value.

    Returns
    -------
    str
        String with any occurrences of target_pattern replaced with specified
        replacement value.

    """
    # The with block closes the file for us, so no explicit close is needed.
    with open(txt_pth, "r") as f:
        txt = f.read()
    return txt.replace(target_pattern, replacement)

Now we can try using the function to rename a character in the rhyme, by running the below code in a python shell.

from pyprojroot import here
rhyme = _update_a_term(
  txt_pth=here("tests/data/jack-jill-2024.txt"),
  target_pattern="Jill",
  replacement="Jock")
print(rhyme[0:175])

In the realm of data, where Jack and Jock dwell,
They ventured forth, their tale to tell.
But amidst the bytes, a glitch they found,
A challenge profound, in algorithms bound.

Why Use Underscores?

You may have noticed that the above function starts with an underscore. This convention means the function is not intended for use by the user. These internal functions would typically have fewer defensive checks than those you intend to expose to your users. It’s not an enforced thing but is considered good practice. It means “use at your own risk” as internals often have less documentation, may not be directly tested and could be less stable than functions in the API.

Great, next we need a little utility function that will take our text and write it to a file of our choosing.

def _write_string_to_txt(some_txt:str, out_pth:Union[Path, str]) -> None:
    """Write some string to a text file.

    Parameters
    ----------
    some_txt : str
        The text to write to file.
    out_pth : Union[Path, str]
        The path to the file.
    
    Returns
    -------
    None

    """
    # write() takes a single string; the with block closes the file for us.
    with open(out_pth, "w") as f:
        f.write(some_txt)

Finally, we need a wrapper function that will use the above functions, allowing the user to read in the text file, replace a pattern and then write the new poem to file.

def update_poem(
    poem_pth:Union[Path, str],
    target_pattern:str,
    replacement:str,
    out_file:Union[Path, str]) -> None:
    """Takes a txt file, replaces a pattern and writes to a new file.

    Parameters
    ----------
    poem_pth : Union[Path, str]
        Path to a txt file.
    target_pattern : str
        A pattern to update.
    replacement : str
        The replacement value.
    out_file : Union[Path, str]
        A file path to write to.

    """
    txt = _update_a_term(poem_pth, target_pattern, replacement)
    _write_string_to_txt(txt, out_file)

How do we know it works? We can use it and observe the output, as I did with _update_a_term() earlier, but this article is about testing. So let’s get to it.

Testing the Source Code

We need to test update_poem() but it writes files to disk. We don’t want to litter our (and our colleagues’) disks with files every time pytest runs. Therefore we need to ensure the function’s out_file parameter is pointing at a temporary directory. In that way, the files are written to a location that pytest manages and cleans up for us (by default it keeps only the temporary directories from the last few runs), instead of accumulating in the repository.

"""Tests for update_poetry module."""
import os

import pytest

from example_pkg import update_poetry

def test_update_poem_writes_new_pattern_to_file(tmp_path):
    """Check that update_poem changes the poem pattern and writes to file."""
    new_poem_path = os.path.join(tmp_path, "new_poem.txt")
    update_poetry.update_poem(
        poem_pth="tests/data/jack-jill-2024.txt",
        target_pattern="glitch",
        replacement="bug",
        out_file=new_poem_path
        )

Before I go ahead and add a bunch of assertions in, look at how easy it is to use tmp_path. Blink and you’ll miss it. You simply reference it in the signature of the test where you wish to use it and then you are able to work with it like you would any other path object.
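The test above builds its output path with os.path.join(), which works because tmp_path is path-like. As a small sketch of the alternative (the test name is an assumption), the pathlib style does the same job with the / operator and Path methods, and the two styles agree with each other.

```python
"""Sketch: os-style and pathlib-style handling of tmp_path."""
import os


def test_os_and_pathlib_styles_agree(tmp_path):
    """Both styles point at the same file in the temporary directory."""
    joined = os.path.join(tmp_path, "new_poem.txt")  # a plain str
    sliced = tmp_path / "new_poem.txt"  # a pathlib.Path
    assert isinstance(joined, str)
    assert joined == str(sliced)
    # Write with pathlib, then check with both styles.
    sliced.write_text("bug")
    assert os.path.exists(joined) and sliced.exists()
    assert os.listdir(tmp_path) == [p.name for p in tmp_path.iterdir()]
```

Which style you pick is a matter of taste. The pathlib version tends to need fewer imports.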

So far in this test function, I specified that I’d like to read the text from a file called jack-jill-2024.txt, replace the word “glitch” with “bug” wherever it occurs and then write this text to a file called new_poem.txt in a temporary directory.

Some simple tests for this little function:

Does the file I asked for exist?
Are the contents of that file as I expect?

Let’s go ahead and add in those assertions.

"""Tests for update_poetry module."""

import os

import pytest

from example_pkg import update_poetry

def test_update_poem_writes_new_pattern_to_file(tmp_path):
    """Check that update_poem changes the poem pattern and writes to file."""
    new_poem_path = os.path.join(tmp_path, "new_poem.txt")
    update_poetry.update_poem(
        poem_pth="tests/data/jack-jill-2024.txt",
        target_pattern="glitch",
        replacement="bug",
        out_file=new_poem_path
        )
    # Now for the assertions
    assert os.path.exists(new_poem_path)
    assert os.listdir(tmp_path) == ["new_poem.txt"]
    # let's check what pattern was written - now we need to read in the
    # contents of the new file.
    with open(new_poem_path, "r") as f:
        what_was_written = f.read()
    assert "glitch" not in what_was_written
    assert "bug" in what_was_written

Running pytest results in the below output.

collected 1 item

tests/test_update_poetry.py .                                            [100%]

============================== 1 passed in 0.01s ==============================

So we prove that the function works how we hoped it would. But what if I want to work with the new_poem.txt file again in another test function? Let’s add another test to test_update_poetry.py and see what we get when we try to use tmp_path once more.

"""Tests for update_poetry module."""
# import statements ...

# def test_update_poem_writes_new_pattern_to_file(tmp_path): ...

def test_do_i_get_a_new_tmp_path(tmp_path):
    """Remind ourselves that tmp_path is function-scoped."""
    assert "new_poem.txt" not in os.listdir(tmp_path)
    assert os.listdir(tmp_path) == []

As is demonstrated when running pytest once more, tmp_path is function-scoped and we have now lost the new poem with the bugs instead of the glitches. Drat! What to do…

collected 2 items

tests/test_update_poetry.py ..                                           [100%]

============================== 2 passed in 0.01s ==============================

As mentioned earlier, pytest provides another fixture with more flexibility, called tmp_path_factory. As this fixture is session-scoped, we can request it from our own fixtures of any scope, which gives us full control over how long a temporary directory lives.

Fixture Scopes

For a refresher on the rules of scope referencing, please see the blog Pytest Fixtures in Plain English.

"""Tests for update_poetry module."""
# import statements ...

# def test_update_poem_writes_new_pattern_to_file(tmp_path): ...

# def test_do_i_get_a_new_tmp_path(tmp_path): ...

@pytest.fixture(scope="module")
def _module_scoped_tmp(tmp_path_factory):
    yield tmp_path_factory.mktemp("put_poetry_here", numbered=False)

Note that as tmp_path_factory is session-scoped, I’m free to reference it in another fixture with any scope. Here I define a module-scoped fixture, which means teardown of _module_scoped_tmp will occur once the final test in this test module completes. Now repeating the logic executed with tmp_path above, but this time with our new module-scoped temporary directory, we get a different outcome.

"""Tests for update_poetry module."""
# import statements ...

# def test_update_poem_writes_new_pattern_to_file(tmp_path): ...

# def test_do_i_get_a_new_tmp_path(tmp_path): ...

@pytest.fixture(scope="module")
def _module_scoped_tmp(tmp_path_factory):
    yield tmp_path_factory.mktemp("put_poetry_here", numbered=False)


def test_module_scoped_tmp_exists(_module_scoped_tmp):
    new_poem_path = os.path.join(_module_scoped_tmp, "new_poem.txt")
    update_poetry.update_poem(
        poem_pth="tests/data/jack-jill-2024.txt",
        target_pattern="glitch",
        replacement="bug",
        out_file=new_poem_path
        )
    assert os.path.exists(new_poem_path)
    with open(new_poem_path, "r") as f:
        what_was_written = f.read()
    assert "glitch" not in what_was_written
    assert "bug" in what_was_written
    assert os.listdir(_module_scoped_tmp) == ["new_poem.txt"]


def test_do_i_get_a_new_tmp_path_factory(_module_scoped_tmp):
    assert os.listdir(_module_scoped_tmp) != [] # not empty...
    assert os.listdir(_module_scoped_tmp) == ["new_poem.txt"]
    # module-scoped fixture still contains file made in previous test function
    with open(os.path.join(_module_scoped_tmp, "new_poem.txt")) as f:
        found_txt = f.read()
    assert "glitch" not in found_txt
    assert "bug" in found_txt

Executing pytest one final time demonstrates that the same output file written to disk with test_module_scoped_tmp_exists() is subsequently available for further testing in test_do_i_get_a_new_tmp_path_factory().

collected 4 items

tests/test_update_poetry.py ....                                         [100%]

============================== 4 passed in 0.01s ==============================

Note that the order that these 2 tests run in is now important. These tests are no longer isolated and trying to run the second test on its own with pytest -k "test_do_i_get_a_new_tmp_path_factory" would result in a failure. For this reason, it may be advisable to pop the test functions within a common test class, or even use pytest marks to mark them as integration tests (more on this in a future blog).
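If the tests do not genuinely need to share files, one alternative is to restore isolation by letting each test build what it needs in its own tmp_path. The sketch below is only an illustration under that assumption. _replace_and_write() is a hypothetical stand-in for update_poem() that works on a short string, and _make_new_poem() is an invented helper.

```python
"""Sketch: restoring isolation by building the file in each test."""
from pathlib import Path


def _replace_and_write(src_txt: str, old: str, new: str, out_file: Path) -> Path:
    """Hypothetical stand-in for update_poem() that works on a string."""
    out_file.write_text(src_txt.replace(old, new))
    return out_file


def _make_new_poem(directory: Path) -> Path:
    """Arrange step shared by both tests."""
    return _replace_and_write(
        "A glitch they found", "glitch", "bug", directory / "new_poem.txt")


def test_new_poem_exists(tmp_path):
    """Runs on its own: the file is created inside this test."""
    assert _make_new_poem(tmp_path).exists()


def test_new_poem_contents(tmp_path):
    """Also runs on its own, in any order."""
    found_txt = _make_new_poem(tmp_path).read_text()
    assert "glitch" not in found_txt
    assert "bug" in found_txt
```

The trade-off is that the file is written once per test, which is cheap here but may not be for the really large files mentioned earlier.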

Summary

The reasons we use temporary fixtures and how to use them have been demonstrated with another silly (but hopefully relatable) little example. I have not gone into the wealth of methods available in these temporary fixtures, but they have many useful utilities. If you’re working with a complex nested directory structure, for example, the glob method would surely help with that.
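As a quick sketch of that idea (the directory layout and file names are invented for illustration), glob() searches a single directory while rglob() also searches everything beneath it.

```python
"""Sketch: finding files in a nested structure under tmp_path."""


def test_find_txt_files_in_nested_directories(tmp_path):
    """glob() looks in one directory, rglob() searches recursively."""
    nested = tmp_path / "poems" / "2024"
    nested.mkdir(parents=True)
    (nested / "jack-jill.txt").write_text("In the realm of data")
    (tmp_path / "poems" / "notes.md").write_text("not a poem")
    (tmp_path / "readme.txt").write_text("top level")

    top_level = [p.name for p in tmp_path.glob("*.txt")]
    assert top_level == ["readme.txt"]

    everywhere = sorted(
        p.relative_to(tmp_path).as_posix() for p in tmp_path.rglob("*.txt"))
    assert everywhere == ["poems/2024/jack-jill.txt", "readme.txt"]
```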

Below are the public methods and attributes of tmp_path:

['absolute', 'anchor', 'as_posix', 'as_uri', 'chmod', 'cwd', 'drive', 'exists',
'expanduser', 'glob', 'group', 'hardlink_to', 'home', 'is_absolute',
'is_block_device', 'is_char_device', 'is_dir', 'is_fifo', 'is_file',
'is_junction', 'is_mount', 'is_relative_to', 'is_reserved', 'is_socket',
'is_symlink', 'iterdir', 'joinpath', 'lchmod', 'lstat', 'match', 'mkdir',
'name', 'open', 'owner', 'parent', 'parents', 'parts', 'read_bytes',
'read_text', 'readlink', 'relative_to', 'rename', 'replace', 'resolve',
'rglob', 'rmdir', 'root', 'samefile', 'stat', 'stem', 'suffix', 'suffixes',
'symlink_to', 'touch', 'unlink', 'walk', 'with_name', 'with_segments',
'with_stem', 'with_suffix', 'write_bytes', 'write_text']

It is useful to read the pathlib.Path docs, as tmp_path is a pathlib.Path, tmp_path_factory.mktemp() returns one, and many of the methods above are inherited from that type. To read the tmp_path and tmp_path_factory implementation, I recommend reading the tmp docstrings on GitHub.

If you spot an error with this article, or have a suggested improvement, then feel free to raise an issue on GitHub.

Happy testing!

Acknowledgements

To past and present colleagues who have helped to discuss pros and cons, establishing practice and firming-up some opinions. Particularly:

Charlie
Dan
Edward
Ian
Mark

fin!
