By Murilo Cunha

Starting a new project and not sure where to begin? What should the directory structure look like? What are these “best practices”? It may sound a bit silly, but I've found myself spending waaaay too much time on these "small" decisions. If that's you too, it's probably because you don't yet have a strong opinion about how to structure your Python project. So we've bundled a list of tools and practices into a project template, so you don't have to make those decisions from scratch.

TL;DR

We’ve put down a small list of things to include in your next project.

  1. Protect your “production” branch [e.g.: main]
  2. Set up an environment/dependency manager [e.g.: poetry]
  3. Set up code validation [e.g.: black, isort, flake8]
  4. Set up a testing framework [e.g.: pytest, pytest-cov]
  5. Set up CI/CD
  6. Set up documentation [e.g.: README.md]

Get up and running quickly by running:

git clone git@github.com:datarootsio/python-minimal-boilerplate.git

1. Protect the “production” branch

The “production” branch here is the principal/default branch in your repo - usually main (in GitHub) or master (in GitLab/…).

The main reason to protect it is to enable peer reviews: changes have to come in through a reviewed pull request rather than a direct push.

This also puts us in a good spot to do some CI/CD on the main branch afterward - for example, deploying every new commit to main.

2. Set up an environment/dependency manager

We want to make sure the next person to contribute to the project can quickly get set up - that includes which Python packages and versions we are using. Minimum information includes:

  • Python version(s)
  • List of packages - name+version(s)
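
As a rough illustration of that minimum information, the sketch below reads both items from the running interpreter using only the standard library. The helper name `describe_environment` is hypothetical, and tools such as Poetry record this information in their own files rather than computing it this way.

```python
import sys
from importlib import metadata


def describe_environment() -> dict:
    """Return the Python version and the installed packages with versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            packages[name] = dist.version
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "packages": dict(sorted(packages.items())),
    }


env = describe_environment()
print("Python", env["python"])
for name, version in env["packages"].items():
    print(f"{name}=={version}")
```

The output resembles a `requirements.txt` file, which is roughly what the next contributor needs in order to recreate the environment.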

Now, we're glossing over a lot of details here. This can be a heated topic and the Python ecosystem moves quickly. We could also split the discussion between environments+dependency management and packaging. "Packaging" here refers to how to distribute the application.

Python packaging is a more complex conversation - is your project a building block (i.e.: a library) or an end product (i.e.: an application)? Depending on your answer you may have different concerns. If you're interested, take a look at PyPA's overview on Packaging for Python.

Luckily, some tools bundle packaging, dependency, and environment management.

Tooling:

  • Poetry - a popular tool that manages dependencies and environments, and packages the project

Other popular choices

  • Conda - Anaconda’s environment manager - partially manages lower-level dependencies
  • Pipenv - similar to Poetry, but does not include the packaging part (it just manages environments)
  • pip + requirements.txt + venv - Python's standard tooling
  • ... and many more

3. Set up code validators / linters

A code linter checks your code beyond the “Does it work?” question. Things like “unused variables”, “unused imports”, etc. don't stop the interpreter from running the code, but these are usually code smells.
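
A small, made-up example of such smells: the function below (the name `mean` is only for illustration) runs without problems, yet a linter such as flake8 would report the unused import (code F401) and the unused local variable (code F841).

```python
import os  # never used below: flake8 reports F401 (imported but unused)


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = sum(values)
    count = len(values)
    leftover = total * 2  # assigned but never read: flake8 reports F841
    return total / count


print(mean([1, 2, 3]))  # prints 2.0, the interpreter does not mind the smells
```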

More than that, Python's community has converged on a set of conventions (see PEP8) that are generally accepted and followed - for example, a maximum number of characters per line, or naming constants in upper case.

Hard to keep track of all the conventions? People have built tools that make your code PEP8-compliant, or even enforce that it is.

Tooling:

Must haves

  • black - opinionated and uncompromising code formatter (i.e.: it'll change your code to make it compliant), the generally accepted Python standard
  • isort - changes your code to sort your imports
  • flake8 - linter that only checks your code without changing it (it raises errors in case of violations), and is extendable through plugins

Nice to haves

  • flake8-docstrings - Flake8 plugin that enforces docstrings on functions and scripts (following docstring conventions)
  • flake8-annotations - Flake8 plugin that enforces the use of type annotations on functions (does not do anything with them aside from making sure they are there)
  • mypy - a static type checker - looks at your types and raises errors if any code would break at runtime based on these types (e.g.: str + int)
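
To make the type-checker bullet concrete, here is a minimal sketch with hypothetical function names. A static type checker such as mypy would flag the call inside `risky_call` before the code ever runs, because a `str` is passed where an `int` is annotated; without the checker, the problem only shows up at runtime as a `TypeError`.

```python
def add_one(value: int) -> int:
    """Return value + 1; the annotations say only ints are expected."""
    return value + 1


def risky_call() -> str:
    """Call add_one with a str and report what happens at runtime."""
    try:
        add_one("1")  # a static type checker flags this argument type
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "no error"


print(add_one(1))
print(risky_call())
```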

Alternatives

  • ruff - an equivalent of the flake8 + flake8 plugins + isort bundle, written in Rust (so it's super fast!)
  • pyre/pyright - static type checker alternatives
  • ... and more

4. Set up a testing framework

What does it mean to "test" code? It essentially means writing code that executes other code and checks that the outputs are what you expected. We encapsulate these tests into functions.

A nice pattern is to write one test per function (as best as possible) and organize your tests in a similar structure in which your functions appear. It makes it easy to find tests, which can also serve as "documentation" - essentially example(s) of what your functions do and the expected outputs.

Tooling:

  • pytest - the generally accepted standard for Python projects, with extensive capabilities and a plugin system for extending it
  • unittest - testing package available in the standard library
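
For comparison with pytest's plain test functions, this is roughly what a test looks like with the standard library's unittest. The function `add` and the helper `run_tests` are hypothetical and only serve the illustration; in a real project you would usually run the tests from the command line with `python -m unittest`.

```python
import unittest


def add(a: int, b: int) -> int:
    """Hypothetical function under test."""
    return a + b


class TestAdd(unittest.TestCase):
    """unittest groups tests as methods on a TestCase subclass."""

    def test_add_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)

    def test_add_with_zero(self):
        self.assertEqual(add(4, 0), 4)


def run_tests() -> unittest.TestResult:
    """Load the tests from TestAdd and run them programmatically."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAdd)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_tests()
print("all tests passed:", result.wasSuccessful())
```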

Test coverage

Adding tests takes time. That can be especially true if you have complex functions with many edge cases. How can we know when we have "enough" tests? One way is to keep track of how much of our code is “covered” by our tests. “Covered” means “Is this line of code run by a test?”. A function with an if-else statement, for example, would need at least 2 tests to “run” both conditional blocks.

We can also enforce coverage on a project level - e.g.: the tests must cover >90% of our codebase. This can also be enforced in CI/CD.
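
The if-else case can be sketched as follows, with made-up names. Each test exercises one branch of `sign_label`; dropping either test would leave one `return` line unexecuted, and a coverage tool would report it as not covered.

```python
def sign_label(x: float) -> str:
    """Hypothetical function with two branches."""
    if x < 0:
        return "negative"
    else:
        return "non-negative"


def test_sign_label_negative():
    # Runs the `if` branch only.
    assert sign_label(-1.5) == "negative"


def test_sign_label_non_negative():
    # Runs the `else` branch only.
    assert sign_label(0) == "non-negative"
```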

Tooling:

  • pytest-cov - pytest plugin that measures and reports test coverage

General guidelines for testing with pytest

  • Install pytest - add to your (dev) dependencies
  • Create a tests directory
  • Follow the same directory/file structure in your Python project - see the example below
  • More info at the docs

.
├── pyproject.toml
├── my_project
│   ├── ...
│   └── foo.py          # contains a `bar` function
└── tests
    ├── ...
    └── test_foo.py     # contains a `test_bar` test function
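
A minimal sketch of what the two files in this layout could contain. Both are shown in a single block so that it runs on its own, and the behaviour of `bar` is invented for the example.

```python
# --- my_project/foo.py ---
def bar(name: str) -> str:
    """Return a greeting for name (hypothetical behaviour)."""
    return f"Hello, {name}!"


# --- tests/test_foo.py ---
# In the real layout this file would start with:
#     from my_project.foo import bar
def test_bar():
    # pytest collects functions named test_* and relies on plain asserts.
    assert bar("world") == "Hello, world!"
```

Running `pytest` from the project root would then pick up `test_bar` automatically, since both the file and the function follow the `test_` naming convention.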

5. Set up CI/CD

After implementing tests and defining your code standards and linting tools, what is left is to frequently check that the code meets the target standards. Automating these checks is an even nicer idea. That's where CI/CD comes in handy.

CI/CD (continuous integration & continuous deployment) is often achieved by attaching the checks we want to run (e.g.: all tests and linting checks must pass in CI) to git triggers. A "git trigger" can be the opening of a pull request (typically for CI) or its merge into your main/master branch (typically for CD). We can also use this system to deploy code - e.g.: "when there are new changes on the main/master branch, take the code, build a Docker container, and serve it to our customers".

This event-triggered tooling aims to automate as much as possible of the test suites and of the deployment that follows successful tests. The details depend a bit on the CI/CD system, but pipelines are generally configured in YAML files.
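
Stripped of the git trigger, the core of a CI job is "run every check and fail if any of them fails". The sketch below imitates that locally; it is not how any particular CI/CD system works internally, and the names `run_checks` and `DEFAULT_CHECKS` are hypothetical. The default commands assume black, isort and pytest are installed in the active environment.

```python
import subprocess
import sys
from typing import Dict, List

# Assumed checks, mirroring the tools mentioned in this post.
DEFAULT_CHECKS: Dict[str, List[str]] = {
    "black": [sys.executable, "-m", "black", ".", "--check"],
    "isort": [sys.executable, "-m", "isort", ".", "--check-only"],
    "pytest": [sys.executable, "-m", "pytest", "-vv"],
}


def run_checks(checks: Dict[str, List[str]]) -> List[str]:
    """Run each command and return the names of the checks that failed."""
    failed = []
    for name, command in checks.items():
        result = subprocess.run(command)
        if result.returncode != 0:  # non-zero exit code means failure
            failed.append(name)
    return failed
```

Calling `run_checks(DEFAULT_CHECKS)` from the project root and exiting with a non-zero status when the returned list is not empty gives the same pass/fail signal a pipeline would.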

Tooling

  • GitHub Actions
  • GitLab CI
  • Jenkins
  • CircleCI

CI/CD templates

Some pipeline steps keep coming back from project to project, so it's not a bad idea to create reusable CI/CD templates.

GitHub Actions

Enforce that code must pass all testing and linting checks, using multiple Python versions:

name: 'tests'
on: [push, pull_request]

jobs:
  tests:
    runs-on: ubuntu-latest
    strategy:  # drop this if you only want to test for a specific version
      matrix:
        python-version: ["3.7", "3.8", "3.9", "3.10", "3.11"]
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }} # or your specific Python version
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }} # or your specific Python version
      - name: Install dependencies
        run: |
          pip install poetry==1.1.7
          poetry config virtualenvs.create false
          poetry install --no-interaction --no-ansi
      - name: QA with black and isort
        run: |
          black . --check
          isort . --check-only
      - name: Tests
        run: |
          pytest -vv

6. Enrich documentation

README

Don’t forget to add a nice README.md! This is the place where new people get onboarded, and also where you explain how to contribute to the project. Things you can include:

  • A project structure overview (hint: tree . can be useful here)
  • What goes where - where to find and add functionality
  • Who maintains the project and whom to contact with questions

Docstrings

Some documentation lives in your code, as docstrings on your functions. Docstrings are part of the function/class definition, so they also help with IDE auto-completes and pop-ups. You can enforce docstrings with the flake8-docstrings plugin as well.
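
As a small sketch (the function `mean` is made up for the example), the docstring below is stored on the function object itself, which is where IDEs and the built-in `help()` pick it up.

```python
import inspect


def mean(values: list) -> float:
    """Return the arithmetic mean of ``values``.

    Raises:
        ValueError: If ``values`` is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


# inspect.getdoc returns the docstring with its indentation cleaned up.
print(inspect.getdoc(mean))
```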

Putting it together

Sounds like a lot? Don't worry! We've included everything mentioned above in a GitHub template. There you can see how the pieces fit together and kickstart your next project! 🚀

Check it out on github.com/datarootsio/python-minimal-boilerplate/, and feel free to suggest changes via pull requests or to open an issue!
