Whether you’re a seasoned developer or just getting started with 🐍 Python, it’s important to know how to build robust and maintainable projects. This tutorial will guide you through the process of setting up a Python project using some of the most popular and effective tools in the industry. You will learn how to use GitHub and GitHub Actions for version control and continuous integration, as well as other tools for testing, documentation, packaging and distribution. The tutorial is inspired by resources such as Hypermodern Python and Best Practices for a new Python project. However, this is not the only way to do things and you might have different preferences or opinions. The tutorial is intended to be beginner-friendly but also cover some advanced topics. In each section, you will automate some tasks and add badges to your project to show your progress and achievements.
The repository for this series can be found at github.com/johschmidt42/python-project-johannes
This part was inspired by this blog post:
- OS: Linux, Unix, macOS, Windows (WSL2 with e.g. Ubuntu 20.04 LTS)
- Tools: python3.10, bash, git, tree
- Version Control System (VCS) Host: GitHub
- Continuous Integration (CI) Tool: GitHub Actions
It is expected that you are familiar with the version control system (VCS) git. If not, here’s a refresher for you: Introduction to Git
Commits will be based on best practices for git commits & Conventional commits. There is a conventional commit plugin for PyCharm and a VSCode extension that help you write commits in this format.
Overview
Structure
- Git Branching Strategy (GitHub flow)
- What is a release? (zip, tar.gz)
- Semantic Versioning (v0.1.0)
- Create a release manually (git tag, GitHub)
- Create a release automatically (conventional commits, semantic releases)
- CI/CD (release.yml)
- Create a Personal Access Token (PAT)
- GitHub Actions Flow (Orchestrating workflows)
- Badge (Release)
- Bonus (Enforce conventional commits)
Releasing software is an important step in the software development process as it makes new features and bugfixes available to users. One key aspect of releasing software is versioning, which helps to track and communicate the changes made in each release. Semantic versioning is a widely used standard for versioning software, which uses a version number in the format of Major.Minor.Patch (e.g. 1.2.3) to indicate the level of changes made in a release.
Conventional commits is a specification for adding human- and machine-readable meaning to commit messages. It’s a way to format commit messages in a consistent manner, which makes it easy to determine the type of change made. Conventional commits are commonly used in conjunction with semantic versioning, as the commit messages can be used to automatically determine the version number of a release. Together, semantic versioning and conventional commits provide a clear and consistent way to track and communicate the changes made in each release of a software project.
There are many different branching strategies out there for git. Many people gravitate towards GitFlow (or variants), Three Flow, or Trunk based Flows. Some do strategies in between these, such as this one. I’m using the very simple GitHub flow branching strategy, where all bug fixes and features have their own separate branch, and when complete, each branch is merged to main and deployed. Simple, nice and easy.
Whatever your strategy might be, in the end you merge a pull request and (probably) create a release.
In short, a release is packing up code of a version (e.g. zip) and pushing it to production (whatever this might be for you).
Release management can be messy. Therefore, you (and others) need a concise process to follow that defines what a release means and what changes between one release and the next. If you don’t track the changes between releases, you probably won’t understand what has changed in each release and you can’t identify problems that might have been introduced with new code. Without a changelog, it can be difficult to understand how the software has evolved over time. It can also make it difficult to roll back changes if necessary.
Semantic Versioning is a numbering scheme and standard practice in the industry for software development. It indicates the level of changes between this version and the previous one. There are three parts to a semantic version number, such as 1.8.42, that follow the pattern MAJOR.MINOR.PATCH.
Each one of them means a different degree of change. A PATCH release indicates backwards-compatible bug fixes or trivial changes (e.g. from 1.0.0 to 1.0.1). A MINOR release indicates new functionality that is added in a backwards-compatible way (e.g. from 1.0.0 to 1.1.0). A MAJOR release indicates backwards-incompatible (breaking) changes, such as removing or changing existing functionality (e.g. from 1.0.0 to 2.0.0).
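To make the three bump levels concrete, here is a minimal Python sketch of the arithmetic. The function names parse_version and bump are made up for this illustration and do not come from any release tool. The sketch only handles plain MAJOR.MINOR.PATCH strings with an optional leading v, not pre-release or build suffixes.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into integers."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def bump(version: str, level: str) -> str:
    """Return the next version for a 'major', 'minor' or 'patch' release."""
    major, minor, patch = parse_version(version)
    if level == "major":
        # Breaking change: lower parts are reset to zero.
        return f"{major + 1}.0.0"
    if level == "minor":
        # New backwards-compatible functionality: patch is reset to zero.
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump level: {level}")


print(bump("1.0.0", "patch"))   # 1.0.1
print(bump("1.0.0", "minor"))   # 1.1.0
print(bump("1.0.0", "major"))   # 2.0.0
print(bump("v1.8.42", "minor")) # 1.9.0
```

Note that a higher-level bump always resets the lower parts, which is why 1.8.42 becomes 1.9.0 and not 1.9.42.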
I recommend a talk by Mike Miles if you want a visual introduction to releases with semantic versioning. It’s a summary of what releases are and how semantic versioning with git tags allows us to create releases.
About git tags: There are lightweight and annotated tags in git. A lightweight tag is just a pointer to a specific commit whereas an annotated tag is a full object in git.
Let’s create a release manually first and then automate it.
If you remember, our example_app’s __init__.py file contains the version
# src/example_app/__init__.py
__version__ = "0.1.0"
as well as the pyproject.toml file
# pyproject.toml
[tool.poetry]
name = "example_app"
version = "0.1.0"
...
So the first thing we must do is to create an annotated git tag v0.1.0 and add it to the latest commit in main:
> git tag -a v0.1.0 -m "version v0.1.0"
Please note that if no commit hash is specified at the end of the command, then git will use the current commit you are on.
We can get a list of tags with:
> git tag
v0.1.0
and if we want to delete it again:
> git tag -d v0.1.0
Deleted tag 'v0.1.0'
and get more information about the tag with:
> git show v0.1.0
tag v0.1.0
Tagger: Johannes Schmidt <[email protected]>
Date: Sat Jan 7 12:55:15 2023 +0100
version v0.1.0
commit efc9a445cd42ce2f7ddfbe75ffaed1a5bc8e0f11 (HEAD -> main, tag: v0.1.0, origin/main, origin/HEAD)
Author: Johannes Schmidt <[email protected]>
Date: Mon Jan 2 11:20:25 2023 +0100
...
We can push the newly created tag to origin with
> git push origin v0.1.0
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Writing objects: 100% (1/1), 171 bytes | 171.00 KiB/s, done.
Total 1 (delta 0), reused 0 (delta 0), pack-reused 0
To github.com:johschmidt42/python-project-johannes.git
* [new tag] v0.1.0 -> v0.1.0
so that this git tag is now available on GitHub:
Let’s manually create a new release in GitHub with this git tag:
We click on Create a new release, select our existing tag (that is already bound to a commit) and then generate release notes automatically by clicking on the Generate release notes button before we finally publish the release with the Publish release button.
GitHub will automatically create a tar and a zip (assets) for the source code, but will not build the application! The result will look like this:
To summarise, the steps for a release are:
- create a new branch from your default branch (e.g. feature or fix branch)
- make changes and increase the version (e.g. pyproject.toml and __init__.py)
- commit the feature/bug fix to the default branch (probably through a Pull Request)
- add an annotated git tag (semantic version) to the commit
- publish the release on GitHub with some additional information
As programmers, we don’t like to repeat ourselves. So there are plenty of tools that make these steps super easy for us. Here, I will introduce Python Semantic Release (python-semantic-release), a tool specifically for Python projects.
It’s a tool which automatically sets a version number in your repo, tags the code with the version number and creates a release! And this is all done using the contents of Conventional Commit style messages.
Conventional Commits
What is the connection between semantic versioning and conventional-commits?
Certain commit types can be used to automatically determine a semantic version bump!
- A fix commit is a PATCH.
- A feat commit is a MINOR.
- A commit with BREAKING CHANGE or ! is a MAJOR.
Other types, e.g. build, chore, ci, docs, style, refactor, perf, test generally don’t increase the version.
Check out the bonus section at the end to find out how to enforce conventional commits in your project!
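The mapping from commit types to version bumps can be sketched in a few lines of Python. This is a simplified illustration of the rules listed above, not the parser that python-semantic-release actually uses. The names HEADER, bump_level and release_level are hypothetical, and the regular expression only covers the basic type(scope)!: description header.

```python
import re
from typing import Iterable, Optional

# type, optional "(scope)", optional "!", then ": description"
HEADER = re.compile(r"^(?P<type>\w+)(\((?P<scope>[^)]*)\))?(?P<breaking>!)?: .+")

# Ranking used to pick the strongest bump among several commits.
ORDER = {None: 0, "patch": 1, "minor": 2, "major": 3}


def bump_level(message: str) -> Optional[str]:
    """Map one commit message to 'major', 'minor', 'patch' or None."""
    header = message.splitlines()[0] if message else ""
    match = HEADER.match(header)
    if match is None:
        return None  # not a conventional commit, no version bump
    if match.group("breaking") or "BREAKING CHANGE:" in message:
        return "major"
    if match.group("type") == "feat":
        return "minor"
    if match.group("type") == "fix":
        return "patch"
    return None  # build, chore, ci, docs, ... do not bump the version


def release_level(messages: Iterable[str]) -> Optional[str]:
    """Return the strongest bump level found in a series of commit messages."""
    return max((bump_level(m) for m in messages), key=ORDER.__getitem__, default=None)


print(release_level(["docs: update README", "fix: nasty bug"]))         # patch
print(release_level(["feat: new cool feature", "fix: nasty bug"]))      # minor
print(release_level(["feat(api)!: drop old endpoint", "fix: typo"]))    # major
print(release_level(["chore: tidy up"]))                                # None
```

The strongest level wins: a single breaking change among many fixes still results in a MAJOR release.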
Automatic semantic releases (locally)
We can add the library with:
> poetry add --group semver python-semantic-release
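Let’s go through the configuration settings that allow us to automatically generate changelogs and releases. In the pyproject.toml, we can add semantic_release as a tool. Note that the option names below follow the configuration format of python-semantic-release version 7; version 8 and later use a different configuration schema.
# pyproject.toml
...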
[tool.semantic_release]
branch = "main"
version_variable = "src/example_app/__init__.py:__version__"
version_toml = "pyproject.toml:tool.poetry.version"
version_source = "tag"
commit_version_number = true # required for version_source = "tag"
tag_commit = true
upload_to_pypi = false
upload_to_release = false
hvcs = "github" # gitlab is also supported
- branch: specifies the branch that the release should be based on, in this case the “main” branch.
- version_variable: specifies the file path and variable name of the version number in the source code. In this case, the version number is stored in the __version__ variable in the file src/example_app/__init__.py.
- version_toml: specifies the file path and variable name of the version number in the pyproject.toml file. In this case, the version number is stored in the tool.poetry.version variable of the pyproject.toml file.
- version_source: specifies the source of the version number. In this case, the version number is obtained from the tag (instead of commit).
- commit_version_number: this parameter is required when version_source = "tag". It specifies whether the version number should be committed to the repository or not. In this case, it is set to true, which means that the version number will be committed.
- tag_commit: specifies whether a new tag should be created for the release commit. In this case, it is set to true, which means that a new tag will be created.
- upload_to_pypi: specifies whether the package should be uploaded to the PyPI package repository. In this case, it is set to false, which means that the package will not be uploaded to PyPI.
- upload_to_release: specifies whether the package should be uploaded to the GitHub release page. In this case, it is set to false, which means that the package will not be uploaded to GitHub releases.
- hvcs: specifies the hosting version control system of the project. In this case, it is set to “github”, which means that the project is hosted on GitHub. “gitlab” is also supported.
The tool updates the files in which we have defined the version of the project/module. For this we use version_variable for normal files and version_toml for .toml files. The version_source defines the source of truth for the version. Because the version in these two files is tightly coupled with the annotated git tags (a git tag is created automatically with every release, since the flag tag_commit is set to true), we can use the source tag instead of the default value commit, which looks for the last version in the commit messages. To be able to update the files and commit the changes, we need to set the commit_version_number flag to true. Because we don’t want to upload anything to the Python Package Index (PyPI), the flag upload_to_pypi is set to false. For now we don’t want to upload anything to our releases either, so upload_to_release is set to false as well. The hvcs is set to github (the default); the other supported value is gitlab.
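To illustrate what a setting like version_variable = "src/example_app/__init__.py:__version__" asks the tool to do, here is a rough Python sketch that rewrites a version assignment in a piece of text. It is only an approximation under simple assumptions (a double-quoted value on its own line), not the implementation used by python-semantic-release, and the function name replace_version_variable is hypothetical.

```python
import re


def replace_version_variable(source: str, variable: str, new_version: str) -> str:
    """Rewrite the first line of the form `variable = "..."` with the new version."""
    pattern = re.compile(
        rf'^({re.escape(variable)}\s*=\s*)"[^"]*"',
        flags=re.MULTILINE,
    )
    # A function is used as replacement so that the version string is inserted literally.
    updated, count = pattern.subn(
        lambda match: f'{match.group(1)}"{new_version}"', source, count=1
    )
    if count == 0:
        raise ValueError(f"no assignment to {variable!r} found")
    return updated


init_py = '__version__ = "0.1.0"\n'
pyproject = '[tool.poetry]\nname = "example_app"\nversion = "0.1.0"\n'

print(replace_version_variable(init_py, "__version__", "0.2.0"))
print(replace_version_variable(pyproject, "version", "0.2.0"))
```

A real tool parses the TOML file and addresses the key by its full path (tool.poetry.version), which avoids touching an unrelated version key in another table. The line-based approach here is just enough to show the idea.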
We can test this locally by running a few commands, that I will add directly to our Makefile:
# Makefile
...
##@ Releases
current-version: ## returns the current version
	@semantic-release print-version --current
next-version: ## returns the next version
	@semantic-release print-version --next
current-changelog: ## returns the current changelog
	@semantic-release changelog --released
next-changelog: ## returns the next changelog
	@semantic-release changelog --unreleased
publish-noop: ## publish command (no-operation mode)
	@semantic-release publish --noop
With the command current-version we get the version from the last git tag in the git tree:
> make current-version
0.1.0
If we add a few commits in conventional commit style, e.g. feat: new cool feature or fix: nasty bug, then the command next-version will compute the version bump for that:
> make next-version
0.2.0
Right now, we don’t have a CHANGELOG file in our project, so that when we run:
> make current-changelog
the output will be empty. But based on the commits we can create the upcoming changelog with:
> make next-changelog
### Feature
* Add releases ([#8](https://github.com/johschmidt42/python-project-johannes/issues/8)) ([`5343f46`](https://github.com/johschmidt42/python-project-johannes/commit/5343f46d9879cc8af273a315698dd307a4bafb4d))
* Docstrings ([#5](https://github.com/johschmidt42/python-project-johannes/issues/5)) ([`fb2fa04`](https://github.com/johschmidt42/python-project-johannes/commit/fb2fa0446d1614052c133824150354d1f05a52e9))
* Add application in app.py ([`3f07683`](https://github.com/johschmidt42/python-project-johannes/commit/3f07683e787b708c31235c9c5357fb45b4b9f02d))
### Documentation
* Add search bar & github url ([#6](https://github.com/johschmidt42/python-project-johannes/issues/6)) ([`3df7c48`](https://github.com/johschmidt42/python-project-johannes/commit/3df7c483eca91f2954e80321a7034ae3edb2074b))
* Add badge pages.yml to README.py ([`b76651c`](https://github.com/johschmidt42/python-project-johannes/commit/b76651c5ecb5ab2571bca1663ffc338febd55b25))
* Add documentation to Makefile ([#3](https://github.com/johschmidt42/python-project-johannes/issues/3)) ([`2294ee1`](https://github.com/johschmidt42/python-project-johannes/commit/2294ee105b238410bcfd7b9530e065e5e0381d7a))
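The changelog above is essentially the list of unreleased commits grouped by their conventional commit type. The following Python sketch shows that grouping idea in isolation. It is a simplified assumption about how such a changelog can be produced, not the code of python-semantic-release. The names SECTIONS and render_changelog are hypothetical, and links to issues and commits are left out.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Commit types that get their own changelog section (assumed subset).
SECTIONS = {"feat": "Feature", "fix": "Fix", "docs": "Documentation"}
HEADER = re.compile(r"^(?P<type>\w+)(\([^)]*\))?!?: (?P<description>.+)$")


def render_changelog(messages: List[str]) -> str:
    """Group conventional commit descriptions under one heading per type."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for message in messages:
        if not message:
            continue  # nothing to parse
        match = HEADER.match(message.splitlines()[0])
        if match and match.group("type") in SECTIONS:
            description = match.group("description")
            # Capitalise the first letter, as in the output shown above.
            grouped[SECTIONS[match.group("type")]].append(
                description[0].upper() + description[1:]
            )
    lines: List[str] = []
    for title in SECTIONS.values():
        if grouped[title]:
            lines.append(f"### {title}")
            lines.extend(f"* {entry}" for entry in grouped[title])
            lines.append("")
    return "\n".join(lines).strip()


commits = [
    "feat: add releases",
    "docs: add search bar & github url",
    "chore: tidy up",
    "feat: docstrings",
]
print(render_changelog(commits))
```

Commits of types without a section, such as chore in this example, simply do not show up in the rendered changelog.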
If we push new commits (directly to main or through a PR) we could now publish a new release with:
> semantic-release publish
The publish command will do a sequence of things:
- Update or create the changelog file.
- Run semantic-release version.
- Push changes to git.
- Run build_command and upload the distribution file to your repository.
- Run semantic-release changelog and post to your vcs provider.
- Attach the files created by build_command to GitHub releases.
Every step can, of course, be configured or deactivated!
Let’s build a CI pipeline with GitHub Actions that runs the publish command of semantic-release with every commit to the main branch.
While the overall structure remains the same as in lint.yml, test.yml or pages.yml, there are a few changes that need to be mentioned. In the step Checkout repository, we add a new token that is used to check out the branch. That is because the default GITHUB_TOKEN does not have the required permissions to operate on protected branches. Therefore, we must use a secret (GH_TOKEN) that contains a Personal Access Token with the required permissions. I will show later how the Personal Access Token can be generated. We also define fetch-depth: 0 to fetch all history for all branches and tags.
with:
  ref: ${{ github.head_ref }}
  token: ${{ secrets.GH_TOKEN }}
  fetch-depth: 0
We install only the dependencies that are required for the semantic-release tool with:
- name: Install requirements
  run: poetry install --only semver
In the last step, we change some git configurations and run the publish command of semantic-release:
- name: Python Semantic Release
  env:
    GH_TOKEN: ${{ secrets.GH_TOKEN }}
  run: |
    set -o pipefail
    # Set git details
    git config --global user.name "github-actions"
    git config --global user.email "[email protected]"
    # run semantic-release
    poetry run semantic-release publish -v DEBUG -D commit_author="github-actions <[email protected]>"
By changing the git config, the user that commits will be “github-actions”. We run the publish command with DEBUG logs (stdout) and set the commit_author to “github-actions” explicitly. As an alternative to this command, we could use the GitHub action from semantic-release directly, but running the publish command takes very few setup steps, and the action uses a Docker container that needs to be pulled every time. Because of that, I prefer a simple run step instead.
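Because the publish command will make a commit, you might be worried that we could end up in an endless loop of workflows being triggered. For pushes made with the default GITHUB_TOKEN this cannot happen: pushes made with that token do not start new workflow runs, which is a limitation set by GitHub. A push made with a Personal Access Token, as in our workflow, is not covered by this limitation and can trigger workflows that react to push events. This is one reason why the version commit gets a [skip ci] marker later in this article.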
Personal access tokens are an alternative to using passwords for authentication to GitHub Enterprise Server when using the GitHub API or the command line. Personal access tokens are intended to access GitHub resources on behalf of yourself. To access resources on behalf of an organization, or for long-lived integrations, you should use a GitHub App. For more information, see “About apps.”
In other words: we can create a Personal Access Token and have GitHub Actions store and use that secret to perform certain operations on our behalf. Keep in mind that if the PAT is compromised, it could be used to perform malicious actions on your GitHub repositories. It is therefore recommended to use GitHub OAuth Apps & GitHub Apps in organisations. For the purposes of this tutorial, we will be using a PAT to allow the GitHub Actions pipeline to operate on our behalf.
We can create a new access token by navigating to the Settings section of our GitHub user and following the instructions summarised in Creating a Personal Access Token. This will give us a window that will look like this:
By selecting the scopes, we define what permissions the token will have. For our use case, we need push access to the repositories, which is why the new PAT GH_TOKEN should have the repo permissions scope. That scope would authorise pushes to protected branches, given you don’t have Include administrators set in the protected branch’s settings.
Going back to the repository overview, in the Settings menu, we can either add an environment setting or a repository setting under the Secrets section:
Repository secrets are specific to a single repository (and all environments used in there), while environment secrets are specific to an environment. The GitHub runner can be configured to run in a specific environment which allows it to access the environment’s secrets. This makes sense when thinking of different stages (e.g. DEV vs PROD) but for this tutorial I’m fine with a repository secret.
Now that we have a few pipelines (linting, testing, releasing, documentation), we should think about the flow of actions with a commit to main! There are a few things we should be aware of, some of them specific to GitHub.
Ideally, we want a commit to main to create a push event that triggers the Testing and the Linting workflows. If these are successful, we run the release workflow, which is responsible for detecting whether there should be a version bump based on conventional commits. If so, the release workflow will push directly to main, bumping the versions, adding a git tag and creating a release. A published release should then, for example, update the documentation by running the documentation workflow.
Problems & considerations
1. If you read the last paragraph carefully or looked at the FlowChart above, you might have noticed that there are two commits to main: one initial commit (i.e. from a PR) and a second one for the release. Because our lint.yml and test.yml react to push events on the main branch, they would run twice! We should avoid running them twice to save resources. To achieve this, we can add the [skip ci] string to our version commit message. A custom commit message can be defined in the pyproject.toml file for the tool semantic_release.
# pyproject.toml
...
[tool.semantic_release]
...
commit_message = "version [skip ci]" # skip triggering ci pipelines for version commits
...
2. The workflow pages.yml currently runs on a push event to main. Updating the documentation could be something that we only want to do if there is a new release (We might be referencing the version in the documentation). We can change the trigger in the pages.yml file accordingly:
# pages.yml
name: Documentation
on:
  release:
    types: [published]
Building the documentation will now require a published release.
3. The Release workflow should depend on the success of the Linting & Testing workflows. Currently we don’t have defined dependencies in our workflow files. We could have these workflows depend on the completion of defined workflow runs in a specific branch with the workflow_run event. However, if we specify multiple workflows for the workflow_run event:
on:
  workflow_run:
    workflows: [Testing, Linting]
    types:
      - completed
    branches:
      - main
only one of the workflows needs to have completed! This is not what we want. We expect that all workflows must be completed (and successful). Only then should the release workflow run. This is in contrast to what we get when we define dependencies between jobs in a single workflow. Read more about this inconsistency and shortcoming here.
As an alternative, we could use a sequential execution of pipelines:
The big downside of this idea is that a) it does not allow parallel execution and b) we won’t be able to see the dependency graph in GitHub.
Solution
Currently, the only way I see to deal with the above mentioned problems is to orchestrate the workflows in an orchestrator workflow.
Let’s create this workflow file:
The orchestrator is triggered when we push to the branch main.
Only if both workflows, Testing & Linting, are successful is the release workflow called. This is defined with the needs keyword. If we want to have more granular control over job executions (workflows), consider using the if keyword as well. But be aware of the confusing behaviour as explained in this article.
To make our workflows lint.yml, test.yml & release.yml callable by another workflow, we need to update the triggers:
# lint.yml
---
name: Linting
on:
  pull_request:
    branches:
      - main
  workflow_call:
jobs:
...
# test.yml
---
name: Testing
on:
  pull_request:
    branches:
      - main
  workflow_call:
jobs:
...
# release.yml
---
name: Release
on:
  workflow_call:
jobs:
...
Now the new workflow (Release) should only run if the workflows for quality checking, in this case the linting and testing, succeed.
To create a badge, this time, I will use the platform shields.io.
It’s a website that generates badges for projects, which display information such as version, build status, and code coverage. It offers a wide range of templates and allows customization of appearance and creation of custom badges. The badges are updated automatically, providing real-time information about the project.
For a release badge, I selected GitHub release (latest SemVer):
The badge markdown can be copied and added to the README.md:
The landing page of our GitHub repository now looks like this ❤ (I’ve cleaned up a little and provided a description):