Sunday, September 25, 2022

A First Course on Deploying Python Projects


Last Updated on May 4, 2022

After all the hard work of developing a project in Python, we want to share it with other people, such as friends or colleagues. They may not be interested in your code, but they want to run it and make real use of it. For example, you create a regression model that predicts a value from input features, and your friend wants to provide their own features and see what value your model predicts. But as your Python project grows, it is no longer as simple as sending your friend a small script. There can be many supporting files, multiple scripts, and dependencies on a list of libraries. Getting all of these right can be a challenge.

After finishing this tutorial, you will learn:

  • How to harden your code for easier deployment by making it a module
  • How to create a package for your module so we can rely on pip to manage the dependencies
  • How to use the venv module to create reproducible running environments

Let’s get started!

A First Course on Deploying Python Projects
Photo by Kelly L. Some rights reserved.

Overview

This tutorial is divided into four parts; they are:

  • From development to deployment
  • Creating modules
  • From module to package
  • Using venv for your project

From Development to Deployment

When we finish a project in Python, sometimes we don’t want to shelve it but want to run it as a routine job. We may finish training a machine learning model and actively use the trained model for prediction. We may build a time series model and use it for next-step prediction. However, new data comes in every day, so we need to retrain the model to adapt to the changes and keep future predictions accurate.

Whatever the reason, we need to make sure the program runs as expected. However, this can be harder than we think. A simple Python script may pose no problem, but as our program grows larger and gains more dependencies, many things can go wrong. For example, a newer version of a library we use can break the workflow. Or our Python script might run some external program that stops working after an OS upgrade. Another case is when the program depends on files at a specific path, and we accidentally delete or rename one of them.

There are always ways for our program to fail to execute, but some techniques can make it more robust and reliable.

Creating Modules

In a previous post, we demonstrated that we could check a code snippet’s time to finish with the following command:

At the same time, we can also use it as part of a script and do the following:
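Assuming the tool from that earlier post is the standard library’s timeit module (an assumption; the original command is not shown), the same measurement can be made from a script with timeit.timeit(). The statement being timed below is a hypothetical example:

```python
import timeit

# Hypothetical statement to time: sorting 1,000 random floats.
# The setup string runs once; the statement then runs `number` times.
elapsed = timeit.timeit(
    stmt="sorted(data)",
    setup="import random; data = [random.random() for _ in range(1000)]",
    number=1000,
)

# timeit.timeit() returns the total time for all runs, in seconds
print(f"{elapsed:.4f} seconds for 1000 runs")
print(f"{elapsed / 1000 * 1e6:.1f} microseconds per run")
```

The command-line counterpart would be python -m timeit -s "<setup>" "sorted(data)", which is the same module run as a program.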

The import statement in Python allows you to reuse functions defined in another file by treating it as a module. You may wonder how we can make a module not only provide functions but also become an executable program. This is the first step toward deploying our code. If we can make our module executable, users do not need to understand how our code is structured in order to use it.

If our program is large enough to have multiple files, it is better to package it as a module. A module in Python is usually a folder of Python scripts with a clear entry point. Hence it is more convenient to send to other people, and the flow is easier to understand. Moreover, we can add a version number to the module and let pip keep track of the version installed.

A simple, single file program can be written as follows:
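As a hypothetical illustration, assume the program prints a seeded random sample whose size is an optional command-line argument. The function names setup(), sample(), and main() are illustrative choices:

```python
# randomsample.py: a hypothetical single-file program
import random
import sys


def setup(seed=None):
    """Seed the random number generator so results can be reproduced."""
    random.seed(seed)


def sample(n, population=range(100)):
    """Return n distinct values drawn from the population."""
    return random.sample(population, n)


def main():
    # Read the sample size from the command line, defaulting to 5
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 5
    setup(42)
    print(sample(n))


# This block runs only when the file is executed as a program,
# not when it is imported by another script.
if __name__ == "__main__":
    main()
```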

If we save this as randomsample.py in the local directory, we can run it either with python randomsample.py or with python -m randomsample.

And we can reuse the functions in another script with:
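A minimal sketch of such reuse follows. To stay self-contained, it first writes a trimmed, hypothetical randomsample.py into a temporary directory and puts that directory on sys.path. In a real project, the file would simply sit next to the importing script, and from randomsample import sample, setup would be all you need. The sketch also shows that importing the file does not run its if __name__ == "__main__" block:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A trimmed copy of a hypothetical randomsample.py, written out here only so
# the demo is self-contained; normally the file already sits next to your script.
MODULE_SOURCE = '''\
import random


def setup(seed=None):
    random.seed(seed)


def sample(n, population=range(100)):
    return random.sample(population, n)


if __name__ == "__main__":
    print("running randomsample as a program")
'''

with tempfile.TemporaryDirectory() as tmpdir:
    Path(tmpdir, "randomsample.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmpdir)     # make randomsample.py importable
    importlib.invalidate_caches()  # recommended after creating modules at runtime
    try:
        import randomsample
        from randomsample import sample, setup

        # The __main__ block did not run: inside the module, __name__ is
        # "randomsample", not "__main__".
        print(randomsample.__name__)

        # Reuse the functions as if they were defined in this script
        setup(42)
        values = sample(5)
        print(values)
    finally:
        sys.path.remove(tmpdir)
```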

This works because the magic variable __name__ will be "__main__" only if the script is run as the main program, not when it is imported from another script. With this, your machine learning project can probably be structured as follows:

Now, regressor is a directory containing those five files. The __init__.py file is empty; it only signals that this directory is a Python module that you can import. The script train.py is as follows:
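As a hypothetical stand-in for train.py, here is a dependency-free sketch. It swaps in a plain ordinary-least-squares fit (solving the normal equations by Gauss-Jordan elimination) where a real project would more likely call a library such as scikit-learn. The synthetic data in load_data(), the file name model.pickle, and the dict format of the saved model are all assumptions:

```python
# train.py: a hypothetical, dependency-free sketch of the training script
import pickle


def load_data():
    """Stand-in for reading the project's data file.

    Returns synthetic rows that follow y = 2*x1 - 3*x2 + 1 exactly.
    """
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    y = [1.0, 3.0, -2.0, 0.0, 2.0]
    return X, y


def fit_linear_regression(X, y):
    """Ordinary least squares: solve the normal equations (A^T A) w = A^T y."""
    A = [list(row) + [1.0] for row in X]  # constant column for the intercept
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(A, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(M[row][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; no unique fit")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for row in range(k):
            if row != col:
                factor = M[row][col]
                M[row] = [a - factor * b for a, b in zip(M[row], M[col])]
    coefficients = [M[i][k] for i in range(k)]
    return coefficients[:-1], coefficients[-1]  # (weights, intercept)


def main(model_path="model.pickle"):
    X, y = load_data()
    weights, intercept = fit_linear_regression(X, y)
    # Save the fitted parameters so the prediction script can load them later
    with open(model_path, "wb") as f:
        pickle.dump({"weights": weights, "intercept": intercept}, f)
    print(f"weights={weights}, intercept={intercept:.3f}; saved to {model_path}")


if __name__ == "__main__":
    main()
```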

The script for predict.py is:
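A hypothetical sketch of predict.py follows, assuming the model was pickled as a dict with "weights" and "intercept" entries in a file named model.pickle (both assumptions). It locates the model file relative to the module itself rather than the current working directory, so the command works wherever the user runs it:

```python
# predict.py: a hypothetical sketch of the prediction script
import pickle
import sys
from pathlib import Path


def load_model(path=None):
    """Load the pickled model; by default look next to this file."""
    if path is None:
        # Resolve relative to the module, not the current working directory
        path = Path(__file__).with_name("model.pickle")
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(features, model=None):
    """Return the predicted value for one vector of input features."""
    if model is None:
        model = load_model()
    weights = model["weights"]
    if len(features) != len(weights):
        raise ValueError(f"expected {len(weights)} features, got {len(features)}")
    return sum(w * x for w, x in zip(weights, features)) + model["intercept"]


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Command-line arguments form the feature vector, e.g. "1.0 2.5"
        print(predict([float(arg) for arg in sys.argv[1:]]))
    else:
        print("usage: python -m regressor.predict FEATURE [FEATURE ...]")
```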

Then, from the parent directory of regressor/, we can run the following to load the data, train a linear regression model, and save the model with pickle:

If we move this pickle file into the regressor/ directory, we can also do the following in a command line to run the model:

Here the numerical arguments are a vector of input features for the model. We can go one step further and move the if block out of predict.py into a new file, regressor/__main__.py, containing the following code:
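A hypothetical version of regressor/__main__.py is the MAIN_PY string below. Because a relative import only works inside a package, the sketch builds a throwaway regressor/ package in a temporary directory, with a trimmed stand-in for predict.py and a hypothetical pickled model, and then runs it the way a user would, with python -m regressor:

```python
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical contents of regressor/__main__.py. The relative import
# ".predict" resolves to regressor/predict.py in the same package.
MAIN_PY = '''\
import sys
from .predict import predict

features = [float(arg) for arg in sys.argv[1:]]
print(predict(features))
'''

# A trimmed stand-in for regressor/predict.py so the demo runs on its own
PREDICT_PY = '''\
import pickle
from pathlib import Path

def predict(features):
    with open(Path(__file__).with_name("model.pickle"), "rb") as f:
        model = pickle.load(f)
    return sum(w * x for w, x in zip(model["weights"], features)) + model["intercept"]
'''

with tempfile.TemporaryDirectory() as root:
    pkg = Path(root, "regressor")
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "predict.py").write_text(PREDICT_PY)
    (pkg / "__main__.py").write_text(MAIN_PY)
    with open(pkg / "model.pickle", "wb") as f:
        pickle.dump({"weights": [2.0, -3.0], "intercept": 1.0}, f)
    # Equivalent to typing "python -m regressor 1 1" in the parent directory
    result = subprocess.run(
        [sys.executable, "-m", "regressor", "1", "1"],
        cwd=root, capture_output=True, text=True, check=True,
    )
    output = result.stdout.strip()
    print(output)
```

python -m regressor executes regressor/__main__.py as the package’s entry point, which is why the if __name__ == "__main__" guard is no longer needed there.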

Then we can run the model directly from the module with python -m regressor, followed by the input features.

Note that the line from .predict import predict in the example above uses Python’s relative import syntax. This should be used inside a module to import components from other scripts of the same module.

From Module to Package

If you want to distribute your Python project as a final product, it is convenient to make it installable as a package with the pip install command. This is easy to do. Since you have already created a module from your project, all you need to add is some simple setup instructions. You create a project directory and put your module in it, along with a pyproject.toml file, a setup.cfg file, and a MANIFEST.in file. The file structure would look like this:

We will use setuptools, as it has become a standard for this task. The pyproject.toml file specifies that setuptools is used to build the package:

The key information is provided in setup.cfg. We need to specify the name of the module, the version, an optional description, what to include, and what to depend on, such as the following:
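The setup.cfg file uses INI syntax, so Python’s configparser can show how its sections are read. The names mlm_demo, regressor, and version 0.0.1 come from the discussion of this file; the description and the dependency lines (including their version bounds) are hypothetical placeholders. include_package_data = True is the setuptools option that makes the files listed in MANIFEST.in part of the installed package:

```python
import configparser

# A hypothetical setup.cfg; the dependency versions are illustrative only
SETUP_CFG = """\
[metadata]
name = mlm_demo
version = 0.0.1
description = A linear regression model packaged for pip

[options]
packages = regressor
include_package_data = True
install_requires =
    numpy>=1.20,<2
    scikit-learn==1.0.2
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

print("pip package name:", parser["metadata"]["name"])  # shown by pip list
print("import name:", parser["options"]["packages"])    # used in import
# A multi-line value comes back as one string; split it into requirements
requirements = [
    line.strip()
    for line in parser["options"]["install_requires"].splitlines()
    if line.strip()
]
print("dependencies:", requirements)
```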

The MANIFEST.in file just specifies which extra files to include. In projects that contain only Python scripts, this file can be omitted. But in our case, we need to include the trained model and the data file:

Then in the project directory, we can install it as a module into our Python system with the following command:

Afterward, the following code works anywhere, because regressor is now a module accessible in our Python installation:

There are a few details worth explaining in setup.cfg. The metadata section is for pip. Here we named our package mlm_demo, which you can see in the output of the pip list command. However, Python’s module system recognizes the module name as regressor, as specified in the options section, so this is the name you should use in the import statement. Often, these two names are the same for the convenience of users, which is why people use the terms “package” and “module” interchangeably. Similarly, version 0.0.1 appears in pip but is not visible from the code. It is a convention to put the version in __init__.py in the module directory, so you can check it from another script that uses the module:
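A minimal sketch of that convention: regressor/__init__.py holds __version__ = "0.0.1", and any script can read regressor.__version__. To stay self-contained, the sketch creates a throwaway regressor/ package in a temporary directory. It also queries pip’s side of the record with importlib.metadata.version("mlm_demo"), which returns None here unless the package is actually installed:

```python
import importlib
import sys
import tempfile
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Build a throwaway regressor/ package whose __init__.py carries the version
    pkg = Path(root, "regressor")
    pkg.mkdir()
    (pkg / "__init__.py").write_text('__version__ = "0.0.1"\n')
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        import regressor
        module_version = regressor.__version__  # what the code can see
    finally:
        sys.path.remove(root)

print("version seen by code:", module_version)

# pip's record comes from the installed metadata, looked up by the pip name
try:
    pip_version = version("mlm_demo")
except PackageNotFoundError:
    pip_version = None  # mlm_demo is not installed in this environment
print("version seen by pip:", pip_version)
```

To avoid keeping two copies of the version in sync by hand, setuptools also accepts version = attr: regressor.__version__ in setup.cfg, which reads the value from the module.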

The install_requires part in the options section is the key to making our project run. It means that when we install this module, we also need to install those other modules, at those versions if specified. This may create a tree of dependencies, but pip takes care of it when you run the pip install command. As you might expect, we use the == operator to require a specific version. If we can accept multiple versions, we use a comma (,) to separate the conditions, as in the case of numpy above.

Now you can ship the entire project directory to other people (e.g., in a ZIP file). They can install it with pip install in the project directory and then run your code with python -m regressor, given the appropriate command-line arguments.

A final note: you may have heard of the requirements.txt file in a Python project. It is just a text file, usually placed in a directory with a Python module or some Python scripts. It has a format similar to the dependency specification mentioned above. For example, it may look like this:

It is meant for cases where you do not want to make your project into a package but still want to indicate which libraries, and which versions of them, your project expects. pip understands this file, and we can use it to prepare our system for the project with the command pip install -r requirements.txt.

But this is only meant for a project in development, and that is all the convenience requirements.txt can provide.

Using venv for Your Project

The above is probably the most efficient way to ship and deploy a project since you include only the most essential files. This is also the recommended way because it is platform-agnostic. This still works if we change our Python version or move to a different OS (unless some specific dependency forbids us).

But there are cases where we may want to reproduce an exact environment for our project to run. For example, besides requiring some packages to be installed, we may need certain other packages to be absent. Also, a package we installed with pip may have its version dependencies broken when another package is installed later. We can solve this problem with the venv module in Python.

The venv module is part of Python’s standard library and allows us to create a virtual environment. It is not a virtual machine or the kind of virtualization Docker provides; instead, it changes the paths Python uses to locate its interpreter and packages. For example, we can install multiple versions of Python on our OS, but a virtual environment always assumes the python command means one particular version. As another example, within a virtual environment, we can run pip install to set up packages in the virtual environment directory without interfering with the system outside.

To start with venv, we find a suitable location and run the command python -m venv myproject.
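The same step can be done from Python with venv.create(). The sketch below creates the environment inside a temporary directory and skips installing pip (with_pip=False) to keep it fast, whereas the command line installs pip by default. It then checks for the pyvenv.cfg file and the activation script:

```python
import os
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as parent:
    env_dir = Path(parent, "myproject")
    # Roughly "python -m venv --without-pip myproject"; symlinks mirror the
    # command-line default on Linux and macOS
    venv.create(env_dir, symlinks=(os.name != "nt"), with_pip=False)

    # Activation scripts live in bin/ on Linux and macOS, Scripts\ on Windows
    bin_dir = env_dir / ("Scripts" if os.name == "nt" else "bin")
    has_config = (env_dir / "pyvenv.cfg").is_file()
    has_activate = (bin_dir / "activate").is_file()
    print("pyvenv.cfg present:", has_config)
    print("activate script present:", has_activate)
    print(sorted(p.name for p in env_dir.iterdir()))
```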

This creates a directory named myproject. A virtual environment is meant to be used from a shell, so that its environment variables can be set. To activate it, we source its activation script (e.g., under bash or zsh on Linux and macOS) with the command source myproject/bin/activate.

Afterward, you are inside the Python virtual environment. The python command now refers to the interpreter the virtual environment was created with (which matters if your OS has multiple Python versions installed), and installed packages are placed under myproject/lib/python3.9/site-packages (assuming Python 3.9). When you run pip install or pip list, you see only the packages in the virtual environment.
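To check from inside Python whether a virtual environment is active, a common test compares sys.prefix with sys.base_prefix: venv points the former at the environment directory while the latter keeps pointing at the base installation. sysconfig.get_paths()["purelib"] reports the site-packages directory that pip install writes to for the running interpreter. A small sketch:

```python
import sys
import sysconfig


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix is the environment directory, while
    # sys.base_prefix is the Python installation it was created from
    return sys.prefix != sys.base_prefix


print("inside a virtual environment:", in_virtual_environment())
print("interpreter:", sys.executable)
# Where pip install places pure-Python packages for this interpreter
print("site-packages:", sysconfig.get_paths()["purelib"])
```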

To leave the virtual environment, we run deactivate at the shell prompt.

The deactivate command is a shell function defined by the activation script, not a separate program.

Using virtual environments can be particularly useful if you have multiple projects in development that require different versions of packages (such as different versions of TensorFlow). You can simply create a virtual environment, activate it, install the correct versions of all the libraries you need with the pip install command, and then put your project code inside the virtual environment. Your virtual environment directory can be huge (e.g., installing TensorFlow with its dependencies alone consumes almost 1GB of disk space). But afterward, shipping the entire virtual environment directory to others can reproduce the exact environment for executing your code, provided they use the same OS and have the same base Python installation at the same path, since a virtual environment records absolute paths and is not designed to be relocated. This can be an alternative to a Docker container if you prefer not to run the Docker server.

Further Reading

Indeed, some other tools exist that help us deploy our projects neatly. Docker, mentioned above, is one. The zipapp module from Python’s standard library is also an interesting tool.
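zipapp bundles a directory of Python code into a single .pyz archive, which the interpreter runs through the archive’s __main__.py, much like python -m regressor runs regressor/__main__.py. A minimal sketch with a hypothetical one-file app named hello:

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    app_dir = Path(root, "hello")
    app_dir.mkdir()
    # The archive's entry point: __main__.py at the top level
    (app_dir / "__main__.py").write_text('print("hello from a zipapp")\n')

    target = Path(root, "hello.pyz")
    zipapp.create_archive(app_dir, target)

    # Run the single-file archive like a script: python hello.pyz
    result = subprocess.run(
        [sys.executable, str(target)],
        capture_output=True, text=True, check=True,
    )
    output = result.stdout.strip()
print(output)
```

Unlike a pip package, a .pyz archive does not install dependencies; third-party libraries would have to be installed separately or copied into the source directory before archiving.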


Summary

In this tutorial, you’ve seen how we can confidently wrap up our project and deliver it to another user to run it. Specifically, you learned:

  • The minimal change to a folder of Python scripts to make them a module
  • How to convert a module into a package for pip
  • What a virtual environment in Python is, and how to use it
