Why python needs virtual environments


behaviour · package management · language design

I was writing a python package manager to understand the issue. This led to three core insights, which I now think are the roots of the problems we have:

  1. Broken storage: only one version of a package can exist in an environment
  2. Lack of feedback: python has no mechanism to detect and report, at runtime, a mismatch between the package version a script requires and the version the environment provides
  3. External coordination: package versions are managed through external files, which has led to an explosion of third-party tools that manage isolated environments and, sometimes, package compatibility

The only solutions I could think of involve tightly coupling the language with the package manager.

I understand that separation of concerns is a thing, and that the language and its package management must be kept separate. But does the dev experience really care about what's under the hood? I don't think so.

Inability to store multiple package versions in a single environment

Consider a package, flask. Assume that flask 1 and flask 2 are incompatible: scripts that use flask 1 cannot use flask 2 without being upgraded.

Now, suppose you installed flask 1 and wrote a script old.py that runs on flask 1:

# old.py
import flask
...

From where does python get flask? From the environment (in this case, from the global environment)

Later on, you wrote a script new.py that uses flask 2

# new.py
import flask
...

But after installing flask 2, you will see that flask 1 is gone.

If you look into the global environment, you'll see that pip stores packages like this:

packages/
└── flask/
    ├── flask_file1
    ├── flask_file2
    └── flask_file3

ONLY ONE VERSION OF FLASK CAN BE STORED AT A TIME. There is no mechanism to store multiple versions of flask.

After installing flask 2, new.py runs fine because it gets flask 2. However, old.py has stopped working. Why? Because it needs flask 1, which is now missing.

If you re-install flask 1, it will replace flask 2 in your packages directory. So at any point in time, either old.py or new.py can run. Both can never run at the same time.
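You can see this one-slot-per-name constraint from python itself. A minimal sketch using the stdlib's importlib.metadata (nothing here is flask-specific):

```python
from importlib import metadata

# Each installed distribution shows up exactly once per name; the
# environment's layout simply has no slot for a second version of
# the same package.
installed = {}
for dist in metadata.distributions():
    name = dist.metadata["Name"]
    if name:
        installed[name] = dist.version

# A dict keyed by name can, by construction, hold only one version
# per package, which mirrors what the packages/ directory allows.
print(len(installed), "packages, one version each")
```

Run it in any environment and every package name maps to exactly one version string.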

How do we work around this problem?

Instead of using a global shared environment, we create a dedicated, isolated environment for every project. A virtual environment.

  • old.py uses a virtual environment old-env that contains flask 1
  • new.py uses a virtual environment new-env that contains flask 2

We use package and environment managers to get this done. But the fact that we have 13 major python package managers means that we still don't have a definitive solution.

Python's inability to warn us

Again, currently:

  • old.py uses an isolated environment old-env that contains flask 1
  • new.py uses an isolated environment new-env that contains flask 2

So, while running old.py, you have to activate old-env and then run the script. While running new.py, you have to activate new-env and then run the script.

You either need the discipline to activate the correct environment for the correct script (which requires proper organization of environments), or you need to automate it.

Because when python runs old.py, it doesn't check whether the environment provides flask 1 or flask 2. It just runs the script. You'll only find out about your mistake when the script crashes or produces unintended output (startup checks could catch it earlier, but people normally don't write them).
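A hand-rolled startup check is the best you can do today. Here's a sketch using the stdlib's importlib.metadata; require is a hypothetical helper, not part of python or pip:

```python
from importlib.metadata import PackageNotFoundError, version

def require(package, major):
    """Hypothetical startup check: crash early, with a clear message,
    instead of failing somewhere mid-run."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        raise SystemExit(f"{package} is not installed")
    if int(installed.split(".")[0]) != major:
        raise SystemExit(f"need {package} {major}.x, found {installed}")

# At the top of old.py you would write:  require("flask", 1)
```

This is exactly the kind of boilerplate that exists only because the language itself won't do the check.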

Why doesn't python warn us? Because python has no awareness of what package version the script needs.

We are importing packages like this:

# old.py
import flask
...

There is no mechanism to import packages like this:

# old.py
import flask==1
...
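You can confirm that version-pinned imports aren't part of the grammar at all: compile() rejects the hypothetical syntax before any package lookup could even happen.

```python
# `import flask==1` is not valid python; the parser rejects it outright.
try:
    compile("import flask==1", "<hypothetical>", "exec")
    pinned_import_allowed = True
except SyntaxError:
    pinned_import_allowed = False

print("version-pinned import allowed:", pinned_import_allowed)
# prints: version-pinned import allowed: False
```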

How do we work around this problem?

Python isn't going to tell us anything. So we have to make sure that we avoid making mistakes.

Currently we do it by using external files to coordinate package versions in the environment. The file acts as the seed from which packages are installed into the virtual environment.

Eg: using requirements.txt or pyproject.toml to make sure the env has the correct package version

# requirements.txt
flask==1

And exclusively installing dependencies through requirements.txt

pip install -r requirements.txt

And automating environment loading when opening a project in an IDE, or when executing a project on a server: think automatic env loading in the pycharm IDE, or bash scripts (that set up and load the venv) to start the project on servers.
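On the automation side, a script can at least detect whether it is running inside some virtual environment (though not which one): inside a venv, sys.prefix diverges from sys.base_prefix.

```python
import sys

# Inside a venv, sys.prefix points at the venv directory, while
# sys.base_prefix still points at the base interpreter's install.
in_venv = sys.prefix != sys.base_prefix
print("virtual environment active:", in_venv)
```

Note what this can't tell you: whether it's the *right* environment for this script. That part is still on you.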

We got around the problem through discipline and automation. Anything that breaks without discipline is just bad design.

Another glaring problem: package incompatibility checks

Since python cannot check a script's requirements against the package versions available in the environment, it also cannot tell when we load two packages that are incompatible with each other.

Eg: we have installed flask 1, cli 1 and arrow 2. flask 1 needs cli 1, but arrow 2 needs cli 2. Python won't catch the incompatibility: flask 1 and arrow 2 cannot run at the same time.

Who catches this incompatibility? The package manager does.

How is this incompatibility detected? Every package declares the version of dependencies it needs.

Eg:

  • flask1 declares that it works only with cli 1
    • this range has to be given by the developers of flask after rigorous testing
  • arrow2 declares that it works only with cli 2 and above
    • this range has to be given by the developers of arrow after rigorous testing

The package maintainer is responsible for giving these ranges, and the package manager is responsible for checking compatibility of packages declared in requirements.txt.
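The check the package manager performs can be sketched in a few lines. The declared sets below are made up to match the flask/cli/arrow example above; real managers read the ranges from package metadata:

```python
# Toy version of a package manager's conflict check.
declared = {
    "flask1": {"cli": {1}},     # flask 1 declares: works only with cli 1
    "arrow2": {"cli": {2, 3}},  # arrow 2 declares: works with cli 2 and above
}

def conflicts(installed):
    """Return dependencies for which no single version satisfies everyone."""
    acceptable = {}
    for pkg in installed:
        for dep, versions in declared.get(pkg, {}).items():
            if dep in acceptable:
                acceptable[dep] &= versions  # intersect the declared ranges
            else:
                acceptable[dep] = set(versions)
    return [dep for dep, versions in acceptable.items() if not versions]

print(conflicts(["flask1", "arrow2"]))  # prints ['cli']: no version fits both
```

The whole check is just set intersection over declared ranges, which is why it only works as well as the ranges maintainers declare.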

However, look at the packages in your production projects. I bet too many of them ship bullshit dependency version ranges, or have never had those ranges updated, because overburdened package maintainers also have day jobs to make money.

Look at a random package you installed from pypi. Maybe it doesn't even specify its dependency version ranges.

It is a proper shitshow. The python ecosystem works on trust (blind trust?), hope and duct tape.

Solutions

Possible solutions:

  • Make the problems visible. Move package version awareness as far up as possible: in this case, give python itself awareness of required package versions (through additions to the syntax)
    • This enables great feedback
      • Once this is possible, tools like static analyzers will be able to pick up discrepancies between a script and the environment it is given (in this case, the dev environment).
      • If an incompatibility still slides in, python can throw a package incompatibility warning at runtime, right when the package is imported.
    • This will make purists angry, because in python's case, the language and its package manager have been kept isolated. Python's package management is a shitshow. But it probably is a shitshow because of conscious decisions that I'm obviously not aware of.
  • Allow pip to check the versions of currently installed packages (pip already has this functionality)
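pip check is the existing mechanism here: it reports installed packages whose declared dependency ranges are unsatisfied or conflicting. A minimal way to run it from python:

```python
import subprocess
import sys

# Run `pip check` against whatever environment this interpreter lives in.
result = subprocess.run(
    [sys.executable, "-m", "pip", "check"],
    capture_output=True,
    text=True,
)
# Exit code 0 means pip found no broken requirements.
print(result.stdout or result.stderr or "no broken requirements found")
```

But note the limits: it runs only when you invoke it, and it trusts the very ranges that maintainers may never have tested.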

Ending notes

Ask around and you'll find plenty of people who moved away from python because of its packaging issues. You'll also find plenty of people who stopped commercial python development because of the incompatibility shitshows in companies' codebases.

This IS a problem. A proper big problem. Acknowledgement goes a long way.

I love python enough to use it, despite hating the mechanics of the ecosystem around it. It is just that good. If you need some convincing, use something like c++ or java for project euler, then try using python. Python inherently seems to think about user comfort and speed to solution. I don't know how. Probably due to the inherent philosophy of its community.

I'm not into language design enough, nor do I care enough, to try solving this. (Also, I'm most probably not competent enough.)

But it doesn't take much smartness to see the problem and its causes.
