Defining and documenting how Pip should interact with Conda environments

rgommers · February 16, 2023, 3:44pm

Context

There is a long history of discussions around, and issues with, the use of pip in conda environments. There are both legitimate needs for using pip in a conda environment and easy-to-make mistakes by users, who break environments or applications by using pip when they should have used conda or mamba.

Over the past few months there have been a number of threads on the Python packaging Discourse where Conda came up. For example:

PEP 704 - Require virtual environments by default for package installers, which proposes a change that would be a significant backwards-compatibility break in pip for conda users. Discourse thread
The “Wanting a singular packaging tool/vision” thread
The Python Packaging Strategy Discussion - Part 1 thread
PEP 668 - Marking Python base environments as “externally managed” (a bit older). Discourse thread
This post specifically outlines the existing “knobs” available for tweaking how pip interacts with other package managers.

Rather than always pushing back with “this doesn’t work for conda”, Steve Dower suggested that the conda community should specify how it wants pip to behave inside conda envs (see https://discuss.python.org/t/wanting-a-singular-packaging-tool-vision/21141/149). That is not an easy question, but it would be great to see a coherent view indeed. This may, but doesn’t have to, include a request for new features from pip or from Python packaging.

I’ll note that the question is broader than pip - there are other Python package installers, and there are also topics like virtual environments and dependency specifiers that interact with conda/mamba or conda environments.

Usage scenarios

There are multiple reasons for users or tools wanting to use pip or wheels from PyPI inside a conda environment:

Installing Python packages from PyPI for which no conda package is available in defaults, conda-forge or another channel,
Installing a package from source locally. This can be a development version of any package, even if it’s available as a release in a conda channel,
Using an all-wheel workflow like conda create -n my-devenv python=3.11 (which does install pip in addition to python 3.11) followed by pip install a-bunch-of-pypi-pkgs,
As part of the workflow for building conda packages:
- recipe/build.sh in a conda-forge feedstock is likely to contain something like ${PYTHON} -m pip install . -vv, as documented in Contributing packages — conda-forge 2023.02.09 documentation,
- Another example: Contributing packages — conda-forge 2023.02.09 documentation
- pip list and pip check are used too: Guidelines — conda-forge 2023.02.09 documentation
- Note that the conda-forge automation ensures --no-build-isolation is used by default when pip is used within a conda-forge build, in order to avoid problems when building packages with compiled extensions.
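The build-time pip invocation described above can be sketched as a small Python helper that assembles the command line. This is a minimal sketch, not conda-forge's actual tooling: the helper name `pip_build_command` is hypothetical, and the `--no-deps` flag is an assumption added here so that pip leaves all dependency handling to conda.

```python
import sys
from typing import List


def pip_build_command(python: str = sys.executable,
                      no_build_isolation: bool = True,
                      no_deps: bool = True) -> List[str]:
    """Assemble a `${PYTHON} -m pip install . -vv` style command (hypothetical helper)."""
    # Use the interpreter of the target environment, like ${PYTHON} in build.sh
    cmd = [python, "-m", "pip", "install", ".", "-vv"]
    if no_deps:
        # Assumption: dependencies come from conda, so pip must not fetch them from PyPI
        cmd.append("--no-deps")
    if no_build_isolation:
        # Build against the conda-installed build dependencies, not fresh PyPI wheels
        cmd.append("--no-build-isolation")
    return cmd
```

In a build script this list would be passed to `subprocess.run(pip_build_command(), check=True)` from the root of the source tree being packaged.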

There is some basic guidance in the Conda docs for this:

Managing packages — conda 23.1.0 documentation
Managing environments — conda 23.1.0 documentation
The docs also contain an example of using pip to install a package from PyPI as part of an environment.yml file:

name: stats2
channels:
  - javascript
dependencies:
  - python=3.9
  - bokeh=2.4.2
  - conda-forge::numpy=1.21.*
  - nodejs=16.13.*
  - flask
  - pip
  - pip:
    - Flask-Testing
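That file has two layers: plain strings are conda specs, and the nested `pip:` entry lists requirements that pip installs from PyPI. A minimal sketch that separates them, assuming the YAML has already been parsed into lists and dicts (for example with PyYAML's `yaml.safe_load`, which is not in the standard library); the function name is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def split_environment_dependencies(env: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Split environment.yml dependencies into (conda specs, pip requirements)."""
    conda_specs: List[str] = []
    pip_specs: List[str] = []
    for dep in env.get("dependencies", []):
        if isinstance(dep, dict):
            # The nested `pip:` mapping holds requirements installed from PyPI
            pip_specs.extend(dep.get("pip", []))
        else:
            # Plain strings are conda match specs, including `pip` itself
            conda_specs.append(dep)
    return conda_specs, pip_specs


# The stats2 example above, as a YAML loader would return it
stats2 = {
    "name": "stats2",
    "channels": ["javascript"],
    "dependencies": [
        "python=3.9",
        "bokeh=2.4.2",
        "conda-forge::numpy=1.21.*",
        "nodejs=16.13.*",
        "flask",
        "pip",
        {"pip": ["Flask-Testing"]},
    ],
}
conda_specs, pip_specs = split_environment_dependencies(stats2)
print(pip_specs)  # ['Flask-Testing']
```

Listing `pip` as a plain conda dependency, as the example does, means pip itself is provided by conda inside the environment before the `pip:` entries are handled.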

There is an experimental pip_interop_enabled mode for conda (see Improving interoperability with pip — conda 23.1.0.post38+278795193 documentation). Unlike most conda features, this doesn’t seem to have an analog in mamba (it’s possible that mamba already has the improved behavior hidden behind this flag; I’m not sure).

On the other hand, pip is often misused (especially by beginning users) when conda or mamba should be used instead. This is a very frequent source of broken environments and of bug reports to popular Python packages. The Spyder team even made a polished 3-minute video about this with the usual “avoid mixing pip and conda” advice. Even advanced Python users who are comfortable building from source often shoot themselves in the foot by using pip install . instead of pip install . --no-build-isolation: with build isolation, pip builds the package against build dependencies freshly installed from PyPI into a temporary environment, rather than against the conda packages already present in the environment.

Another thing that leads to unpredictable results is iterative use of conda install and pip install in an environment: the environment tends to degrade and eventually break. If pip usage is indeed warranted, it should be done once; any further updates require recreating the environment for reliable results (as documented under “Recreate the environment if changes are needed” in this section of the conda docs).
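One way to tell whether pip has already been used in an environment, before running another conda install, is the optional INSTALLER file that installers write into each `.dist-info` directory (per the PyPA "Recording installed projects" specification); pip writes `pip` there. A minimal sketch using `importlib.metadata`; the function name is hypothetical, and it deliberately makes no assumption about what conda-installed packages record (`conda list` also shows pip-installed packages with the `pypi` channel).

```python
from importlib.metadata import distributions
from typing import List, Optional, Sequence


def pip_installed_distributions(paths: Optional[Sequence[str]] = None) -> List[str]:
    """Names of distributions whose INSTALLER file says `pip` (hypothetical helper)."""
    # Without `path`, importlib.metadata searches sys.path of the running interpreter
    kwargs = {} if paths is None else {"path": list(paths)}
    names = []
    for dist in distributions(**kwargs):
        # INSTALLER is optional; read_text returns None when the file is absent
        installer = (dist.read_text("INSTALLER") or "").strip()
        if installer == "pip":
            names.append(dist.metadata["Name"])
    return sorted(names)
```

Run inside the active conda environment without arguments, a non-empty result signals that pip has been used there, so further changes are safer done by recreating the environment.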

Conceptual issues with usage scenarios

One fairly fundamental problem here is that pip is used for multiple purposes:

Building a package from source, then installing it,
Installing wheels from PyPI
Package/dependency management
Virtual environment creation/management: now for isolated builds, possibly more in the future.

conda, on the other hand, has a single coherent purpose (installing a compatible set of built packages into an environment) but does not have a good “build from source” (or “developer”) story. The metadata in a Python package’s source tree (in pyproject.toml) is not generic dependency metadata; it’s PyPI-specific. The mapping from PyPI dependencies to conda package dependencies happens in conda recipes, rather than in the project’s own VCS repo. Hence the need to use pip, rather than an analogous conda install . command.

So what conda users need from pip is a subset of everything that pip does, and it comes with a UX that isn’t ideal.

Some concrete questions to answer

Should it be possible to use pip (and other installers like pypa/installer) to install a Python package into a conda environment from a vcs checkout, an sdist, or a wheel?
- answer RG: yes
- Also for the base environment?
  - answer RG: no, too fragile
Should pip be able to uninstall/overwrite conda packages, and vice versa?
- answer RG: probably yes. Users rely on this and there are valid usage scenarios, so making it work as smoothly as possible seems preferable to preventing it.
Should a conda env be considered or marked as a virtual environment?
- answer RG: no, they’re clearly different beasts, and treating them as the same will lead to problems.
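The difference is visible from inside Python. A venv (PEP 405) sets `sys.prefix` to the environment while `sys.base_prefix` still points at the base installation; a conda environment is a full Python installation of its own, so the two are equal there, and conda keeps its bookkeeping in a `conda-meta` directory under the prefix. A sketch of both checks; the function names are hypothetical and the `conda-meta` test is a heuristic, not an official API.

```python
import sys
from pathlib import Path


def is_virtual_environment(prefix: str = sys.prefix,
                           base_prefix: str = sys.base_prefix) -> bool:
    """A venv/virtualenv interpreter has sys.prefix != sys.base_prefix."""
    return prefix != base_prefix


def looks_like_conda_environment(prefix: str = sys.prefix) -> bool:
    """Heuristic: conda records installed packages in <prefix>/conda-meta."""
    return (Path(prefix) / "conda-meta").is_dir()
```

Called without arguments, both functions inspect the running interpreter; the parameters exist so other prefixes can be checked.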

Design changes?

Options given the current state of Python packaging and pip include:

Do/change nothing
Protect the base environment better by adding EXTERNALLY-MANAGED to it (see conda#12245)
Add EXTERNALLY-MANAGED to all environments, which makes users opt in to potential breakage by using a flag: pip install ... --break-system-packages.
- Optionally, provide an alternative installer (maybe even a renamed pip?) for the valid use cases that conda users have, like installing from source or installing packages that are not present in a conda channel.
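For reference, the PEP 668 check an installer performs can be sketched as follows: outside a virtual environment, a file named EXTERNALLY-MANAGED in the standard library directory (as reported by `sysconfig`) makes the installer refuse, showing the `Error` value from the file's `[externally-managed]` section. This is a simplified sketch, not pip's implementation: it ignores the localized `Error-<LANG>` keys, and the function name and fallback message are made up.

```python
import configparser
import sys
import sysconfig
from pathlib import Path
from typing import Optional


def externally_managed_error(stdlib_dir: Optional[str] = None,
                             in_venv: Optional[bool] = None) -> Optional[str]:
    """Return the refusal message if installs should be blocked, else None."""
    if in_venv is None:
        in_venv = sys.prefix != sys.base_prefix
    if in_venv:
        # PEP 668: the marker is ignored inside a virtual environment
        return None
    if stdlib_dir is None:
        stdlib_dir = sysconfig.get_path("stdlib")
    marker = Path(stdlib_dir) / "EXTERNALLY-MANAGED"
    if not marker.is_file():
        return None
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(marker, encoding="utf-8")
    # Fallback text is a placeholder used when the Error key or section is missing
    return parser.get("externally-managed", "Error",
                      fallback="This environment is externally managed")
```

Adding such a file to the conda base environment (conda#12245) would make a plain pip install there fail, with --break-system-packages remaining the explicit opt-in.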

Options for future design changes that would require pip or Python packaging changes include:

pip not installing dependencies that are available as conda packages. E.g., if my-niche-package depends on numpy, then pip install my-niche-package should not install numpy (it could either error out with an informative message, or, very futuristic and probably hard, install numpy with conda/mamba)
Shared metadata, or metadata mapping, such that the pyproject.toml dependencies would be fully translatable to conda packages.
- This would make it easier to do things like conda install . spinning up an isolated conda env with conda build dependencies, rather than a virtual environment with packages installed from wheels,
- It would also make it easier for projects to not have multiple environment.yml, requirements.txt, etc. files (there’s already a PyPI-only issue here, because there is no such thing as pip install . --only-deps and there should be),
- Equivalently, the environment.yaml and requirements.txt formats could converge.
Generalize the concept of virtual environments, so tools that use them can more easily support conda environments as well (think of pip/tox/nox/asv/etc. here; anything that automatically creates a temporary or permanent venv)
?

Other specific requests for `pip` or Python packaging as a whole?

No breakage for scenarios that work today (so “no” to PEP 704’s idea of requiring a virtual environment to install into).
?

Relevant open issues

This is an incomplete list, but touches on some of the more important open issues:

Editable installs (pip install -e .) don’t interact well with conda: conda#5861
Add an EXTERNALLY-MANAGED file to base environment: conda#12245
conda env export tends to not be robust when PyPI packages are present: conda#9624
pip_interop_enabled issues: conda#11177, conda#12242

Next steps

The above is a start at a summary; it’d be great to discuss it here as well as in an upcoming conda community meeting (as proposed by @beckermr in the conda Element channel), expand this doc, and then turn it into (a) doc improvements in the conda/mamba/conda-forge docs, and (b) something that can be presented to the PyPA & https://discuss.python.org/ crowd as “this is what the conda community thinks or would like”.

beckermr · February 17, 2023, 2:24am

Thank you for the summary!

BrenBarn · February 20, 2023, 6:07am

Just my two cents as a conda user. . .

I agree, basically, but. . .

I do think this should be possible, but I think that overwrites should only be allowed with an explicit flag like --break-other-managers.

What are the valid use cases for pip overwriting a conda package? That is, what are the use cases there that cannot be handled by conda-uninstalling the conda package and then pip-installing the PyPI package?

What does it mean for pip to “uninstall” a conda package? Is that actually possible? As I see it, one of the pain points is actually that there isn’t any way for different package managers to tell each other “hey uninstall this so I can install it myself”. If there were, that might alleviate some of the problems that come from “layering” pip/conda installs.

My general feeling is that no package manager (not conda, not pip, not anything) should install a package unless one of the following holds:

the package is not installed
the package is installed and the installed version is managed by the same package manager (i.e., a manager is allowed to overwrite things it installed before)
a --dangerous-breakage-type option is activated
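The three rules can be written down as a small decision function. This encodes the proposal only, not how pip or conda behave today: `allow_breakage` stands in for a hypothetical flag such as --break-other-managers, `installed_by` is assumed to come from something like the INSTALLER file in a package's `.dist-info` directory, and an unknown owner (`None`) is treated as another manager's package, which is a design choice of this sketch.

```python
from typing import Optional


def may_install(is_installed: bool,
                installed_by: Optional[str],
                this_manager: str,
                allow_breakage: bool = False) -> bool:
    """Apply the proposed ownership rules (hypothetical policy)."""
    if not is_installed:
        # Rule 1: nothing installed, nothing to clobber
        return True
    if installed_by == this_manager:
        # Rule 2: a manager may overwrite what it installed itself
        return True
    # Rule 3: otherwise only with an explicit opt-in
    return allow_breakage
```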

I think the bigger question for me is whether virtual environments of the venv type should even be considered a good idea going forward. I’m not sure exactly what the consequences are of “marking an env as a virtual environment”; is it just that under the proposed PEP 704, pip would then be allowed to install into it?

rgommers · February 20, 2023, 9:11am

A simple example: say you have a dev environment with some part of the PyData stack, and now you want to test a development version of numpy, or your own patch to numpy. Currently, you simply check out the commit of numpy you want and type pip install . --no-build-isolation. If we required conda uninstall numpy first, that would uninstall both numpy and everything that depends on it, which makes this workflow effectively unusable.

pip only knows about Python packages, so if you run pip uninstall pkgname, it will remove all the files listed in the package metadata as being installed (the RECORD file), independent of whether those files were put there by conda, pip or any other installer. So for Python packages, uninstalling will likely be complete. However, conda can of course install other files, and those may be left behind, which is not a big issue in practice.

You can also look at this the opposite way: what was the point of standardizing RECORD as a key part of the wheel format if you’re not allowed to use it?
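To make the RECORD mechanism concrete: each line of RECORD is a CSV row of path, hash and size, with paths relative to the site-packages directory that holds the `.dist-info` directory. A minimal sketch of collecting those paths; the function name is hypothetical, it only lists files and deletes nothing, and it does not resolve `..` components that entries installed outside site-packages (such as scripts) may use. `importlib.metadata.Distribution.files` exposes the same information through the standard library.

```python
import csv
from pathlib import Path
from typing import List


def files_listed_in_record(dist_info_dir: Path) -> List[Path]:
    """Paths a RECORD-based uninstall would consider removing."""
    dist_info_dir = Path(dist_info_dir)
    # RECORD paths are relative to the directory containing the .dist-info dir
    site_dir = dist_info_dir.parent
    paths: List[Path] = []
    with open(dist_info_dir / "RECORD", newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if row:  # each row is: path, hash, size
                paths.append(site_dir / row[0])
    return paths
```

Nothing in RECORD says which installer wrote those files, which is why the same list works for conda-installed and pip-installed Python packages.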

I have my opinions, but that is well out of scope for this thread. Let’s just leave it at “conda envs clearly are not virtualenvs”.

ChrisBarker-NOAA · February 23, 2023, 1:51am

TL;DR

In order for pip to work well with conda (and other package managers), the parts that build / install packages should not assume that users are using a pip-only environment.

The long version:

All these details should be discussed, of course, but I think it would be helpful if the general concepts/goals/[better word?] behind conda (and pip) are clearly laid out.

Some key points (misunderstandings):

conda is NOT a Python package manager – it’s a general package manager. I would certainly hope that the core PyPI folks clearly understand that, but newbies sure don’t, and many package authors do not either. This is understandably confusing, because:
- conda is written in Python (though mamba isn’t)
- conda was developed originally for Python (and associated code)
- conda is widely used by the Python community (and maybe not at all by non-Python folks??) and the conda community is closely linked to the Python community.
- conda has a few Python-specific features.
However, this distinction is really key to keep in mind.
conda is not “for data science” – it is a general purpose tool that can be useful outside of data science / scientific software development, etc.
- Yes, it exists because Data Science folks had problems that the existing tools didn’t solve, but it does address issues that have nothing to do with data science as well.
- Example: In my shop, we develop half a dozen or so web applications – they all use conda for CI and deployment. Yes, we started that because some of them make heavy use of the scipy stack, but it’s been helpful for the ones that don’t as well – e.g. you can install redis with conda.
“conda” is an (open source) technology. “Anaconda” is a Data Science distribution. “Anaconda.org” is a web service for hosting packages. “Anaconda.com” is a company that works on all of these (and runs Anaconda.org, I think). “conda-forge” is a community-driven (developed and supported) system for managing and building conda packages (which are served by Anaconda.org).
- What’s notable here is that all these pieces work together, but they are not parts of the same thing, nor are they as tightly integrated as, e.g., pip, PyPI, and even CPython are.

What all this means

I think a key point is that ideally, conda shouldn’t have to do ANYTHING special / different with/for pip. Practicality beats purity, so conda does have some extra hooks in there to support the interaction, but that’s really not part of its Platonic Ideal.

And I don’t think pip (or the PyPA) should have to do anything special to accommodate conda, either. However, what would be really nice is if it didn’t do anything to make it hard for conda. But as conda is a general purpose tool, that means:

pip, PyPI, package maintainers, etc. should not assume that everyone is using pip / virtualenv, etc. to manage their work.

If this is done, then pip will work better with conda, and with spack, and apt-get and yum, and …

This is a bit tricky because pip is quite a few things in one (though at least fewer than setuptools was). It’s a package manager, it’s a build front-end, etc. And it’s also pip’s mission (in my mind, anyway) to make things easy and obvious for its core users and newbies.

Micro case study:

Because it’s on my mind, here’s a simple example. I recently tried to conda-install pytype. It’s there on conda-forge, so no problem at all. Except it didn’t work (so much for the curation).

I poked into it, posted some issues, got some things wrong, made a PR, and in the end it was all fixed (I think; waiting for the next release). But the problem:

pytype requires ninja, which is not a Python library, nor does it have Python bindings…
This is a perfect job for conda – there’s a ninja conda package – done!

but …

The pytype developers wanted their users to be able to simply “pip install pytype” and have it work – of course they did.
I have no idea who made it, but there is a PyPI package, ninja, that provides a way to install ninja. But pip isn’t designed to install arbitrary executables, so the PyPI ninja package provides ninja wrapped up in a Python package with an entry point, so you can do:
python -m ninja and get the ninja utility – nifty.
So pytype depends on the PyPI ninja package.
And the pytype code was patched to “work with virtual envs” (per the note on the commit), so it would call python -m ninja – now pytype works out of the box after pip install pytype – great!
However, the conda-forge pytype package depends on the conda-forge ninja package, which installs a ninja command but not a ninja Python module, so you get a module-not-found error when you try to use pytype.

The solution:

A couple of issues here that collided:

The conda build recipe only tested that the pytype command could be run (pytype --help), not that it actually worked. So a broken package seemed fine.
- we added a test that actually ran pytype on a little file, to make sure it actually worked.
The pytype developers were only testing with the PyPI package, so they didn’t notice that their code no longer worked with a ninja installed some other way.
- A PR was made to the pytype repo to have it check for the PyPI package and, if it’s not there, to try to use a system-installed version. That PR has been accepted.
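The fallback described in that PR can be sketched like this. It is not pytype's actual code: the function name is hypothetical, and the lookup functions are parameters only so the logic can be exercised without either ninja installed.

```python
import importlib.util
import shutil
import sys
from typing import Any, Callable, List, Optional


def ninja_command(find_spec: Callable[[str], Any] = importlib.util.find_spec,
                  which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Prefer the PyPI `ninja` module, else a `ninja` executable on PATH."""
    if find_spec("ninja") is not None:
        # PyPI package present: run the bundled binary via `python -m ninja`
        return [sys.executable, "-m", "ninja"]
    exe = which("ninja")
    if exe is not None:
        # e.g. provided by the conda-forge ninja package or a system package
        return [exe]
    raise FileNotFoundError("no ninja found: install it with pip or conda")
```

Probing for the executable on PATH is what lets the same code work whether ninja came from pip, conda, or the operating system.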

NOTE: This was all discovered, diagnosed, and fixed within 48 hours – total open-source success story!

Lessons learned?

I don’t think there’s anything in the tools (pip, conda, conda-forge, PyPI) that could or should be changed – this is a cultural issue: if developers think PyPI and pip are the be-all and end-all, then their stuff may not work outside of that environment.

But I do think a lesson here is that the tools should try to avoid making this kind of confusion even more likely – and the documentation maybe should make some of these points clear.

Also – maybe the defaults shouldn’t make things easier for pip-only users, at the expense of making them harder for other package systems … cough PEP 704 cough

rgommers · March 1, 2023, 10:25pm

This topic was discussed today in the Conda community meeting (thanks for the suggestion @beckermr!). A few points to summarize and follow up on:

There was agreement that this is an important, and hairy, topic. And that there’s work to do here:

in documenting recommended workflows for various tasks when working with Conda environments,
in documenting what conda, conda-build & co themselves use from pip and other Python packaging tools
and even a significant amount of user/UX research. Active participants in the Conda and conda-forge community can sketch part of the picture, but there are a lot of different types of users, and those may have needs (and therefore workflows) that are quite different from those of most contributors.

To follow up on my initial post, it’d be useful to expand it: for folks active here to post other workflows where they use pip, to make the picture as complete as we can in the short term, and then to review it so that what’s written here is at least reasonable and reflects best practice. That is certainly not going to be complete, but a partially complete and roughly accurate description is a lot more than what we had till now, and it should be helpful for Python packaging folks when discussing packaging-related strategy and PEPs. Hopefully the user/UX research project mentioned above can then follow up with a more complete picture (and improved end-user-focused docs!) in a reasonable time frame (very likely >6 months, though).