Policy / Recommendations for optional dependencies?

I’ve noticed a number of conda-forge packages list run-time dependencies that are not required to use the package.

I expect that’s because they are required to run the tutorials, demos, maybe some of the tests, etc.

I understand the desire for your users to be able to conda install my_package and immediately run the demos / tutorials, but that means you can get some pretty heavyweight dependencies, like Jupyter or matplotlib, which can then bring in PyQt, and so on.

Anyway – I like to keep my environments lean – so I think conda packages should have minimal dependencies. But before I start writing PRs, I’d like to know what the community thinks is best practice.

Conda’s inability to specify optional dependencies (other than creating separate packages) is a giant wart IMHO.

We have been “batteries included” for a while now. Some recipes have multiple outputs that install extra stuff. However, if we remove an optional dependency from a given recipe (say dependency foo for package blah), the kinds of changes we can make that are not breaking are limited. We’ll need to make a blah-base without foo and then make a metapackage called blah that installs blah-base and foo. Otherwise, we’ll be breaking people’s environments. You’ll have to install blah-base yourself to get the leaner environment.
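To make the blah / blah-base split concrete, here is a toy sketch in Python. It is not real conda metadata and it is not how conda's solver works: the dependency table is made up (including the assumption that foo pulls in matplotlib), and the closure function just walks it to show what each install would bring in.

```python
# Hypothetical dependency table: package -> direct run requirements.
# None of this is real conda-forge metadata; it only models the split.
DEPS = {
    "blah": ["blah-base", "foo"],  # metapackage: keeps the old behaviour
    "blah-base": ["numpy"],        # lean package, no foo
    "foo": ["matplotlib"],         # assumed heavy optional dependency
    "matplotlib": ["pyqt"],
    "numpy": [],
    "pyqt": [],
}


def closure(package, deps=DEPS):
    """Return the set of packages pulled in by installing `package`."""
    seen = set()
    stack = [package]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(deps.get(name, []))
    return seen


full = closure("blah")       # what existing users keep getting
lean = closure("blah-base")  # what you opt into for a lean environment
print(sorted(full - lean))   # the extra packages the metapackage adds
```

Existing environments that ask for blah see no change, while anyone who wants the lean set has to ask for blah-base explicitly.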

@dhirschfeld If we are going to propose specs, they are best directed towards conda-incubator/ceps.

I tend to agree here – but for now, I’d like to discuss what best practices are with the current technology.

Is “We” in this case conda-forge? Is that written down somewhere?

Even if so, what “batteries” should be included – I see two distinct kinds of extra dependencies:

  1. optional features: e.g. dask.distributed – you can use dask without it, but there are some features you can’t use. I can certainly see an argument for including this kind of thing.

  2. Other packages used for demos, tutorials, etc – e.g. matplotlib or Jupyter for a package that is not about plotting – of course those are useful for demoing, etc, but not required for the core functionality.

Personally, it’s (2) that I object to – that tends to bring in a sometimes very large dependency stack that is utterly irrelevant to actually using the package. The assumption that everyone who wants some computation feature is doing interactive data analysis with the full SciPy stack is just plain wrong.

– and if this package is not at the top of the requirements stack, it’s VERY hard to avoid bringing that stuff in – you’d essentially have to turn off conda’s dependency resolution. And if the extra requirements are pinned at all, that could really create a challenge.

NOTE: MPL eventually addressed some of this within its stack by creating a matplotlib-base package, so you wouldn’t have to bring in all the back-ends if you didn’t want to. I think that should be recommended policy.

Actually, I think recommended policy should be for the “standard” package to be minimal, and either:

  • Document that if you want to run the demos, you are going to need these packages. You can provide a demos_requirements.txt file if you want.

  • Offer another package that provides every likely related dependency.
    • @dhirschfeld is right – this shouldn’t be necessary, but it is right now.

Honestly, I think folks are including everything not because they really think it’s best, but because they haven’t thought carefully about it. In particular, the full stack is what developers need to work on the documentation and demos, etc – why wouldn’t they include them?

Anyway, at the end of the day, I’ll have to live with whatever the conda-forge community’s consensus on this is, but I think that should be documented.

In some of my work, I have three (or four!) sets of requirements:

requirements_run
requirements_develop
requirements_test
requirements_docs

(I could probably merge dev and docs without anyone complaining, but …)

It can be a bit annoying, but it does let us build lean environments.
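For what it’s worth, here is a rough sketch of how those sets can be combined per environment. It assumes each requirements_<kind> file is plain text with one spec per line and # comments; the helper names read_requirements and combined are made up for illustration, not part of any tool.

```python
from pathlib import Path


def read_requirements(path):
    """Return the specs in a file, skipping blank lines and # comments."""
    specs = []
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            specs.append(line)
    return specs


def combined(directory, *kinds):
    """Merge requirements_<kind> files in order, dropping repeated specs."""
    merged = []
    for kind in kinds:
        path = Path(directory) / f"requirements_{kind}"
        for spec in read_requirements(path):
            if spec not in merged:
                merged.append(spec)
    return merged
```

A lean deployment would then use combined(".", "run"), a CI job combined(".", "run", "test"), and a developer machine all four kinds.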

We do not have a written policy, but it has been discussed amongst the core group for years now.

A written policy is hard in this case since things can be very subtle.

Well, a “policy” is tricky, but recommendations / guidelines would be very helpful.

With a touch more detail than “batteries included” :slight_smile:

On several packages I’m aware of (and co-maintain), optional dependencies get mapped to separate outputs, e.g. pip install foo[extra] has an equivalent conda install foo-extra.
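As a small illustration of that naming convention (a sketch only: the name-extra pattern is a per-feedstock habit, nothing conda enforces, and the conda_names helper is hypothetical), the mapping from a pip-style requirement to conda package names could look like this:

```python
import re

# Matches a bare name with an optional [extras] suffix, e.g. "foo[extra]".
# Version specifiers and markers are deliberately not handled here.
_REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(?:\[([^\]]*)\])?\s*$")


def conda_names(pip_requirement):
    """Map 'foo[a,b]' to ['foo-a', 'foo-b'], and plain 'foo' to ['foo']."""
    match = _REQ.match(pip_requirement)
    if match is None:
        raise ValueError(f"unsupported requirement: {pip_requirement!r}")
    name, extras = match.group(1), match.group(2) or ""
    names = [f"{name}-{extra.strip()}" for extra in extras.split(",") if extra.strip()]
    # With no extras requested, fall back to the plain package name.
    return names or [name]
```

This assumes each foo-extra output itself depends on foo, so installing the output brings in the core package too.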

I’d really love to have better syntax for (pip-)optional dependencies, but that would be quite a large amount of work.

Exactly – what I’m suggesting is that that be “standard practice” on conda-forge.

However, the cases I’ve seen are actually NOT set up as optional features with pip either – but PyPI is not curated, so folks can do whatever they want.

Also: in the case of [extras], are they ONLY handy dependencies? Or is there actual additional functionality in the package? The cases that bother me the most are dependencies (particularly large ones) that are in no way required for the full functionality of the package – only for demos, tutorials, etc. That’s what I’d really like to see left out.

If a package maintainer really wants to be able to give their users a single thing to install to be able to run all their examples, etc, I’d like to see it become recommended practice to make another meta-package: foo-demos, which would install foo and all the other stuff that might be useful with it.

TL;DR: packages should not include “things that are handy to use along with that package” as a dependency.

I’ve been working on a related issue in a repo that is not yet up-to-snuff to be published on conda-forge (but will be soon). See the new_conda_env project here.

I used to install jupyterlab in all my environments until I found out that having jlab in base and ipykernel in each env was enough.